Project

General

Profile

Actions

Bug #13013

closed

Cache tier thrashing causes out of order write

Added by David Zafman over 8 years ago. Updated over 8 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

dzafman-2015-09-02_21:09:42-rados-wip-zafman-testing---basic-multi/1043109

2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 5 to plana9020432-2064
2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 6 to plana9020432-2064
2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stdout:2065: finishing write tid 1 to plana9020432-2065
2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 3 to plana9020432-2064
2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stderr:Error: finished tid 3 when last_acked_tid was 6
2015-09-03T19:42:21.070 INFO:tasks.rados.rados.0.plana90.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7ffa197fa700 time 2015-09-03 19:42:21.049725
2015-09-03T19:42:21.071 INFO:tasks.rados.rados.0.plana90.stderr:./test/osd/RadosModel.h: 862: FAILED assert(0)
2015-09-03T19:42:21.071 INFO:tasks.rados.rados.0.plana90.stderr: ceph version 9.0.3-1368-g50bd011 (50bd0110367cae8e305ee7b8377f086a5cde91c1)
2015-09-03T19:42:21.071 INFO:tasks.rados.rados.0.plana90.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x564eb1624d9b]
2015-09-03T19:42:21.071 INFO:tasks.rados.rados.0.plana90.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x4b8) [0x564eb1610668]
2015-09-03T19:42:21.071 INFO:tasks.rados.rados.0.plana90.stderr: 3: (write_callback(void*, void*)+0x19) [0x564eb161c0e9]
2015-09-03T19:42:21.072 INFO:tasks.rados.rados.0.plana90.stderr: 4: (()+0xba9bd) [0x7ffa2ee1a9bd]
2015-09-03T19:42:21.072 INFO:tasks.rados.rados.0.plana90.stderr: 5: (()+0x962b9) [0x7ffa2edf62b9]
2015-09-03T19:42:21.072 INFO:tasks.rados.rados.0.plana90.stderr: 6: (()+0x158608) [0x7ffa2eeb8608]
2015-09-03T19:42:21.072 INFO:tasks.rados.rados.0.plana90.stderr: 7: (()+0x8182) [0x7ffa2e934182]


Related issues 1 (0 open1 closed)

Is duplicate of Ceph - Bug #12814: flipping the overlay from forward to seems to reorder writesCan't reproduceSage Weil08/27/2015

Actions
Actions #1

Updated by David Zafman over 8 years ago

  • Subject changed from Error: finished tid 3 when last_acked_tid was 6 to Cache tier thrashing causes out of order write

description: rados/singleton/{all/thrash_cache_writeback_forward_none.yaml fs/btrfs.yaml
msgr/async.yaml msgr-failures/many.yaml}

2015-09-03T19:42:20.052 INFO:tasks.rados.rados.0.plana90.stdout:2064: writing plana9020432-2064 from 707015 to 1349135 tid 1
2015-09-03T19:42:20.090 INFO:tasks.rados.rados.0.plana90.stdout:2064: writing plana9020432-2064 from 1838472 to 2535222 tid 2
2015-09-03T19:42:20.097 INFO:tasks.rados.rados.0.plana90.stdout:2064: writing plana9020432-2064 from 3105449 to 3107015 tid 3
2015-09-03T19:42:20.794 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 1 to plana9020432-2064
2015-09-03T19:42:20.795 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 2 to plana9020432-2064
2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 5 to plana9020432-2064
2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 6 to plana9020432-2064
2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 3 to plana9020432-2064

All I/O is ending up going to the base tier pool = 1

Write sent to base tier pool 1 pg 1.3 [4.1]
2015-09-03 19:42:20.055490 7fcbee6b9700 10 osd.4 16 dequeue_op 0x561c822e8500 prio 63 cost 642120 latency 0.000627 osd_op(client.4118.0:18549 plana9020432-2064 [write 707015~642120,stat] 1.4598b67f RETRY=1 ack+ondisk+retry+write+redirected+known_if_redirected e16) v6 pg pg[1.3( v 16'1937 (0'0,16'1937] local-les=6 n=529 ec=5 les/c 6/6 5/5/5) [4,1] r=0 lpr=5 crt=16'1933 lcod 16'1936 mlcod 16'1936 active+clean]

This must be related to cache tier thrashing causing retries, but more log analysis is needed.

Actions #2

Updated by Loïc Dachary over 8 years ago

On hammer http://149.202.184.126:8081/ubuntu-2015-11-02_11:32:06-rados-hammer-backports---basic-openstack/75

  description: rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml
    clusters/{fixed-2.yaml openstack.yaml} fs/xfs.yaml msgr-failures/osd-delay.yaml
    thrashers/morepggrow.yaml workloads/cache-agent-big.yaml}
...
2015-11-02T17:03:23.114 INFO:tasks.rados.rados.0.target187101.stdout:5814:  finishing write tid 1 to target187101.teuthology10750-5814
2015-11-02T17:03:23.114 INFO:tasks.rados.rados.0.target187101.stderr:Error: finished tid 1 when last_acked_tid was 5
2015-11-02T17:03:23.171 INFO:tasks.rados.rados.0.target187101.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7fadbf7fe700 time 2015-11-02 17:03:23.028048
2015-11-02T17:03:23.171 INFO:tasks.rados.rados.0.target187101.stderr:./test/osd/RadosModel.h: 854: FAILED assert(0)
2015-11-02T17:03:23.172 INFO:tasks.rados.rados.0.target187101.stderr: ceph version 0.94.5-149-g060124a (060124a138e44389dd41992c3c5eb473c8fa6f16)
2015-11-02T17:03:23.172 INFO:tasks.rados.rados.0.target187101.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x4e05db]
2015-11-02T17:03:23.173 INFO:tasks.rados.rados.0.target187101.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x46e) [0x4c83ae]
2015-11-02T17:03:23.173 INFO:tasks.rados.rados.0.target187101.stderr: 3: (write_callback(void*, void*)+0x19) [0x4d7bf9]
2015-11-02T17:03:23.173 INFO:tasks.rados.rados.0.target187101.stderr: 4: (()+0xb4cdd) [0x7fadd9b1acdd]
Actions #3

Updated by Sage Weil over 8 years ago

  • Status changed from New to Duplicate
Actions #4

Updated by Sage Weil over 8 years ago

  • Is duplicate of Bug #12814: flipping the overlay from forward to seems to reorder writes added
Actions

Also available in: Atom PDF