Bug #13013
closedCache tier thrashing causes out of order write
0%
Description
dzafman-2015-09-02_21:09:42-rados-wip-zafman-testing---basic-multi/1043109
2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 5 to plana9020432-2064
2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 6 to plana9020432-2064
2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stdout:2065: finishing write tid 1 to plana9020432-2065
2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 3 to plana9020432-2064
2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stderr:Error: finished tid 3 when last_acked_tid was 6
2015-09-03T19:42:21.070 INFO:tasks.rados.rados.0.plana90.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7ffa197fa700 time 2015-09-03 19:42:21.049725
2015-09-03T19:42:21.071 INFO:tasks.rados.rados.0.plana90.stderr:./test/osd/RadosModel.h: 862: FAILED assert(0)
2015-09-03T19:42:21.071 INFO:tasks.rados.rados.0.plana90.stderr: ceph version 9.0.3-1368-g50bd011 (50bd0110367cae8e305ee7b8377f086a5cde91c1)
2015-09-03T19:42:21.071 INFO:tasks.rados.rados.0.plana90.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x564eb1624d9b]
2015-09-03T19:42:21.071 INFO:tasks.rados.rados.0.plana90.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x4b8) [0x564eb1610668]
2015-09-03T19:42:21.071 INFO:tasks.rados.rados.0.plana90.stderr: 3: (write_callback(void*, void*)+0x19) [0x564eb161c0e9]
2015-09-03T19:42:21.072 INFO:tasks.rados.rados.0.plana90.stderr: 4: (()+0xba9bd) [0x7ffa2ee1a9bd]
2015-09-03T19:42:21.072 INFO:tasks.rados.rados.0.plana90.stderr: 5: (()+0x962b9) [0x7ffa2edf62b9]
2015-09-03T19:42:21.072 INFO:tasks.rados.rados.0.plana90.stderr: 6: (()+0x158608) [0x7ffa2eeb8608]
2015-09-03T19:42:21.072 INFO:tasks.rados.rados.0.plana90.stderr: 7: (()+0x8182) [0x7ffa2e934182]
Updated by David Zafman over 8 years ago
- Subject changed from Error: finished tid 3 when last_acked_tid was 6 to Cache tier thrashing causes out of order write
description: rados/singleton/{all/thrash_cache_writeback_forward_none.yaml fs/btrfs.yaml
msgr/async.yaml msgr-failures/many.yaml}
2015-09-03T19:42:20.052 INFO:tasks.rados.rados.0.plana90.stdout:2064: writing plana9020432-2064 from 707015 to 1349135 tid 1
2015-09-03T19:42:20.090 INFO:tasks.rados.rados.0.plana90.stdout:2064: writing plana9020432-2064 from 1838472 to 2535222 tid 2
2015-09-03T19:42:20.097 INFO:tasks.rados.rados.0.plana90.stdout:2064: writing plana9020432-2064 from 3105449 to 3107015 tid 3
2015-09-03T19:42:20.794 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 1 to plana9020432-2064
2015-09-03T19:42:20.795 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 2 to plana9020432-2064
2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 5 to plana9020432-2064
2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 6 to plana9020432-2064
2015-09-03T19:42:21.050 INFO:tasks.rados.rados.0.plana90.stdout:2064: finishing write tid 3 to plana9020432-2064
All I/O is ending up going to the base tier pool = 1
Write sent to base tier pool 1 pg 1.3 [4.1]
2015-09-03 19:42:20.055490 7fcbee6b9700 10 osd.4 16 dequeue_op 0x561c822e8500 prio 63 cost 642120 latency 0.000627 osd_op(client.4118.0:18549 plana9020432-2064 [write 707015~642120,stat] 1.4598b67f RETRY=1 ack+ondisk+retry+write+redirected+known_if_redirected e16) v6 pg pg[1.3( v 16'1937 (0'0,16'1937] local-les=6 n=529 ec=5 les/c 6/6 5/5/5) [4,1] r=0 lpr=5 crt=16'1933 lcod 16'1936 mlcod 16'1936 active+clean]
This must be related to cache tier thrashing causing retries, but more log analysis is needed.
Updated by Loïc Dachary over 8 years ago
On hammer http://149.202.184.126:8081/ubuntu-2015-11-02_11:32:06-rados-hammer-backports---basic-openstack/75
description: rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml clusters/{fixed-2.yaml openstack.yaml} fs/xfs.yaml msgr-failures/osd-delay.yaml thrashers/morepggrow.yaml workloads/cache-agent-big.yaml} ... 2015-11-02T17:03:23.114 INFO:tasks.rados.rados.0.target187101.stdout:5814: finishing write tid 1 to target187101.teuthology10750-5814 2015-11-02T17:03:23.114 INFO:tasks.rados.rados.0.target187101.stderr:Error: finished tid 1 when last_acked_tid was 5 2015-11-02T17:03:23.171 INFO:tasks.rados.rados.0.target187101.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7fadbf7fe700 time 2015-11-02 17:03:23.028048 2015-11-02T17:03:23.171 INFO:tasks.rados.rados.0.target187101.stderr:./test/osd/RadosModel.h: 854: FAILED assert(0) 2015-11-02T17:03:23.172 INFO:tasks.rados.rados.0.target187101.stderr: ceph version 0.94.5-149-g060124a (060124a138e44389dd41992c3c5eb473c8fa6f16) 2015-11-02T17:03:23.172 INFO:tasks.rados.rados.0.target187101.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x4e05db] 2015-11-02T17:03:23.173 INFO:tasks.rados.rados.0.target187101.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x46e) [0x4c83ae] 2015-11-02T17:03:23.173 INFO:tasks.rados.rados.0.target187101.stderr: 3: (write_callback(void*, void*)+0x19) [0x4d7bf9] 2015-11-02T17:03:23.173 INFO:tasks.rados.rados.0.target187101.stderr: 4: (()+0xb4cdd) [0x7fadd9b1acdd]
Updated by Sage Weil over 8 years ago
- Status changed from New to Duplicate
Updated by Sage Weil over 8 years ago
- Is duplicate of Bug #12814: flipping the overlay from forward to seems to reorder writes added