Bug #47212
closed
out-of-order "Error: finished tid 3 when last_acked_tid was 5"
Added by Kefu Chai over 3 years ago.
Updated over 1 year ago.
Description
2020-08-31T08:46:39.826 INFO:tasks.rados.rados.0.smithi116.stdout:191: writing smithi11620526-191 from 793330 to 1483112 tid 1
2020-08-31T08:46:39.834 INFO:tasks.rados.rados.0.smithi116.stdout:191: writing smithi11620526-191 from 2035461 to 2653192 tid 2
2020-08-31T08:46:39.835 INFO:tasks.rados.rados.0.smithi116.stdout:191: writing smithi11620526-191 from 3101545 to 3193330 tid 3
...
2020-08-31T08:46:40.860 INFO:tasks.rados.rados.0.smithi116.stdout:191: finishing write tid 1 to smithi11620526-191
2020-08-31T08:46:40.920 INFO:tasks.rados.rados.0.smithi116.stdout:191: finishing write tid 2 to smithi11620526-191
2020-08-31T08:46:41.480 INFO:tasks.rados.rados.0.smithi116.stdout:191: finishing write tid 3 to smithi11620526-191
2020-08-31T08:46:41.480 INFO:tasks.rados.rados.0.smithi116.stderr:Error: finished tid 3 when last_acked_tid was 5
2020-08-31T08:46:41.481 INFO:tasks.rados.rados.0.smithi116.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/
gigantic/release/16.0.0-4903-g268a7471a80/rpm/el8/BUILD/ceph-16.0.0-4903-g268a7471a80/src/test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f4cd37fe700 time
2020-08-31T08:46:41.474872+0000
2020-08-31T08:46:41.481 INFO:tasks.rados.rados.0.smithi116.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/
gigantic/release/16.0.0-4903-g268a7471a80/rpm/el8/BUILD/ceph-16.0.0-4903-g268a7471a80/src/test/osd/RadosModel.h: 902: ceph_abort_msg("abort() called")
2020-08-31T08:46:41.481 INFO:tasks.rados.rados.0.smithi116.stderr: ceph version 16.0.0-4903-g268a7471a80 (268a7471a8046235899274beb3fc7ebdde82e731) pacific (dev)
2020-08-31T08:46:41.481 INFO:tasks.rados.rados.0.smithi116.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > con
st&)+0xe0) [0x7f4ce700873c]
2020-08-31T08:46:41.482 INFO:tasks.rados.rados.0.smithi116.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x6e2) [0x55be6ef68ef2]
2020-08-31T08:46:41.482 INFO:tasks.rados.rados.0.smithi116.stderr: 3: (write_callback(void*, void*)+0x1e) [0x55be6ef88b8e]
/a//kchai-2020-08-31_08:18:31-crimson-rados:thrash-wip-kefu-testing-2020-08-31-1443-distro-basic-smithi/5394778/
- Description updated (diff)
$ zgrep smithi11620526-191 remote/*/log/ceph-osd* | grep do_osd_ops|grep reply
remote/smithi186/log/ceph-osd.2.log.gz:DEBUG 2020-08-31 08:46:39,896 [shard 0] osd - do_osd_ops: osd_op(client.4228.0:835 3.e 3:7d6f3947:::smithi11620526-191:head {write 793330~689782 in=689782b, stat out=16b} snapc 0={} ondisk+write+known_if_redirected e16) v8 - object 3:7d6f3947:::smithi11620526-191:head sending reply
remote/smithi186/log/ceph-osd.2.log.gz:DEBUG 2020-08-31 08:46:39,938 [shard 0] osd - do_osd_ops: osd_op(client.4228.0:836 3.e 3:7d6f3947:::smithi11620526-191:head {write 2035461~617731 in=617731b} snapc 0={} ondisk+write+known_if_redirected e16) v8 - object 3:7d6f3947:::smithi11620526-191:head sending reply
remote/smithi186/log/ceph-osd.2.log.gz:DEBUG 2020-08-31 08:46:39,978 [shard 0] osd - do_osd_ops: osd_op(client.4228.0:839 3.e 3:7d6f3947:::smithi11620526-191:head {read 0~1 out=1b} snapc 0={} ondisk+read+rwordered+known_if_redirected e16) v8 - object 3:7d6f3947:::smithi11620526-191:head sending reply
remote/smithi186/log/ceph-osd.3.log.gz:DEBUG 2020-08-31 08:46:41,033 [shard 0] osd - do_osd_ops: osd_op(client.4228.0:837 3.e 3:7d6f3947:::smithi11620526-191:head {write 3101545~91785 in=91785b} snapc 0={} RETRY=1 ondisk+retry+write+known_if_redirected e18) v8 - object 3:7d6f3947:::smithi11620526-191:head sending reply
remote/smithi186/log/ceph-osd.3.log.gz:DEBUG 2020-08-31 08:46:41,054 [shard 0] osd - do_osd_ops: osd_op(client.4228.0:838 3.e 3:7d6f3947:::smithi11620526-191:head {setxattr (57), truncate 3193330} snapc 0={} RETRY=1 ondisk+retry+write+known_if_redirected e18) v8 - object 3:7d6f3947:::smithi11620526-191:head sending reply
i think we should drop the read op after acquiring the obc lock if osd is not able to serve this request either because
- it's not the primary or
- it's not a replica and request is not a balanced read.
- or recovery_state indicates that the obc of the object is stale according to the pglog.
- Status changed from New to Fix Under Review
- Assignee set to Kefu Chai
- Pull request ID set to 36919
- Status changed from Fix Under Review to Resolved
- Status changed from Resolved to Pending Backport
- Backport set to pacific, octopus
2021-04-03T19:25:59.853 INFO:tasks.rados.rados.0.smithi046.stdout:603: finishing write tid 1 to smithi04631543-27
2021-04-03T19:25:59.854 INFO:tasks.rados.rados.0.smithi046.stderr:Error: finished tid 1 when last_acked_tid was 5
/ceph/teuthology-archive/yuriw-2021-04-03_17:21:06-rados-wip-yuri2-testing-2021-04-01-1440-octopus-distro-basic-smithi/6018206/teuthology.log
- Backport deleted (
pacific, octopus)
Removing the pacific/octopus backport request because there is no backport tracker for crimson at the moment. Please revert my change if I'm mistaken and there should be backports of this particular issue.
- Tags set to backport_processed
- Status changed from Pending Backport to Resolved
No backports for crimson for now.
Also available in: Atom
PDF