Project

General

Profile

Bug #47212

out-of-order "Error: finished tid 3 when last_acked_tid was 5"

Added by Kefu Chai 11 months ago. Updated 3 months ago.

Status:
Pending Backport
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-08-31T08:46:39.826 INFO:tasks.rados.rados.0.smithi116.stdout:191:  writing smithi11620526-191 from 793330 to 1483112 tid 1
2020-08-31T08:46:39.834 INFO:tasks.rados.rados.0.smithi116.stdout:191:  writing smithi11620526-191 from 2035461 to 2653192 tid 2
2020-08-31T08:46:39.835 INFO:tasks.rados.rados.0.smithi116.stdout:191:  writing smithi11620526-191 from 3101545 to 3193330 tid 3
...
2020-08-31T08:46:40.860 INFO:tasks.rados.rados.0.smithi116.stdout:191:  finishing write tid 1 to smithi11620526-191
2020-08-31T08:46:40.920 INFO:tasks.rados.rados.0.smithi116.stdout:191:  finishing write tid 2 to smithi11620526-191
2020-08-31T08:46:41.480 INFO:tasks.rados.rados.0.smithi116.stdout:191:  finishing write tid 3 to smithi11620526-191
2020-08-31T08:46:41.480 INFO:tasks.rados.rados.0.smithi116.stderr:Error: finished tid 3 when last_acked_tid was 5
2020-08-31T08:46:41.481 INFO:tasks.rados.rados.0.smithi116.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/
gigantic/release/16.0.0-4903-g268a7471a80/rpm/el8/BUILD/ceph-16.0.0-4903-g268a7471a80/src/test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f4cd37fe700 time
 2020-08-31T08:46:41.474872+0000
2020-08-31T08:46:41.481 INFO:tasks.rados.rados.0.smithi116.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/
gigantic/release/16.0.0-4903-g268a7471a80/rpm/el8/BUILD/ceph-16.0.0-4903-g268a7471a80/src/test/osd/RadosModel.h: 902: ceph_abort_msg("abort() called")
2020-08-31T08:46:41.481 INFO:tasks.rados.rados.0.smithi116.stderr: ceph version 16.0.0-4903-g268a7471a80 (268a7471a8046235899274beb3fc7ebdde82e731) pacific (dev)
2020-08-31T08:46:41.481 INFO:tasks.rados.rados.0.smithi116.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > con
st&)+0xe0) [0x7f4ce700873c]
2020-08-31T08:46:41.482 INFO:tasks.rados.rados.0.smithi116.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x6e2) [0x55be6ef68ef2]
2020-08-31T08:46:41.482 INFO:tasks.rados.rados.0.smithi116.stderr: 3: (write_callback(void*, void*)+0x1e) [0x55be6ef88b8e]

/a//kchai-2020-08-31_08:18:31-crimson-rados:thrash-wip-kefu-testing-2020-08-31-1443-distro-basic-smithi/5394778/

History

#1 Updated by Kefu Chai 11 months ago

  • Description updated (diff)

#2 Updated by Kefu Chai 11 months ago

$ zgrep smithi11620526-191 remote/*/log/ceph-osd* | grep do_osd_ops|grep reply
remote/smithi186/log/ceph-osd.2.log.gz:DEBUG 2020-08-31 08:46:39,896 [shard 0] osd - do_osd_ops: osd_op(client.4228.0:835 3.e 3:7d6f3947:::smithi11620526-191:head {write 793330~689782 in=689782b, stat out=16b} snapc 0={} ondisk+write+known_if_redirected e16) v8 - object 3:7d6f3947:::smithi11620526-191:head sending reply
remote/smithi186/log/ceph-osd.2.log.gz:DEBUG 2020-08-31 08:46:39,938 [shard 0] osd - do_osd_ops: osd_op(client.4228.0:836 3.e 3:7d6f3947:::smithi11620526-191:head {write 2035461~617731 in=617731b} snapc 0={} ondisk+write+known_if_redirected e16) v8 - object 3:7d6f3947:::smithi11620526-191:head sending reply
remote/smithi186/log/ceph-osd.2.log.gz:DEBUG 2020-08-31 08:46:39,978 [shard 0] osd - do_osd_ops: osd_op(client.4228.0:839 3.e 3:7d6f3947:::smithi11620526-191:head {read 0~1 out=1b} snapc 0={} ondisk+read+rwordered+known_if_redirected e16) v8 - object 3:7d6f3947:::smithi11620526-191:head sending reply
remote/smithi186/log/ceph-osd.3.log.gz:DEBUG 2020-08-31 08:46:41,033 [shard 0] osd - do_osd_ops: osd_op(client.4228.0:837 3.e 3:7d6f3947:::smithi11620526-191:head {write 3101545~91785 in=91785b} snapc 0={} RETRY=1 ondisk+retry+write+known_if_redirected e18) v8 - object 3:7d6f3947:::smithi11620526-191:head sending reply
remote/smithi186/log/ceph-osd.3.log.gz:DEBUG 2020-08-31 08:46:41,054 [shard 0] osd - do_osd_ops: osd_op(client.4228.0:838 3.e 3:7d6f3947:::smithi11620526-191:head {setxattr (57), truncate 3193330} snapc 0={} RETRY=1 ondisk+retry+write+known_if_redirected e18) v8 - object 3:7d6f3947:::smithi11620526-191:head sending reply

#3 Updated by Kefu Chai 11 months ago

i think we should drop the read op after acquiring the obc lock if osd is not able to serve this request either because

  • it's not the primary or
  • it's not a replica and request is not a balanced read.
  • or recovery_state indicates that the obc of the object is stale according to the pglog.

#4 Updated by Kefu Chai 11 months ago

  • Status changed from New to Fix Under Review
  • Assignee set to Kefu Chai
  • Pull request ID set to 36919

#5 Updated by Kefu Chai 11 months ago

  • Status changed from Fix Under Review to Resolved

#6 Updated by Deepika Upadhyay 4 months ago

  • Status changed from Resolved to Pending Backport
  • Backport set to pacific, octopus
2021-04-03T19:25:59.853 INFO:tasks.rados.rados.0.smithi046.stdout:603:  finishing write tid 1 to smithi04631543-27
2021-04-03T19:25:59.854 INFO:tasks.rados.rados.0.smithi046.stderr:Error: finished tid 1 when last_acked_tid was 5

/ceph/teuthology-archive/yuriw-2021-04-03_17:21:06-rados-wip-yuri2-testing-2021-04-01-1440-octopus-distro-basic-smithi/6018206/teuthology.log

#7 Updated by Loïc Dachary 3 months ago

  • Backport deleted (pacific, octopus)

Removing the pacific/octopus backport request because there is no backport tracker for crimson at the moment. Please revert my change if I'm mistaken and there should be backports of this particular issue.

Also available in: Atom PDF