Bug #63821

crimson: failure in recover_missings due to rollback target not existing

Added by Samuel Just 4 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

sha1: 74e3a7f270908252df933e202602db01b6f3a5f4 -- empty commit on top of ceph/main sha1 9b597a74ca07d1e814b5f3e0354a50018fba3534
https://pulpito.ceph.com/sjust-2023-12-13_09:24:34-crimson-rados-wip-sjust-crimson-scrub-base-distro-default-smithi/7489471/

Did not see any instances with PR 53306 reverted: ci/wip-sjust-crimson-scrub-base-revert-53306, 76da90004609fa440775607a940237786917862f.
The above is based on main sha1 9b597a74ca07d1e814b5f3e0354a50018fba3534 with (recent) PR 53306 reverted.
https://pulpito.ceph.com/sjust-2023-12-13_09:24:34-crimson-rados-wip-sjust-crimson-scrub-base-distro-default-smithi/7489467/

DEBUG 2023-12-13 09:57:06,143 [shard 0] osd - PGShardManager::broadcast_map_to_pgs broadcasted up to 74
DEBUG 2023-12-13 09:57:06,143 [shard 0] osd - client_request(id=34361, detail=m=[osd_op(client.4188.0:35314 3.1 3.1de2e061 (undecoded) ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e74) v8]): got map 74, entering get_pg_mapping
DEBUG 2023-12-13 09:57:06,143 [shard 0] osd - client_request(id=34361, detail=m=[osd_op(client.4188.0:35314 3.1 3.1de2e061 (undecoded) ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e74) v8]): can_create=false, target-core=0
DEBUG 2023-12-13 09:57:06,143 [shard 0] osd - client_request(id=34361, detail=m=[osd_op(client.4188.0:35314 3.1 3.1de2e061 (undecoded) ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e74) v8]): entering create_or_wait_pg
INFO 2023-12-13 09:57:06,143 [shard 0] osd - osd.0: now active
DEBUG 2023-12-13 09:57:06,144 [shard 0] osd - client_request(id=34361, detail=m=[osd_op(client.4188.0:35314 3.1 3.1de2e061 (undecoded) ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e74) v8]): have_pg
DEBUG 2023-12-13 09:57:06,144 [shard 0] osd - 0x0 ClientRequest::with_pg_int: client_request(id=34361, detail=m=[osd_op(client.4188.0:35314 3.1 3.1de2e061 (undecoded) ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e74) v8]) same_interval_since: 14
DEBUG 2023-12-13 09:57:06,144 [shard 0] osd - 0x603001679030 ClientRequest::with_pg_int: client_request(id=34361, detail=m=[osd_op(client.4188.0:35314 3.1 3:860747b8:::rb.0.105c.4db92e3b.000000000000:16 {read 0~80} snapc 0={} ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e74) v8]) start
DEBUG 2023-12-13 09:57:06,144 [shard 0] osd - 0x603001679030 ClientRequest::with_pg_int: client_request(id=34361, detail=m=[osd_op(client.4188.0:35314 3.1 3:860747b8:::rb.0.105c.4db92e3b.000000000000:16 {read 0~80} snapc 0={} ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e74) v8]).0: after await_map stage
DEBUG 2023-12-13 09:57:06,144 [shard 0] osd - 0x603001679030 ClientRequest::with_pg_int: client_request(id=34361, detail=m=[osd_op(client.4188.0:35314 3.1 3:860747b8:::rb.0.105c.4db92e3b.000000000000:16 {read 0~80} snapc 0={} ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e74) v8]).0: after wait_for_map
DEBUG 2023-12-13 09:57:06,144 [shard 0] osd - 0x603001679030 ClientRequest::with_pg_int: client_request(id=34361, detail=m=[osd_op(client.4188.0:35314 3.1 3:860747b8:::rb.0.105c.4db92e3b.000000000000:16 {read 0~80} snapc 0={} ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e74) v8]).0: after wait_for_active stage
DEBUG 2023-12-13 09:57:06,144 [shard 0] osd - 0x603001679030 ClientRequest::with_pg_int: client_request(id=34361, detail=m=[osd_op(client.4188.0:35314 3.1 3:860747b8:::rb.0.105c.4db92e3b.000000000000:16 {read 0~80} snapc 0={} ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e74) v8]).0: after wait_for_active
DEBUG 2023-12-13 09:57:06,144 [shard 0] osd - do_recover_missing check for recovery, 3:860747b8:::rb.0.105c.4db92e3b.000000000000:head
DEBUG 2023-12-13 09:57:06,144 [shard 0] osd - pg_epoch 74 pg[3.1( v 73'22 (0'0,73'22] local-lis/les=14/15 n=6 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=14) [0,2] r=0 lpr=14 luod=73'23 lua=0'0 crt=73'23 mlcod 73'22 active+clean ObjectContextLoader::with_head_obc: object 3:860747b8:::rb.0.105c.4db92e3b.000000000000:head
DEBUG 2023-12-13 09:57:06,144 [shard 0] osd - pg_epoch 74 pg[3.1( v 73'22 (0'0,73'22] local-lis/les=14/15 n=6 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=14) [0,2] r=0 lpr=14 luod=73'23 lua=0'0 crt=73'23 mlcod 73'22 active+clean ObjectContextLoader::get_or_load_obc: cache hit on 3:860747b8:::rb.0.105c.4db92e3b.000000000000:head
DEBUG 2023-12-13 09:57:06,144 [shard 0] osd - resolve_oid oid.snap=22,head snapset.seq=22
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-92-g74e3a7f2/rpm/el9/BUILD/ceph-19.0.0-92-g74e3a7f2/src/crimson/osd/osd_operations/client_request_common.cc:42: crimson::osd::CommonClientRequest::recover_missings(Ref<crimson::osd::PG>&, const hobject_t&, std::vector<snapid_t>&&)::<lambda(auto:179&)> mutable::<lambda()> mutable::<lambda(auto:180, auto:181)> mutable [with auto:180 = boost::intrusive_ptr<crimson::osd::ObjectContext>; auto:181 = boost::intrusive_ptr<crimson::osd::ObjectContext>]: Assertion `oid' failed.
Aborting on shard 0.
Backtrace:

The culprit is that rollback targets may not actually exist -- a rollback in that case results in the object being removed. The assert can be removed and the oid in question skipped.
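A minimal, self-contained C++ sketch of that idea (hypothetical names and simplified snapset logic; not the actual recover_missings code from src/crimson/osd/osd_operations/client_request_common.cc): where the old code asserted that the resolved oid exists, a missing rollback target is treated as "object removed by the rollback" and recovery for that oid is simply skipped.

```cpp
// Hypothetical standalone sketch of the proposed behaviour: a rollback
// target that cannot be resolved means the rollback removes the object,
// so recovery skips the oid instead of aborting on assert(oid).
#include <iostream>
#include <optional>
#include <string>
#include <vector>

// Stand-in for resolving a clone oid from the head's snapset; returns
// std::nullopt when no clone covers the requested snap, i.e. the rollback
// target no longer exists.
std::optional<std::string> resolve_rollback_target(
    const std::string& head,
    unsigned snap,
    const std::vector<unsigned>& clones) {
  for (unsigned c : clones) {
    if (c >= snap) {
      return head + ":" + std::to_string(c);
    }
  }
  return std::nullopt;  // rollback removes the object; nothing to recover
}

int main() {
  const std::string head = "rb.0.105c.4db92e3b.000000000000";
  const std::vector<unsigned> clones = {16, 22};

  for (unsigned snap : {10u, 30u}) {
    auto oid = resolve_rollback_target(head, snap, clones);
    if (!oid) {
      // Old behaviour: assert(oid) would abort the OSD here.
      std::cout << "snap " << snap << ": target gone, skipping recovery\n";
      continue;
    }
    std::cout << "snap " << snap << ": recover " << *oid << "\n";
  }
}
```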

Actions #1

Updated by Matan Breizman 4 months ago

  • Project changed from Ceph to crimson
Actions #2

Updated by Xuehan Xu 4 months ago

Seems that this is because this PR hasn't been merged: https://github.com/ceph/ceph/pull/54609

Actions #3

Updated by Matan Breizman 3 months ago

  • Status changed from New to Fix Under Review

Xuehan Xu wrote:

Seems that this is because this PR hasn't been merged: https://github.com/ceph/ceph/pull/54609

Merged. Can I close this tracker?

Actions #4

Updated by Matan Breizman 3 months ago

  • Status changed from Fix Under Review to Resolved
  • Pull request ID set to 54931