Project

General

Profile

Bug #38840

snaps missing in mapper, should be: ca was r -2...repaired

Added by David Zafman about 5 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
luminous, mimic, nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

dzafman-2019-03-20_19:53:02-rados-wip-zafman-testing-distro-basic-smithi/3754443

This looks like a cache tier eviction deleting a clone while a simultaneous scrub is happening.

Messages leading to the error with "smithi08716243-128:d8"

2019-03-21 09:59:25.959 7f91c26d6700 20 bluestore(/var/lib/ceph/osd/ceph-4) _collection_list oid #3:bbcdb3d6:::smithi08716243-128:d8# end GHMAX
2019-03-21 09:59:25.959 7f91c26d6700 20 bluestore(/var/lib/ceph/osd/ceph-4) _collection_list oid #3:bbcdb3d6:::smithi08716243-128:d8# end #3:bbcdb3d6:::smithi08716243-128:fb#
2019-03-21 09:59:25.959 7f91c26d6700 10 osd.4 pg_epoch: 375 pg[3.1( v 375'2366 (0'0,375'2366] local-lis/les=360/361 n=53 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 crt=375'2366 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] be_scan_list (24/25 3:bbcdb3d6:::smithi08716243-128:d8)
2019-03-21 09:59:25.959 7f91c26d6700 10 bluestore(/var/lib/ceph/osd/ceph-4) stat 3.1_head #3:bbcdb3d6:::smithi08716243-128:d8#
2019-03-21 09:59:25.959 7f91c26d6700 15 bluestore(/var/lib/ceph/osd/ceph-4) getattrs 3.1_head #3:bbcdb3d6:::smithi08716243-128:d8#
2019-03-21 09:59:25.959 7f91c26d6700 10 bluestore(/var/lib/ceph/osd/ceph-4) getattrs 3.1_head #3:bbcdb3d6:::smithi08716243-128:d8# = 0
2019-03-21 09:59:25.959 7f91c26d6700 25 osd.4 pg_epoch: 375 pg[3.1( v 375'2366 (0'0,375'2366] local-lis/les=360/361 n=53 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 crt=375'2366 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] be_scan_list  3:bbcdb3d6:::smithi08716243-128:d8
2019-03-21 09:59:25.967 7f91be6ce700 10 osd.4 pg_epoch: 375 pg[3.1( v 375'2366 (0'0,375'2366] local-lis/les=360/361 n=53 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 crt=375'2366 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] be_select_auth_object: selecting osd 4 for obj 3:bbcdb3d6:::smithi08716243-128:d8 with oi 3:bbcdb3d6:::smithi08716243-128:d8(312'2265 client.4453.0:4316 dirty s 2622057 uv 1627 alloc_hint [0 0 0])
2019-03-21 09:59:25.971 7f91c4edb700 10 osd.4 pg_epoch: 375 pg[3.1( v 375'2367 (0'0,375'2367] local-lis/les=360/361 n=52 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=375'2366 crt=375'2367 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] finish_flush 3:bbcdb3d6:::smithi08716243-128:d8 tid 1113 (0) Success
2019-03-21 09:59:25.971 7f91c36d8700 20 osd.4 375 agent_entry oids 3:bbcdb3d6:::smithi08716243-128:d8
2019-03-21 09:59:25.971 7f91c4edb700 10 osd.4 pg_epoch: 375 pg[3.1( v 375'2367 (0'0,375'2367] local-lis/les=360/361 n=52 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=375'2366 crt=375'2367 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] agent_maybe_evict evicting 3:bbcdb3d6:::smithi08716243-128:d8(312'2265 client.4453.0:4316 dirty s 2622057 uv 1627 alloc_hint [0 0 0])
2019-03-21 09:59:25.971 7f91c4edb700 20 osd.4 pg_epoch: 375 pg[3.1( v 375'2367 (0'0,375'2367] local-lis/les=360/361 n=52 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=375'2366 crt=375'2367 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] simple_opc_create 3:bbcdb3d6:::smithi08716243-128:d8
2019-03-21 09:59:25.971 7f91c4edb700 20 osd.4 pg_epoch: 375 pg[3.1( v 375'2367 (0'0,375'2367] local-lis/les=360/361 n=52 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=375'2366 crt=375'2367 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] _delete_oid 3:bbcdb3d6:::smithi08716243-128:d8 whiteout=0 no_whiteout=1 try_no_whiteout=0
2019-03-21 09:59:25.971 7f91c4edb700 20 osd.4 pg_epoch: 375 pg[3.1( v 375'2367 (0'0,375'2367] local-lis/les=360/361 n=52 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=375'2366 crt=375'2367 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] finish_ctx 3:bbcdb3d6:::smithi08716243-128:d8 0xfb79600 op delete
2019-03-21 09:59:25.971 7f91c4edb700  7 osd.4 pg_epoch: 375 pg[3.1( v 375'2367 (0'0,375'2367] local-lis/les=360/361 n=52 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=375'2366 crt=375'2367 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] issue_repop rep_tid 2009 o 3:bbcdb3d6:::smithi08716243-128:d8
2019-03-21 09:59:25.971 7f91c4edb700 20 osd.4 pg_epoch: 375 pg[3.1( v 375'2367 (0'0,375'2367] local-lis/les=360/361 n=51 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=375'2366 crt=375'2367 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] apply_stats 3:bbcdb3d6:::smithi08716243-128:d8 < [3:bbcdb3d6:::smithi08716243-128:fb,MAX)

2019-03-21 09:59:25.971 7f91c4edb700  1 -- [v2:172.21.15.205:6818/13410,v1:172.21.15.205:6819/13410] --> [v2:172.21.15.87:6804/13348,v1:172.21.15.87:6805/13348] -- osd_repop(osd.4.0:2009 3.1 e375/360 3:bbcdb3d6:::smithi08716243-128:d8 v 375'2368) v2 -- 0x11c85000 con 0xf1fe480
2019-03-21 09:59:25.971 7f91c4edb700 20 snap_mapper.remove_oid 3:bbcdb3d6:::smithi08716243-128:d8
2019-03-21 09:59:25.971 7f91c4edb700 20 snap_mapper._remove_oid 3:bbcdb3d6:::smithi08716243-128:d8
2019-03-21 09:59:25.971 7f91c4edb700 20 snap_mapper.get_snaps 3:bbcdb3d6:::smithi08716243-128:d8 ca
2019-03-21 09:59:25.971 7f91c4edb700 20 snap_mapper.clear_snaps 3:bbcdb3d6:::smithi08716243-128:d8
2019-03-21 09:59:25.971 7f91c4edb700 10 osd.4 pg_epoch: 375 pg[3.1( v 375'2367 (0'0,375'2367] local-lis/les=360/361 n=51 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=375'2366 crt=375'2367 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] append_log log((0'0,375'2367], crt=375'2367) [375'2368 (312'2265) delete   3:bbcdb3d6:::smithi08716243-128:d8 by osd.4.0:2009 2019-03-21 09:59:25.975993 0]
2019-03-21 09:59:25.971 7f91c4edb700 10 osd.4 pg_epoch: 375 pg[3.1( v 375'2368 (0'0,375'2368] local-lis/les=360/361 n=51 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=375'2366 lua=375'2367 crt=375'2367 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] add_log_entry 375'2368 (312'2265) delete   3:bbcdb3d6:::smithi08716243-128:d8 by osd.4.0:2009 2019-03-21 09:59:25.975993 0
2019-03-21 09:59:25.971 7f91c4edb700 20 trim 375'2368 (312'2265) delete   3:bbcdb3d6:::smithi08716243-128:d8 by osd.4.0:2009 2019-03-21 09:59:25.975993 0
2019-03-21 09:59:25.971 7f91c4edb700 20 osd.4 pg_epoch: 375 pg[3.1( v 375'2368 (0'0,375'2368] local-lis/les=360/361 n=51 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=375'2366 lua=375'2367 crt=375'2368 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] rollforward: entry=375'2368 (312'2265) delete   3:bbcdb3d6:::smithi08716243-128:d8 by osd.4.0:2009 2019-03-21 09:59:25.975993 0
2019-03-21 09:59:25.971 7f91c4edb700 15 bluestore(/var/lib/ceph/osd/ceph-4) _remove 3.1_head #3:bbcdb3d6:::smithi08716243-128:d8# onode 0x11960000 txc 0xc9c2300
2019-03-21 09:59:25.971 7f91c4edb700 15 bluestore(/var/lib/ceph/osd/ceph-4) _do_truncate 3.1_head #3:bbcdb3d6:::smithi08716243-128:d8# 0x0
2019-03-21 09:59:25.971 7f91c4edb700 10 bluestore(/var/lib/ceph/osd/ceph-4) _remove 3.1_head #3:bbcdb3d6:::smithi08716243-128:d8# = 0
2019-03-21 09:59:25.971 7f91be6ce700 20 osd.4 pg_epoch: 375 pg[3.1( v 375'2368 (0'0,375'2368] local-lis/les=360/361 n=51 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=375'2366 crt=375'2368 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] scrub  3:bbcdb3d6:::smithi08716243-128:d8 3:bbcdb3d6:::smithi08716243-128:d8(312'2265 client.4453.0:4316 dirty s 2622057 uv 1627 alloc_hint [0 0 0])
2019-03-21 09:59:25.971 7f91be6ce700 20 osd.4 pg_epoch: 375 pg[3.1( v 375'2368 (0'0,375'2368] local-lis/les=360/361 n=51 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=375'2366 crt=375'2368 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] scrub_snapshot_metadata scrub matched clone 3:bbcdb3d6:::smithi08716243-128:d8
2019-03-21 09:59:25.971 7f91be6ce700 20 osd.4 pg_epoch: 375 pg[3.1( v 375'2368 (0'0,375'2368] local-lis/les=360/361 n=51 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=375'2366 crt=375'2368 lcod 375'2365 mlcod 375'2365 active+clean+scrubbing trimq=[107~1,10a~3,10e~2] ps=[da~1,e7~1,ea~1,f1~1,f9~1,fe~1]] _scan_snaps 3:bbcdb3d6:::smithi08716243-128:d8
2019-03-21 09:59:25.971 7f91be6ce700 20 snap_mapper.get_snaps 3:bbcdb3d6:::smithi08716243-128:d8 got.empty()
2019-03-21 09:59:25.971 7f91be6ce700 -1 log_channel(cluster) log [ERR] : osd.4 found snap mapper error on pg 3.1 oid 3:bbcdb3d6:::smithi08716243-128:d8 snaps missing in mapper, should be: ca was  r -2...repaired

Later the object info is missing and "Can not trim 3:bbcdb3d6:::smithi08716243-128:d8 repair needed" keeps happening.

2019-03-21 09:59:25.971 7f91be6ce700 20 snap_mapper.add_oid 3:bbcdb3d6:::smithi08716243-128:d8 ca
2019-03-21 09:59:25.971 7f91be6ce700 20 snap_mapper.get_snaps 3:bbcdb3d6:::smithi08716243-128:d8 got.empty()
2019-03-21 09:59:25.971 7f91be6ce700 20 snap_mapper.set_snaps 3:bbcdb3d6:::smithi08716243-128:d8 ca
2019-03-21 10:00:01.990 7f91c26d6700 10 osd.4 pg_epoch: 411 pg[3.1( v 411'2577 (0'0,411'2577] local-lis/les=360/361 n=51 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=411'2575 crt=411'2577 lcod 411'2574 mlcod 411'2574 active+clean+snaptrim trimq=[ca~1,116~2,11c~1,11f~2,127~2,12f~1,131~1,136~1] ps=[10d~1]] SnapTrimmer state<Trimming/AwaitAsyncWork>: AwaitAsyncWork react trimming 3:bbcdb3d6:::smithi08716243-128:d8
2019-03-21 10:00:01.990 7f91c26d6700 10 osd.4 pg_epoch: 411 pg[3.1( v 411'2577 (0'0,411'2577] local-lis/les=360/361 n=51 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=411'2575 crt=411'2577 lcod 411'2574 mlcod 411'2574 active+clean+snaptrim trimq=[ca~1,116~2,11c~1,11f~2,127~2,12f~1,131~1,136~1] ps=[10d~1]] get_object_context: obc NOT found in cache: 3:bbcdb3d6:::smithi08716243-128:d8
2019-03-21 10:00:01.990 7f91c26d6700 15 bluestore(/var/lib/ceph/osd/ceph-4) getattr 3.1_head #3:bbcdb3d6:::smithi08716243-128:d8# _
2019-03-21 10:00:01.990 7f91c26d6700 10 bluestore(/var/lib/ceph/osd/ceph-4) getattr 3.1_head #3:bbcdb3d6:::smithi08716243-128:d8# _ = -2
2019-03-21 10:00:01.990 7f91c26d6700 10 osd.4 pg_epoch: 411 pg[3.1( v 411'2577 (0'0,411'2577] local-lis/les=360/361 n=51 ec=20/20 lis/c 360/360 les/c/f 361/361/0 360/360/360) [4,0] r=0 lpr=360 luod=411'2575 crt=411'2577 lcod 411'2574 mlcod 411'2574 active+clean+snaptrim trimq=[ca~1,116~2,11c~1,11f~2,127~2,12f~1,131~1,136~1] ps=[10d~1]] get_object_context: no obc for soid 3:bbcdb3d6:::smithi08716243-128:d8 and !can_create
2019-03-21 10:00:01.990 7f91c26d6700 -1 log_channel(cluster) log [ERR] : trim_object: Can not trim 3:bbcdb3d6:::smithi08716243-128:d8 repair needed (no obc)

Related issues

Related to RADOS - Bug #24045: Eviction still raced with scrub due to preemption Resolved 05/07/2018
Related to RADOS - Bug #40454: snap_mapper error, scrub gets r -2..repaired Can't reproduce
Copied to RADOS - Backport #39518: mimic: snaps missing in mapper, should be: ca was r -2...repaired Resolved
Copied to RADOS - Backport #39519: nautilus: snaps missing in mapper, should be: ca was r -2...repaired Resolved
Copied to RADOS - Backport #39520: luminous: snaps missing in mapper, should be: ca was r -2...repaired Rejected

History

#1 Updated by David Zafman about 5 years ago

  • Related to Bug #24045: Eviction still raced with scrub due to preemption added

#2 Updated by David Zafman about 5 years ago

  • Description updated (diff)

#3 Updated by David Zafman almost 5 years ago

  • Status changed from New to In Progress
  • Pull request ID set to 27209

#4 Updated by David Zafman almost 5 years ago

  • Status changed from In Progress to 7

#5 Updated by David Zafman almost 5 years ago

  • Status changed from 7 to Pending Backport
  • Backport set to luminous, mimic, nautilus

#6 Updated by Nathan Cutler almost 5 years ago

  • Copied to Backport #39518: mimic: snaps missing in mapper, should be: ca was r -2...repaired added

#7 Updated by Nathan Cutler almost 5 years ago

  • Copied to Backport #39519: nautilus: snaps missing in mapper, should be: ca was r -2...repaired added

#8 Updated by Nathan Cutler almost 5 years ago

  • Copied to Backport #39520: luminous: snaps missing in mapper, should be: ca was r -2...repaired added

#9 Updated by Sage Weil almost 5 years ago

  • Related to Bug #40454: snap_mapper error, scrub gets r -2..repaired added

#10 Updated by David Zafman about 4 years ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF