Bug #57698
closedosd/scrub: "scrub a chunk" requests are sent to the wrong set of replicas
0%
Description
The Primary registers its intent to scrub with the 'get_actingset()', as it should.
But the actual chunk requests are sent to 'get_acting_recovery_backfill()'.
Now - if there is (or was during this interval) a replica that required backfilling -
the two sets do not match.
Updated by Radoslaw Zarzynski over 1 year ago
- Status changed from New to In Progress
IIRC Ronen has mentioned the scrub code interchanges get_acting_set()
and get_acting_recovery_backfill()
.
Updated by Radoslaw Zarzynski over 1 year ago
What are symptoms? How bad is it? A hang maybe? I'm asking to understand the impact.
Updated by Ronen Friedman over 1 year ago
Following some discussions: here are excerpts from a run demonstrating this issue.
Test run rfriedma-2022-09-28_15:55:59-rados:thrash-wip-rf-tout-distro-default-smithi/7046786.
OSD 2 becomes the upset primary, but is then backfilled. OSD 0 is the acting primary, and initiates
a scrub.
osd 2: 2022-09-28T16:50:35.104+0000 7f806ec78700 10 osd.2 pg_epoch: 131 pg[3.15( v 128'285 (121'281,128'285] lb MIN local-lis/les=110/111 n=37 ec=52/20 lis/c=110/110 les/c/f=111/128/0 sis=129) [7,3,4] r=-1 lpr=129 DELETING pi=[110,129)/1 crt=128'285 lcod 128'283 mlcod 0'0 unknown NOTIFY mbc={} ps=[2~2,6~3,a~5,10~1,15~1,17~2,1b~1,1f~1,23~2]] handle_advance_map [7,3,4]/[7,3,4] -- 7/7 2022-09-28T16:50:35.104+0000 7f806ec78700 10 osd.2 pg_epoch: 132 pg[3.15( v 128'285 (121'281,128'285] lb MIN local-lis/les=110/111 n=37 ec=52/20 lis/c=110/110 les/c/f=111/128/0 sis=129) [7,3,4] r=-1 lpr=129 DELETING pi=[110,129)/1 crt=128'285 lcod 128'283 mlcod 0'0 unknown NOTIFY mbc={} ps=[2~2,6~3,a~5,10~1,15~1,17~2,1b~1,1f~1,23~2]] state<Started>: Started advmap 2022-09-28T16:50:35.104+0000 7f806ec78700 10 osd.2 pg_epoch: 132 pg[3.15( v 128'285 (121'281,128'285] lb MIN local-lis/les=110/111 n=37 ec=52/20 lis/c=110/110 les/c/f=111/128/0 sis=129) [7,3,4] r=-1 lpr=129 DELETING pi=[110,129)/1 crt=128'285 lcod 128'283 mlcod 0'0 unknown NOTIFY mbc={} ps=[2~2,6~3,a~5,10~1,15~1,17~2,1b~1,1f~1,23~2]] check_recovery_sources no source osds () went down -- after repeeringm with a nwe acting set: 2022-09-28T16:53:29.143+0000 7f806ac70700 1 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Primary 2022-09-28T16:53:29.143+0000 7f806ac70700 5 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 unknown mbc={}] exit Start 0.000009 0 0.000000 2022-09-28T16:53:29.143+0000 7f806ac70700 5 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 unknown mbc={}] enter Started/Primary 2022-09-28T16:53:29.143+0000 7f806ac70700 5 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 unknown mbc={}] enter Started/Primary/Peering 2022-09-28T16:53:29.143+0000 7f806ac70700 5 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] enter Started/Primary/Peering/GetInfo 2022-09-28T16:53:29.143+0000 7f806ac70700 10 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] build_prior all_probe 0,3,7 2022-09-28T16:53:29.143+0000 7f806ac70700 10 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] build_prior maybe_rw interval:248, acting: 0,3,7 2022-09-28T16:53:29.143+0000 7f806ac70700 10 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] build_prior final: probe 0,2,3,7 down blocked_by {} 2022-09-28T16:53:29.143+0000 7f806ac70700 10 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] up_thru 300 < same_since 306, must notify monitor -- 2022-09-28T16:53:29.143+0000 7f806ac70700 10 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] scrubber<NotActive>: on_primary_change (from on_info_history_change (plnd:)): Primary. <in-qu>&<registered> --> <in-qu>&<registered> 2022-09-28T16:53:29.143+0000 7f806ac70700 10 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] osd.7 has stray content: 3.15( v 245'389 (230'383,245'389] local-lis/les=248/249 n=17 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306 pruub=15.002688408s) 2022-09-28T16:53:29.143+0000 7f806ac70700 10 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] update_heartbeat_peers 0,2,3 -> 0,2,3,7 2022-09-28T16:53:29.143+0000 7f806ac70700 20 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] state<Started/Primary/Peering/GetInfo>: Adding osd: 7 peer features: 3f01cfbf7ffdffff -- 2022-09-28T16:53:29.146+0000 7f806ec78700 5 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] enter Started/Primary/Peering/GetLog 2022-09-28T16:53:29.146+0000 7f806ec78700 10 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] choose_acting all_info osd.0 3.15( v 245'389 (230'383,245'389] local-lis/les=248/249 n=15 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306 pruub=15.000658989s) 2022-09-28T16:53:29.146+0000 7f806ec78700 10 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] choose_acting all_info osd.2 3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) 2022-09-28T16:53:29.146+0000 7f806ec78700 10 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] choose_acting all_info osd.3 3.15( v 245'389 (230'383,245'389] local-lis/les=248/249 n=15 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306 pruub=15.001117706s) 2022-09-28T16:53:29.146+0000 7f806ec78700 10 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] choose_acting all_info osd.7 3.15( v 245'389 (230'383,245'389] local-lis/les=248/249 n=17 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306 pruub=15.002688408s) 2022-09-28T16:53:29.146+0000 7f806ec78700 10 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] select_replicated_primary newest update on osd.0 with 3.15( v 245'389 (230'383,245'389] local-lis/les=248/249 n=15 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306 pruub=15.000658989s) 2022-09-28T16:53:29.146+0000 7f806ec78700 20 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] choose_async_recovery_replicated candidates by cost are: 2022-09-28T16:53:29.146+0000 7f806ec78700 20 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] choose_async_recovery_replicated result want=[0,3,7] async_recovery= 2022-09-28T16:53:29.146+0000 7f806ec78700 10 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] choose_acting want [0,3,7] != acting [2,3,0], requesting pg_temp change 2022-09-28T16:53:29.146+0000 7f806ec78700 5 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] exit Started/Primary/Peering/GetLog 0.000089 0 0.000000 2022-09-28T16:53:29.146+0000 7f806ec78700 10 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] state<Started/Primary/Peering>: Leaving Peering 2022-09-28T16:53:29.146+0000 7f806ec78700 5 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 peering mbc={}] exit Started/Primary/Peering 0.003240 0 0.000000 2022-09-28T16:53:29.146+0000 7f806ec78700 5 osd.2 pg_epoch: 306 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=306 pi=[248,306)/1 crt=0'0 mlcod 0'0 unknown mbc={}] enter Started/Primary/WaitActingChange -- 2022-09-28T16:53:30.142+0000 7f806ac70700 10 osd.2 pg_epoch: 307 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0]/[0,3,7] r=-1 lpr=307 pi=[248,307)/1 crt=0'0 mlcod 0'0 remapped mbc={}] noting past ([248,306] all_participants=0,2,3,7 intervals=([248,305] acting 0,3,7)) 2022-09-28T16:53:30.142+0000 7f806ac70700 20 osd.2 pg_epoch: 307 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=307) [2,3,0]/[0,3,7] r=-1 lpr=307 pi=[248,307)/1 crt=0'0 mlcod 0'0 remapped mbc={}] on_new_interval 2022-09-28T16:53:30.142+0000 7f806ac70700 20 osd.2 pg_epoch: 307 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=307) [2,3,0]/[0,3,7] r=-1 lpr=307 pi=[248,307)/1 crt=0'0 mlcod 0'0 remapped mbc={}] on_new_interval upacting_features 0x3f01cfbf7ffdffff from [0,3,7]+[2,3,0] -- on primary change (no longer the Primary) 2022-09-28T16:53:30.142+0000 7f806ac70700 1 osd.2 pg_epoch: 307 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=307) [2,3,0]/[0,3,7] r=-1 lpr=307 pi=[248,307)/1 crt=0'0 mlcod 0'0 remapped NOTIFY mbc={}] state<Start>: transitioning to Stray 2022-09-28T16:53:30.142+0000 7f806ac70700 20 osd.2 pg_epoch: 307 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0] r=0 lpr=307 pi=[248,306)/1 crt=0'0 mlcod 0'0 unknown mbc={}] new interval newup [2,3,0] newacting [0,3,7] 2022-09-28T16:53:30.143+0000 7f806ac70700 10 osd.2 pg_epoch: 307 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=307) [2,3,0]/[0,3,7] r=-1 lpr=307 pi=[248,307)/1 crt=0'0 mlcod 0'0 remapped NOTIFY mbc={}] do_peering_event: epoch_sent: 307 epoch_requested: 307 MQuery 3.15 from 0 query_epoch 307 query: query(info 0'0 epoch_sent 307) 2022-09-28T16:53:30.143+0000 7f806ac70700 15 osd.2 scrub-queue::remove_from_osd_queue removing pg[3.15] from OSD scrub queue 2022-09-28T16:53:30.143+0000 7f806ac70700 5 osd.2 scrub-queue::remove_from_osd_queue removing pg[3.15] failed. State was: unregistering 2022-09-28T16:53:30.143+0000 7f806ac70700 10 osd.2 pg_epoch: 307 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=307) [2,3,0]/[0,3,7] r=-1 lpr=307 pi=[248,307)/1 crt=0'0 mlcod 0'0 remapped NOTIFY mbc={}] scrubber<NotActive>: on_primary_change (from on_info_history_change (plnd:)): Replica/other. <in-qu>&<unregister> --> <in-qu>&<unregistering> 2022-09-28T16:53:30.143+0000 7f806ac70700 10 osd.2 pg_epoch: 307 pg[3.15( empty local-lis/les=0/0 n=0 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=307) [2,3,0]/[0,3,7] r=-1 lpr=307 pi=[248,307)/1 crt=0'0 mlcod 0'0 remapped NOTIFY mbc={}] sending info ----------- 2022-09-28 16:53:31.149871800+00:00 -> 1664384011.1498718 osd.0 (osd.0) 63 : cluster [DBG] 3.15 starting backfill to osd.2 from (0'0,0'0] MAX to 245'389 2022-09-28 16:53:36.863501000+00:00 -> 1664384016.863501 osd.0 (osd.0) 65 : cluster [DBG] 3.15 scrub starts OSD 0 - the scrubbing primary ----------------------------- 2022-09-28T16:53:30.141+0000 7fdd109f4700 10 osd.0 pg_epoch: 307 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=248/249 n=15 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=306) [2,3,0]/[0,3,7] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 remapped NOTIFY mbc={} ps=[c8~1,cf~1,d1~1,d4~1,d7~1,d9~1,dc~1,e0~1]] start_peering_interval: check_new_interval output: check_new_interval interval(306-306 up [2,3,0](2) acting [2,3,0](2)) up_thru 300 up_from 12 l\ ast_epoch_clean 249 interval(306-306 up [2,3,0](2) acting [2,3,0](2)) : primary up 12-300 does not include interval -- 2022-09-28T16:53:30.142+0000 7fdd109f4700 10 osd.0 pg_epoch: 307 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=248/249 n=15 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=307) [2,3,0]/[0,3,7] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 remapped NOTIFY mbc={} ps=[c8~1,cf~1,d1~1,d4~1,d7~1,d9~1,dc~1,e0~1]] scrubber<NotActive>: on_primary_change (from on_info_history_change (plnd:)): Primary. <in-qu>&<registered> --> <in-qu>&<registered> 2022-09-28T16:53:30.142+0000 7fdd109f4700 1 osd.0 pg_epoch: 307 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=248/249 n=15 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=307) [2,3,0]/[0,3,7] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 remapped NOTIFY mbc={} ps=[c8~1,cf~1,d1~1,d4~1,d7~1,d9~1,dc~1,e0~1]] start_peering_interval up [2,3,0] -> [2,3,0], acting [2,3,0] -> [0,3,7], acting_primary 2 -> 0, up_primary 2 -> 2, role 2 -> 0, features acting \ 4540138320759226367 upacting 4540138320759226367 -- pool change? 2022-09-28T16:53:30.142+0000 7fdd109f4700 10 osd.0 pg_epoch: 307 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=248/249 n=15 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=307) [2,3,0]/[0,3,7] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 remapped mbc={} ps=[c8~1,cf~1,d1~1,d4~1,d7~1,d9~1,dc~1,e0~1]] plpg_on_pool_change -- 2022-09-28T16:53:31.147+0000 7fdd109f4700 10 osd.0 pg_epoch: 308 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=248/249 n=15 ec=52/20 lis/c=248/248 les/c/f=249/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 remapped mbc={} ps=[c8~1,cf~1,d1~1,d4~1,d7~1,d9~1,dc~1,e0~1]] state<Started/Primary/Active>: In Active, about to call activate -- 2022-09-28T16:53:31.167+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 308 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped mbc={} trimq=[e2~1] ps=[c8~1,cf~1,d1~1,d4~1,d7~1,d9~1,dc~1,e0~1]] update_calc_stats actingset 0,3,7 upset 0,2,3 acting_recovery_backfill 0,2,3,7 2022-09-28T16:53:31.167+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 308 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped mbc={} trimq=[e2~1] ps=[c8~1,cf~1,d1~1,d4~1,d7~1,d9~1,dc~1,e0~1]] update_calc_stats shard 0 primary objects 15 missing 0 2022-09-28T16:53:31.167+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 308 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped mbc={} trimq=[e2~1] ps=[c8~1,cf~1,d1~1,d4~1,d7~1,d9~1,dc~1,e0~1]] update_calc_stats shard 2 objects 0 missing 15 2022-09-28T16:53:31.167+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 308 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped mbc={} trimq=[e2~1] ps=[c8~1,cf~1,d1~1,d4~1,d7~1,d9~1,dc~1,e0~1]] update_calc_stats shard 3 objects 15 missing 0 2022-09-28T16:53:31.167+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 308 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped mbc={} trimq=[e2~1] ps=[c8~1,cf~1,d1~1,d4~1,d7~1,d9~1,dc~1,e0~1]] update_calc_stats shard 7 objects 17 missing 0 2022-09-28T16:53:31.167+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 308 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped mbc={} trimq=[e2~1] ps=[c8~1,cf~1,d1~1,d4~1,d7~1,d9~1,dc~1,e0~1]] update_calc_stats object_location_counts {0,3,7=15} -- 2022-09-28T16:53:31.167+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 308 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped mbc={} trimq=[e2~1] ps=[c8~1,cf~1,d1~1,d4~1,d7~1,d9~1,dc~1,e0~1]] update_calc_stats actingset 0,3,7 upset 0,2,3 acting_recovery_backfill 0,2,3,7 -- 2022-09-28T16:53:33.168+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 310 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped+backfilling rops=1 mbc={} trimq=[e2~1] ps=[c8~1,dc~1,e0~1]] update_calc_stats actingset 0,3,7 upset 0,2,3 acting_recovery_backfill 0,2,3,7 2022-09-28T16:53:33.168+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 310 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped+backfilling rops=1 mbc={} trimq=[e2~1] ps=[c8~1,dc~1,e0~1]] update_calc_stats shard 0 primary objects 15 missing 0 2022-09-28T16:53:33.168+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 310 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped+backfilling rops=1 mbc={} trimq=[e2~1] ps=[c8~1,dc~1,e0~1]] update_calc_stats shard 2 objects 0 missing 15 2022-09-28T16:53:33.168+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 310 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped+backfilling rops=1 mbc={} trimq=[e2~1] ps=[c8~1,dc~1,e0~1]] update_calc_stats shard 3 objects 15 missing 0 2022-09-28T16:53:33.168+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 310 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped+backfilling rops=1 mbc={} trimq=[e2~1] ps=[c8~1,dc~1,e0~1]] update_calc_stats shard 7 objects 17 missing 0 2022-09-28T16:53:33.168+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 310 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped+backfilling rops=1 mbc={} trimq=[e2~1] ps=[c8~1,dc~1,e0~1]] update_calc_stats object_location_counts {0,3,7=15} 2022-09-28T16:53:33.168+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 310 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped+backfilling rops=1 mbc={} trimq=[e2~1] ps=[c8~1,dc~1,e0~1]] update_calc_stats missing shard 0 missing= 0 2022-09-28T16:53:33.168+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 310 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped+backfilling rops=1 mbc={} trimq=[e2~1] ps=[c8~1,dc~1,e0~1]] update_calc_stats missing shard 3 missing= 0 2022-09-28T16:53:33.168+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 310 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped+backfilling rops=1 mbc={} trimq=[e2~1] ps=[c8~1,dc~1,e0~1]] update_calc_stats missing shard 2 missing= 15 2022-09-28T16:53:33.168+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 310 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped+backfilling rops=1 mbc={} trimq=[e2~1] ps=[c8~1,dc~1,e0~1]] update_calc_stats acting shard 7 missing= 0 2022-09-28T16:53:33.168+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 310 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped+backfilling rops=1 mbc={} trimq=[e2~1] ps=[c8~1,dc~1,e0~1]] update_calc_stats degraded 0 2022-09-28T16:53:33.168+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 310 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped+backfilling rops=1 mbc={} trimq=[e2~1] ps=[c8~1,dc~1,e0~1]] update_calc_stats misplaced 15 -- 2022-09-28T16:53:36.394+0000 7fdd109f4700 10 osd.0 pg_epoch: 313 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped+backfilling mbc={255={}} trimq=[e2~1,e4~3] ps=[c8~1,dc~1,e0~1]] needs_recovery is recovered 2022-09-28T16:53:36.394+0000 7fdd109f4700 10 osd.0 pg_epoch: 313 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped mbc={255={}} trimq=[e2~1,e4~3] ps=[c8~1,dc~1,e0~1]] recovery done, backfill done 2022-09-28T16:53:36.394+0000 7fdd109f4700 10 osd.0 313 do_recovery started 0/1 on pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/248 les/c/f=308/249/0 sis=307) [2,3,0]/[0,3,7] backfill=[2] r=0 lpr=307 pi=[248,307)/1 crt=245'389 lcod 234'387 mlcod 0'0 active+remapped mbc={255={}} trimq=[e2~1,e4~3] ps=[c8~1,dc~1,e0~1]] -- 2022-09-28T16:53:36.395+0000 7fdd161ff700 20 osd.0 pg_epoch: 313 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/307 les/c/f=308/313/0 sis=307) [2,3,0]/[0,3,7] r=0 lpr=307 crt=245'389 lcod 234'387 mlcod 0'0 active+clean+remapped trimq=[e2~1,e4~3] ps=[c8~1,dc~1,e0~1]] update_calc_stats actingset 0,3,7 upset 0,2,3 acting_recovery_backfill 0,2,3,7 -- and now we are clean enough to scrub: 2022-09-28T16:53:36.399+0000 7fdd0c9ec700 10 osd.0 pg_epoch: 313 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/307 les/c/f=308/313/0 sis=307) [2,3,0]/[0,3,7] r=0 lpr=307 crt=245'389 lcod 234'387 mlcod 0'0 active+clean+remapped ps=[c8~1,dc~1,e0~1,e2~1,e4~3]] snap_trimmer complete 2022-09-28T16:53:36.862+0000 7fdd2ac3b700 15 osd.0 pg_epoch: 313 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/307 les/c/f=308/313/0 sis=307) [2,3,0]/[0,3,7] r=0 lpr=307 crt=245'389 lcod 234'387 mlcod 0'0 active+clean+remapped ps=[c8~1,dc~1,e0~1,e2~1,e4~3]] sched_scrub pg(3.15) <active> <clean> 2022-09-28T16:53:36.862+0000 7fdd2ac3b700 10 osd.0 pg_epoch: 313 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/307 les/c/f=308/313/0 sis=307) [2,3,0]/[0,3,7] r=0 lpr=307 crt=245'389 lcod 234'387 mlcod 0'0 active+clean+remapped ps=[c8~1,dc~1,e0~1,e2~1,e4~3]] validate_scrub_mode pg: 3.15 allow: 1/0 deep errs: 0 auto-repair: 0 (0) 2022-09-28T16:53:36.862+0000 7fdd2ac3b700 10 osd.0 pg_epoch: 313 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/307 les/c/f=308/313/0 sis=307) [2,3,0]/[0,3,7] r=0 lpr=307 crt=245'389 lcod 234'387 mlcod 0'0 active+clean+remapped ps=[c8~1,dc~1,e0~1,e2~1,e4~3]] is_time_for_deep: need-auto? false allowed? true/false deep-errors? false last_deep_scrub_stamp 2022-09-28T16:47:52.198294+0000 2022-09-28T16:53:36.862+0000 7fdd2ac3b700 20 osd.0 pg_epoch: 313 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/307 les/c/f=308/313/0 sis=307) [2,3,0]/[0,3,7] r=0 lpr=307 crt=245'389 lcod 234'387 mlcod 0'0 active+clean+remapped ps=[c8~1,dc~1,e0~1,e2~1,e4~3]] validate_periodic_mode: allowed:true/false t.f.d:false req:(plnd:) 2022-09-28T16:53:36.862+0000 7fdd2ac3b700 20 osd.0 pg_epoch: 313 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/307 les/c/f=308/313/0 sis=307) [2,3,0]/[0,3,7] r=0 lpr=307 crt=245'389 lcod 234'387 mlcod 0'0 active+clean+remapped ps=[c8~1,dc~1,e0~1,e2~1,e4~3]] validate_periodic_mode will do shallow scrub (time_for_deep = 0) -- 404650:2022-09-28T16:53:36.862+0000 7fdd109f4700 10 osd.0 pg_epoch: 313 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/307 les/c/f=308/313/0 sis=307) [2,3,0]/[0,3,7] r=0 lpr=307 crt=245'389 lcod 234'387 mlcod 0'0 active+clean+remapped+scrubbing [ 3.15: ] ps=[c8~1,dc~1,e0~1,e2~1,e4~3]] scrubber<ReservingReplicas>: reserve_replicas 2404652:2022-09-28T16:53:36.862+0000 7fdd109f4700 10 osd.0 ep: 313 scrubber::ReplicaReservations pg[3.15]: ReplicaReservations: reserve 3 2404663:2022-09-28T16:53:36.862+0000 7fdd109f4700 10 osd.0 ep: 313 scrubber::ReplicaReservations pg[3.15]: ReplicaReservations: reserve 7 2022-09-28T16:53:36.867+0000 7fdd0c9ec700 15 osd.0 pg_epoch: 313 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/307 les/c/f=308/313/0 sis=307) [2,3,0]/[0,3,7] r=0 lpr=307 crt=245'389 lcod 234'387 mlcod 0'0 active+clean+remapped+scrubbing [ 3.15: ] ps=[c8~1,dc~1,e0~1,e2~1,e4~3]] scrubber<Act/WaitReplicas>: b.e.: match_in_shards: auth_list: [0, 3, 7] #: 3; obj-errs#: 0 2022-09-28T16:53:36.867+0000 7fdd0c9ec700 15 osd.0 pg_epoch: 313 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/307 les/c/f=308/313/0 sis=307) [2,3,0]/[0,3,7] r=0 lpr=307 crt=245'389 lcod 234'387 mlcod 0'0 active+clean+remapped+scrubbing [ 3.15: ] ps=[c8~1,dc~1,e0~1,e2~1,e4~3]] scrubber<Act/WaitReplicas>: b.e.: inconsistents: object errors #: 0 auth list #: 3 cur_missing #: 1 cur_incon #: 0 2022-09-28T16:53:36.867+0000 7fdd0c9ec700 -1 log_channel(cluster) log [ERR] : 3.15 shard 2 3:a85a0636:::smithi110102413-602:head : missing 2022-09-28T16:53:36.867+0000 7fdd0c9ec700 20 osd.0 pg_epoch: 313 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/307 les/c/f=308/313/0 sis=307) [2,3,0]/[0,3,7] r=0 lpr=307 crt=245'389 lcod 234'387 mlcod 0'0 active+clean+remapped+scrubbing [ 3.15: ] ps=[c8~1,dc~1,e0~1,e2~1,e4~3]] scrubber<Act/WaitReplicas>: b.e.: select_auth_object: 3:a88a6498:::smithi110102413-164:head shard 0 got:{shard-usable: soid:3:a88a6498:::smithi110102413-164:head {txt:} } we did not ask from [2]! 2022-09-28T16:53:36.864+0000 7fdd0c9ec700 10 osd.0 pg_epoch: 313 pg[3.15( v 245'389 (230'383,245'389] local-lis/les=307/308 n=15 ec=52/20 lis/c=307/307 les/c/f=308/313/0 sis=307) [2,3,0]/[0,3,7] r=0 lpr=307 crt=245'389 lcod 234'387 mlcod 0'0 active+clean+remapped+scrubbing [ 3.15: ] ps=[c8~1,dc~1,e0~1,e2~1,e4~3]] scrubber<Act/WaitLastUpdate>: _request_scrub_map scrubmap from osd.3 shallow 2
Updated by Neha Ojha over 1 year ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 48347
Updated by Ronen Friedman over 1 year ago
- Status changed from Fix Under Review to Pending Backport
Updated by Neha Ojha over 1 year ago
- Backport changed from Pacific, Quincy to pacific,quincy
Updated by Neha Ojha about 1 year ago
- Copied to Backport #58636: quincy: osd/scrub: "scrub a chunk" requests are sent to the wrong set of replicas added
Updated by Neha Ojha about 1 year ago
- Copied to Backport #58637: pacific: osd/scrub: "scrub a chunk" requests are sent to the wrong set of replicas added
Updated by Neha Ojha about 1 year ago
- Status changed from Pending Backport to Resolved