Bug #65399


osd crash due to deferred recovery

Added by Xuehan Xu 26 days ago. Updated 21 days ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Crimson OSD crashes with an assertion failure if a recovery op completes after recovery/backfill has been deferred:

DEBUG 2024-04-10 03:42:04,945 [shard 0:main] osd - OSD::handle_peering_op: 3.2 from 1
DEBUG 2024-04-10 03:42:04,945 [shard 0:main] osd - peering_event(id=62028, detail=PeeringEvent(from=1 pgid=3.2 sent=50 requested=50 evt=epoch_sent: 50 epoch_requested: 50 DeferRecovery: delay 0)): starting start_pg_operation
DEBUG 2024-04-10 03:42:04,945 [shard 0:main] osd - peering_event(id=62028, detail=PeeringEvent(from=1 pgid=3.2 sent=50 requested=50 evt=epoch_sent: 50 epoch_requested: 50 DeferRecovery: delay 0)): start_pg_operation in await_active stage
DEBUG 2024-04-10 03:42:04,945 [shard 0:main] osd - peering_event(id=62028, detail=PeeringEvent(from=1 pgid=3.2 sent=50 requested=50 evt=epoch_sent: 50 epoch_requested: 50 DeferRecovery: delay 0)): start_pg_operation active, entering await_map
DEBUG 2024-04-10 03:42:04,945 [shard 0:main] osd - peering_event(id=62028, detail=PeeringEvent(from=1 pgid=3.2 sent=50 requested=50 evt=epoch_sent: 50 epoch_requested: 50 DeferRecovery: delay 0)): start_pg_operation await_map stage
DEBUG 2024-04-10 03:42:04,945 [shard 0:main] osd - peering_event(id=62028, detail=PeeringEvent(from=1 pgid=3.2 sent=50 requested=50 evt=epoch_sent: 50 epoch_requested: 50 DeferRecovery: delay 0)): got map 50, entering get_pg_mapping
DEBUG 2024-04-10 03:42:04,945 [shard 0:main] osd - peering_event(id=62028, detail=PeeringEvent(from=1 pgid=3.2 sent=50 requested=50 evt=epoch_sent: 50 epoch_requested: 50 DeferRecovery: delay 0)): can_create=true, target-core=0
DEBUG 2024-04-10 03:42:04,945 [shard 0:main] osd - peering_event(id=62028, detail=PeeringEvent(from=1 pgid=3.2 sent=50 requested=50 evt=epoch_sent: 50 epoch_requested: 50 DeferRecovery: delay 0)): entering create_or_wait_pg
DEBUG 2024-04-10 03:42:04,945 [shard 0:main] osd - peering_event(id=62028, detail=PeeringEvent(from=1 pgid=3.2 sent=50 requested=50 evt=epoch_sent: 50 epoch_requested: 50 DeferRecovery: delay 0)): have_pg
DEBUG 2024-04-10 03:42:04,945 [shard 0:main] osd - 0x0 PeeringEvent<T>::with_pg: start
DEBUG 2024-04-10 03:42:04,945 [shard 0:main] osd - 0x3000022b7f08 PeeringEvent<T>::with_pg: 0x3000022b7f08 peering_event(id=62028, detail=PeeringEvent(from=1 pgid=3.2 sent=50 requested=50 evt=epoch_sent: 50 epoch_requested: 50 DeferRecovery: delay 0)): pg present
DEBUG 2024-04-10 03:42:04,945 [shard 0:main] osd - do_peering_event handling epoch_sent: 50 epoch_requested: 50 DeferRecovery: delay 0 for pg: 3.2
DEBUG 2024-04-10 03:42:04,945 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovering+undersized+remapped state<Started/Primary/Active/Recovering>: defer recovery, retry delay 0
INFO  2024-04-10 03:42:04,945 [shard 0:main] osd - PerShardState::start_operation, peering_event(id=62029, detail=PeeringEvent(from=0 pgid=3.2 sent=50 requested=50 evt=epoch_sent: 50 epoch_requested: 50 DoRecovery))
INFO  2024-04-10 03:42:04,945 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+recovering+undersized+remapped exit Started/Primary/Active/Recovering 0.055379 1 0.000195
INFO  2024-04-10 03:42:04,945 [shard 0:main] osd - Exiting state: Started/Primary/Active/Recovering, entered at 1712720524.8898933, 0.000195375 spent on 1 events
INFO  2024-04-10 03:42:04,945 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+remapped enter Started/Primary/Active/NotRecovering
INFO  2024-04-10 03:42:04,945 [shard 0:main] osd - Entering state: Started/Primary/Active/NotRecovering

.......
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd - background_recovery_sub(id=62032, detail=MOSDPGRecoveryDeleteReply(3.2 e50,49 3:4443594d:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object12779:head,45'7044)): starting start_pg_operation
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd - background_recovery_sub(id=62032, detail=MOSDPGRecoveryDeleteReply(3.2 e50,49 3:4443594d:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object12779:head,45'7044)): starting start_pg_operation
DEBUG 2024-04-10 03:42:04,946 [shard 1:main] osd - handle_recovery_delete: MOSDPGRecoveryDelete(3.1 e50,49 3:8f84cbc9:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object23323:head,47'7785) v2
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd - background_recovery_sub(id=62032, detail=MOSDPGRecoveryDeleteReply(3.2 e50,49 3:4443594d:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object12779:head,45'7044)): start_pg_operation in await_active stage
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd - background_recovery_sub(id=62032, detail=MOSDPGRecoveryDeleteReply(3.2 e50,49 3:4443594d:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object12779:head,45'7044)): start_pg_operation in await_active stage
DEBUG 2024-04-10 03:42:04,946 [shard 1:main] osd - local_recover_delete: 3:8f84cbc9:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object23323:head, 47'7785
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd - background_recovery_sub(id=62032, detail=MOSDPGRecoveryDeleteReply(3.2 e50,49 3:4443594d:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object12779:head,45'7044)): start_pg_operation active, entering await_map
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd - background_recovery_sub(id=62032, detail=MOSDPGRecoveryDeleteReply(3.2 e50,49 3:4443594d:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object12779:head,45'7044)): start_pg_operation await_map stage
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd - background_recovery_sub(id=62032, detail=MOSDPGRecoveryDeleteReply(3.2 e50,49 3:4443594d:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object12779:head,45'7044)): got map 50, entering get_pg_mapping
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd - background_recovery_sub(id=62032, detail=MOSDPGRecoveryDeleteReply(3.2 e50,49 3:4443594d:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object12779:head,45'7044)): can_create=false, target-core=0
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd - background_recovery_sub(id=62032, detail=MOSDPGRecoveryDeleteReply(3.2 e50,49 3:4443594d:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object12779:head,45'7044)): entering create_or_wait_pg
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd - background_recovery_sub(id=62032, detail=MOSDPGRecoveryDeleteReply(3.2 e50,49 3:4443594d:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object12779:head,45'7044)): have_pg
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd - 0x300002cb2f88 RecoverySubRequest::with_pg: RecoverySubRequest::with_pg: background_recovery_sub(id=62032, detail=MOSDPGRecoveryDeleteReply(3.2 e50,49 3:4443594d:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object12779:head,45'7044))
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd - on_peer_recover: 3:4443594d:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object12779:head, 45'7044 on 1
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats actingset 0,3 upset 0,1,3 acting_recovery_backfill 0,1,3
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats shard 0 primary objects 1552 missing 0
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats shard 1 objects 5481 missing 631
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats shard 3 objects 0 missing 0
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats object_location_counts {0,3=1552}
DEBUG 2024-04-10 03:42:04,946 [shard 1:main] osd - load_metadata: object 3:8f84cbc9:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object23323:head and snapset 0=[]:{} present
DEBUG 2024-04-10 03:42:04,946 [shard 1:main] osd - load_metadata: object 3:8f84cbc9:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object23323:head and snapset 0=[]:{} present
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats missing shard 0 missing= 0
DEBUG 2024-04-10 03:42:04,946 [shard 1:main] osd - ReplicatedRecoveryBackend::local_recover_delete: do_transaction...
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats missing shard 3 missing= 0
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats missing shard 1 missing= 631
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats degraded 631
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats misplaced 0
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::prepare_stats_for_publish reporting purged_snaps []
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::prepare_stats_for_publish publish_stats_to_osd 50:2870
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd - 0x300002cb2f88 RecoverySubRequest::with_pg: background_recovery_sub(id=62032, detail=MOSDPGRecoveryDeleteReply(3.2 e50,49 3:4443594d:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object12779:head,45'7044)): complete
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd - background_recovery_sub(id=62032, detail=MOSDPGRecoveryDeleteReply(3.2 e50,49 3:4443594d:::benchmark_data_scephqa03.cpp.bjat.qianxin-in_886026_object12779:head,45'7044)): exit

.......
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats actingset 0,3 upset 0,1,3 acting_recovery_backfill 0,1,3
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats shard 0 primary objects 1552 missing 0
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats shard 1 objects 5481 missing 630
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats shard 3 objects 0 missing 0
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats object_location_counts {0,3=1552}
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats missing shard 0 missing= 0
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats missing shard 3 missing= 0
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats missing shard 1 missing= 630
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats degraded 630
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::update_calc_stats misplaced 0
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::prepare_stats_for_publish reporting purged_snaps []
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::prepare_stats_for_publish publish_stats_to_osd 50:2871
DEBUG 2024-04-10 03:42:04,946 [shard 0:main] osd -  pg_epoch 50 pg[3.2( v 47'7674 (0'0,47'7674] local-lis/les=49/50 n=1552 ec=15/15 lis/c=49/36 les/c/f=50/37/0 sis=49) [1,0,3]/[0,3] async=[1] r=0 lpr=49 pi=[36,49)/1 crt=47'7675 lcod 47'7672 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::needs_recovery osd.1 has 630 missing
ceph-osd: /da1/xxh/rpmbuild/BUILD/ceph-19.0.0-2535-g396c607087a/src/crimson/osd/pg_recovery.cc:45: PGRecovery::interruptible_future<bool> PGRecovery::start_recovery_ops(crimson::AggregateBlockingEvent<crimson::BlockerT<RecoveryBackend::WaitForObjectRecovery>::BlockingEvent>::TriggerI&, size_t): Assertion `pg->is_recovering()' failed.
Aborting on shard 0.
Backtrace:
 0# gsignal in /lib64/libc.so.6
 1# abort in /lib64/libc.so.6
 2# 0x00002B054F3EDC89 in /lib64/libc.so.6
 3# 0x00002B054F4133A6 in /lib64/libc.so.6
 4# PGRecovery::start_recovery_ops(crimson::AggregateBlockingEvent<crimson::BlockerT<RecoveryBackend::WaitForObjectRecovery>::BlockingEvent>::TriggerI&, unsigned long) in ceph-osd
 5# crimson::osd::PglogBasedRecovery::do_recovery() in ceph-osd
 6# auto crimson::interruptible::internal::call_with_interruption_impl<crimson::osd::IOInterruptCondition, crimson::osd::BackgroundRecoveryT<crimson::osd::PglogBasedRecovery>::start()::{lambda()#1}::operator()() const::{lambda(auto:1&&)#1}::operator()<crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery> >(crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery>) const::{lambda()#1}::operator()() const::{lambda()#1}>(seastar::lw_shared_ptr<{lambda()#1}>, crimson::osd::BackgroundRecoveryT<crimson::osd::PglogBasedRecovery>::start()::{lambda()#1}::operator()() const::{lambda(auto:1&&)#1}::operator()<crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery> >(crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery>) const::{lambda()#1}::operator()() const::{lambda()#1}&&) in ceph-osd
 7# auto crimson::interruptible::interruptor<crimson::osd::IOInterruptCondition>::with_interruption_cond<crimson::osd::BackgroundRecoveryT<crimson::osd::PglogBasedRecovery>::start()::{lambda()#1}::operator()() const::{lambda(auto:1&&)#1}::operator()<crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery> >(crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery>) const::{lambda()#1}::operator()() const::{lambda()#1}, crimson::osd::BackgroundRecoveryT<crimson::osd::PglogBasedRecovery>::start()::{lambda()#1}::operator()() const::{lambda(auto:1&&)#1}<crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent>(crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent) const::crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger::operator()() const::{lambda(std::__exception_ptr::exception_ptr)#1}>(crimson::osd::BackgroundRecoveryT<crimson::osd::PglogBasedRecovery>::start()::{lambda()#1}::operator()() const::{lambda(auto:1&&)#1}::operator()<crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery> >(crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery>) const::{lambda()#1}::operator()() const::{lambda()#1}, crimson::osd::BackgroundRecoveryT<crimson::osd::PglogBasedRecovery>::start()::{lambda()#1}::operator()() const::{lambda(auto:1&&)#1}<crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent>(crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent) const::crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger::operator()() const::{lambda(std::__exception_ptr::exception_ptr)#1}&&, crimson::osd::IOInterruptCondition&&) in ceph-osd
 8# seastar::future<void> crimson::osd::OperationThrottler::with_throttle_while<crimson::osd::BackgroundRecoveryT<crimson::osd::PglogBasedRecovery>, crimson::osd::BackgroundRecoveryT<crimson::osd::PglogBasedRecovery>::start()::{lambda()#1}::operator()() const::{lambda(auto:1&&)#1}::operator()<crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery> >(crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery>) const::{lambda()#1} const&>({lambda()#1}*, crimson::osd::scheduler::params_t, crimson::osd::BackgroundRecoveryT<crimson::osd::PglogBasedRecovery>::start()::{lambda()#1}::operator()() const::{lambda(auto:1&&)#1}::operator()<crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery> >(crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery>) const::{lambda()#1} const&) in ceph-osd
 9# seastar::continuation<seastar::internal::promise_base_with_type<void>, crimson::osd::OperationThrottler::with_throttle_while<crimson::osd::BackgroundRecoveryT<crimson::osd::PglogBasedRecovery>, crimson::osd::BackgroundRecoveryT<crimson::osd::PglogBasedRecovery>::start()::{lambda()#1}::operator()() const::{lambda(auto:1&&)#1}::operator()<crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery> >(crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery>) const::{lambda()#1} const&>({lambda()#1}*, crimson::osd::scheduler::params_t, crimson::osd::BackgroundRecoveryT<crimson::osd::PglogBasedRecovery>::start()::{lambda()#1}::operator()() const::{lambda(auto:1&&)#1}::operator()<crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery> >(crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery>) const::{lambda()#1} const&)::{lambda(bool)#1}, crimson::osd::BackgroundRecoveryT<crimson::osd::PglogBasedRecovery>::start()::{lambda()#1}::operator()() const::{lambda(auto:1&&)#1}::operator()<crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery> >(crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery>) const::{lambda()#1}<bool>::then_impl_nrvo<crimson::osd::scheduler::params_t, crimson::osd::BackgroundRecoveryT<crimson::osd::PglogBasedRecovery>::start()::{lambda()#1}::operator()() const::{lambda(auto:1&&)#1}::operator()<crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery> >(crimson::BlockerT<crimson::osd::OperationThrottler>::BlockingEvent::Trigger<crimson::osd::PglogBasedRecovery>) const::{lambda()#1} const>(crimson::osd::scheduler::params_t)::{lambda(seastar::internal::promise_base_with_type<void>&&, 
crimson::osd::scheduler::params_t&, seastar::future_state<bool>&&)#1}, bool>::run_and_dispose() in ceph-osd
10# 0x0000000002E7D5B8 in ceph-osd
11# 0x0000000002E7D9D4 in ceph-osd
12# 0x0000000002EBF006 in ceph-osd
13# 0x0000000002EBFD4D in ceph-osd
14# 0x0000000002E0DEBD in ceph-osd
15# 0x0000000002E0E745 in ceph-osd
16# main in ceph-osd
17# __libc_start_main in /lib64/libc.so.6
18# _start in ceph-osd

Actions #1

Updated by Xuehan Xu 26 days ago

  • Pull request ID set to 56806
Actions #2

Updated by Matan Breizman 21 days ago

  • Status changed from New to Fix Under Review
