Bug #38034
Closed
pg stuck in backfill_wait with plenty of disk space
Added by Sage Weil over 5 years ago. Updated almost 5 years ago.
Description
[root@smithi104 sage]# ceph pg ls backfill_wait
PG   OBJECTS DEGRADED MISPLACED UNFOUND BYTES LOG STATE                         SINCE VERSION REPORTED  UP      ACTING  SCRUB_STAMP                DEEP_SCRUB_STAMP
3.32       7        0         0       0     0   0 active+remapped+backfill_wait   78m  93'133 5675:5291 [6,7]p6 [4,7]p4 2019-01-24 11:08:16.249016 2019-01-24 11:08:16.249016

[root@smithi104 sage]# ceph osd df tree
2019-01-24 12:30:16.774 7fb76c0c7700 -1 WARNING: all dangerous and experimental features are enabled.
2019-01-24 12:30:16.826 7fb76c0c7700 -1 WARNING: all dangerous and experimental features are enabled.
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META     AVAIL   %USE VAR  PGS STATUS TYPE NAME
-1       0.70312        - 630 GiB  12 GiB 5.5 GiB 16 KiB  7.0 GiB 618 GiB 1.98 1.00   -        root default
-5       0.35156        - 270 GiB 5.7 GiB 2.7 GiB 16 KiB  3.0 GiB 264 GiB 2.11 1.06   -        host smithi104
 0   ssd 0.08789        0     0 B     0 B     0 B    0 B      0 B     0 B    0    0   0     up osd.0
 1   ssd 0.08789  1.00000  90 GiB 1.1 GiB  68 MiB  8 KiB 1024 MiB  89 GiB 1.19 0.60  33     up osd.1
 2   ssd 0.08789  1.00000  90 GiB 1.7 GiB 712 MiB  8 KiB 1024 MiB  88 GiB 1.88 0.95  34     up osd.2
 3   ssd 0.08789  1.00000  90 GiB 2.9 GiB 1.9 GiB    0 B    1 GiB  87 GiB 3.25 1.64  39     up osd.3
-3       0.35156        - 360 GiB 6.8 GiB 2.8 GiB    0 B    4 GiB 353 GiB 1.89 0.95   -        host smithi131
 4   ssd 0.08789  1.00000  90 GiB 1.1 GiB  68 MiB    0 B    1 GiB  89 GiB 1.19 0.60  42     up osd.4
 5   ssd 0.08789  1.00000  90 GiB 2.3 GiB 1.3 GiB    0 B    1 GiB  88 GiB 2.56 1.29  36     up osd.5
 6   ssd 0.08789  1.00000  90 GiB 1.7 GiB 722 MiB    0 B    1 GiB  88 GiB 1.89 0.96  33     up osd.6
 7   ssd 0.08789  1.00000  90 GiB 1.7 GiB 731 MiB    0 B    1 GiB  88 GiB 1.90 0.96  19     up osd.7
               TOTAL      630 GiB  12 GiB 5.5 GiB 16 KiB  7.0 GiB 618 GiB 1.98
MIN/MAX VAR: 0.60/1.64 STDDEV: 0.68
/a/sage-2019-01-24_01:57:16-rados-wip-sage2-testing-2019-01-23-1754-distro-basic-smithi/3499356
Updated by David Zafman over 5 years ago
Analysis so far:
Maybe this is a backfill preemption issue. The PG is in backfill_wait state after getting RemoteReservationRevoked, so every time PrimaryLogPG::start_recovery_ops() is called, "recovery raced and were queued twice, ignoring!" is logged and the function returns without doing any work. This went on for 20 minutes.
2019-01-24 11:10:37.827 7f5b22d2f700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfilling mbc={255={}}] do_peering_event: epoch_sent: 3740 epoch_requested: 3740 RemoteReservationRevoked
2019-01-24 11:10:37.827 7f5b22d2f700 5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait+backfilling mbc={255={}}] exit Started/Primary/Active/Backfilling 0.011095 8 0.000173
2019-01-24 11:10:37.827 7f5b22d2f700 5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] enter Started/Primary/Active/WaitLocalBackfillReserved
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats actingset 4,7 upset 6,7 acting_recovery_backfill 4,6,7
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats acting [4,7] up [6,7]
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats shard 4 primary objects 7 missing 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats shard 6 objects 7 missing 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats shard 7 objects 7 missing 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats missing shard 6 missing= 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats missing shard 7 missing= 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats acting shard 4 missing= 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats degraded 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats misplaced 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] publish_stats_to_osd reporting purged_snaps []
2019-01-24 11:10:37.827 7f5b22d2f700 15 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] publish_stats_to_osd 3740:3390
2019-01-24 11:10:37.827 7f5b26d37700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] do_peering_event: epoch_sent: 3740 epoch_requested: 3740 LocalBackfillReserved
2019-01-24 11:10:37.827 7f5b26d37700 5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] exit Started/Primary/Active/WaitLocalBackfillReserved 0.000143 1 0.000165
2019-01-24 11:10:37.827 7f5b26d37700 5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] enter Started/Primary/Active/WaitRemoteBackfillReserved
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats actingset 4,7 upset 6,7 acting_recovery_backfill 4,6,7
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats acting [4,7] up [6,7]
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats shard 4 primary objects 7 missing 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats shard 6 objects 7 missing 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats shard 7 objects 7 missing 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats missing shard 6 missing= 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats missing shard 7 missing= 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats acting shard 4 missing= 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats degraded 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats misplaced 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] publish_stats_to_osd reporting purged_snaps []
2019-01-24 11:10:37.827 7f5b26d37700 15 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] publish_stats_to_osd 3740:3391
2019-01-24 11:10:37.827 7f5b26d37700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] state<Started/Primary/Active/WaitRemoteBackfillReserved>: react num_bytes 0
2019-01-24 11:10:37.831 7f5b22d2f700 10 osd.4 3740 dequeue_op 0x55bbf3b4ec60 prio 3 cost 0 latency 0.000189 pg_backfill(finish_ack 3.32 e 3740/3740 lb MIN) v3 pg pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}]
2019-01-24 11:10:37.831 7f5b22d2f700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _handle_message: 0x55bbf3b4ec60
2019-01-24 11:10:37.831 7f5b22d2f700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] do_backfill pg_backfill(finish_ack 3.32 e 3740/3740 lb MIN) v3
2019-01-24 11:10:37.831 7f5b22d2f700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] finish_recovery_op MAX
2019-01-24 11:10:37.831 7f5b22d2f700 10 osd.4 3740 finish_recovery_op pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] MAX dequeue=0 (1/10 rops)
2019-01-24 11:10:37.831 7f5b22d2f700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] queue_recovery -- queuing
2019-01-24 11:10:37.831 7f5b26d37700 10 osd.4 3740 do_recovery starting 10 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}]
2019-01-24 11:10:37.831 7f5b26d37700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] recovery raced and were queued twice, ignoring!
......
2019-01-24 11:30:40.222 7f5b22d2f700 10 osd.4 pg_epoch: 4272 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] recovery raced and were queued twice, ignoring!
Updated by David Zafman over 5 years ago
Here are the enter/exit lines on the primary where we entered backfilling and then went to backfill_wait for the last time. It appears we are waiting for remote reservations to come in.
2019-01-24 11:10:37.815 7f5b26d37700 5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={}] enter Started/Primary/Active/WaitRemoteBackfillReserved
2019-01-24 11:10:37.816 7f5b22d2f700 5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={}] exit Started/Primary/Active/WaitRemoteBackfillReserved 0.001149 1 0.000146
2019-01-24 11:10:37.816 7f5b22d2f700 5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={}] enter Started/Primary/Active/Backfilling
2019-01-24 11:10:37.827 7f5b22d2f700 5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait+backfilling mbc={255={}}] exit Started/Primary/Active/Backfilling 0.011095 8 0.000173
2019-01-24 11:10:37.827 7f5b22d2f700 5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] enter Started/Primary/Active/WaitLocalBackfillReserved
2019-01-24 11:10:37.827 7f5b26d37700 5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] exit Started/Primary/Active/WaitLocalBackfillReserved 0.000143 1 0.000165
2019-01-24 11:10:37.827 7f5b26d37700 5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] enter Started/Primary/Active/WaitRemoteBackfillReserved
Updated by David Zafman over 5 years ago
I think this is where things went wrong. We've seen something like this in the past, I think. Here the osd.6 reservation was preempted. It did a RemoteReservationCanceled, which sends a REVOKE to the primary. The primary sends a release to all nodes, then may request the reservation again. Here it appears that the new RequestBackfillPrio arrived before the RecoveryDone happened. Could the new messaging code cause this out-of-order problem?
2019-01-24 11:10:37.824 7fa158fad700 10 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] do_peering_event: epoch_sent: 3740 epoch_requested: 3740 RemoteBackfillPreempted
2019-01-24 11:10:37.830 7fa158fad700 10 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] do_peering_event: epoch_sent: 3740 epoch_requested: 3740 RemoteReservationCanceled
2019-01-24 11:10:37.830 7fa158fad700 5 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] exit Started/ReplicaActive/RepRecovering 0.014864 2 0.000197
2019-01-24 11:10:37.830 7fa158fad700 5 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] enter Started/ReplicaActive/RepNotRecovering
2019-01-24 11:10:37.830 7fa158fad700 10 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] do_peering_event: epoch_sent: 3740 epoch_requested: 3740 RequestBackfillPrio: priority 100 primary bytes 0 local bytes 0
2019-01-24 11:10:37.830 7fa158fad700 20 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] _tentative_full type backfillfull adjust_used 0KiB
2019-01-24 11:10:37.830 7fa158fad700 5 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] exit Started/ReplicaActive/RepNotRecovering 0.000035 1 0.000013
2019-01-24 11:10:37.830 7fa158fad700 5 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] enter Started/ReplicaActive/RepWaitBackfillReserved
2019-01-24 11:10:37.830 7fa158fad700 10 osd.6 3740 dequeue_op 0x5608771f4580 prio 127 cost 0 latency 0.003988 pg_backfill(finish 3.32 e 3740/3740 lb MAX) v3 pg pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}]
2019-01-24 11:10:37.830 7fa158fad700 10 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] _handle_message: 0x5608771f4580
2019-01-24 11:10:37.830 7fa158fad700 10 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] do_backfill pg_backfill(finish 3.32 e 3740/3740 lb MAX) v3
2019-01-24 11:10:37.830 7fa158fad700 0 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] do_backfill final 0 replaces local 0
2019-01-24 11:10:37.831 7fa158fad700 10 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] do_peering_event: epoch_sent: 3740 epoch_requested: 3740 RecoveryDone
2019-01-24 11:10:37.831 7fa158fad700 5 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] exit Started/ReplicaActive/RepWaitBackfillReserved 0.000201 1 0.000027
2019-01-24 11:10:37.831 7fa158fad700 5 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] enter Started/ReplicaActive/RepNotRecovering
Updated by David Zafman over 5 years ago
During preemption, what ensures that the backfill node processes the following messages from the primary in order?

1. Primary receives preempt (RemoteReservationRevoked)
2. Releases local reservation
3. Sends RELEASE to all backfill nodes:

   pg->osd->send_message_osd_cluster(
     new MBackfillReserve(
       MBackfillReserve::RELEASE,
       spg_t(pg->info.pgid.pgid, it->shard),
       pg->get_osdmap_epoch()),
     con.get());

4. Primary re-acquires local reservation
5. Sends REQUEST to backfill nodes:

   pg->osd->send_message_osd_cluster(
     new MBackfillReserve(
       MBackfillReserve::REQUEST,
       spg_t(pg->info.pgid.pgid, backfill_osd_it->shard),
       pg->get_osdmap_epoch(),
       pg->get_backfill_priority(),
       num_bytes,
       pg->peer_bytes[*backfill_osd_it]),
     con.get());
Updated by David Zafman about 5 years ago
I created a simple fix for this scenario in https://github.com/ceph/ceph/pull/20933: my change absorbed the RecoveryDone at the replica without crashing. This stuck PG in backfill_wait is probably the result of that change.
Closed pull request https://github.com/ceph/ceph/pull/20837 handles this at the primary. Below is the description in the pull request.
----------
The race series looks like this (S = Started, P = Primary, A = Active, R = Replica, RA = ReplicaActive):
                     MOSDPGBackfill::OP_BACKFILL_FINISH
P(S/P/A/Backfilling) ----------------------------------> **not arrived at replica yet** R(S/RA/RepRecovering)
                                                                        |
                                                                        | RemoteBackfillPreempted
                                                                        V
                     MBackfillReserve::REVOKE
P(S/P/A/Backfilling) <------------------------ R(S/RA/RepRecovering)
          |
          | RemoteReservationRevoked
          V
                                   MBackfillReserve::RELEASE
P(S/P/A/WaitLocalBackfillReserved) -------------------------> R(S/RA/RepNotRecovering)
          |
          | LocalBackfillReserved
          V
                                    MBackfillReserve::REQUEST
P(S/P/A/WaitRemoteBackfillReserved) -------------------------> R(S/RA/RepWaitBackfillReserved)
                                                                        | **OP_BACKFILL_FINISH finally processed**
                                                                        | RecoveryDone
                                                                        V
                                                                   R(Crashed)
Some key points:
- backfill is nearly done; Primary begins to send out OP_BACKFILL_FINISH, which has not been seen by Replica yet
- Replica's backfill process is preempted; a backfill revoke is triggered by Replica, and Primary now restarts the backfill process (e.g., because it has not received all OP_BACKFILL_FINISH_ACK(s) yet); Replica begins to wait for the backfill reservation (again)
- OP_BACKFILL_FINISH is finally dequeued, processed and translated into a RecoveryDone event, which now crashes the Replica process (note that OP_BACKFILL_FINISH messages can be arbitrarily delayed, because we now process peering events in the same (Op)Queue and those events are of the highest priority)
See also:
http://pulpito.ceph.com/xxg-2018-02-28_09:02:53-rados-wip-fix-upmap-distro-basic-smithi/2235497/
Fix the above problem by rechecking backfill status each time a RemoteReservationRevoked message is received, so we can restart the backfill process in a safer, less race-prone way.
Updated by Neha Ojha about 5 years ago
- Status changed from 12 to Fix Under Review
Updated by David Zafman about 5 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38341: mimic: pg stuck in backfill_wait with plenty of disk space added
Updated by Nathan Cutler almost 5 years ago
- Status changed from Pending Backport to Resolved