Bug #38034

pg stuck in backfill_wait with plenty of disk space

Added by Sage Weil 7 months ago. Updated 28 days ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
Start date:
01/24/2019
Due date:
% Done:

0%

Source:
Tags:
Backport:
mimic
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:

Description

[root@smithi104 sage]# ceph pg ls backfill_wait
PG   OBJECTS DEGRADED MISPLACED UNFOUND BYTES LOG STATE                         SINCE VERSION REPORTED  UP      ACTING  SCRUB_STAMP                DEEP_SCRUB_STAMP           
3.32       7        0         0       0     0   0 active+remapped+backfill_wait   78m  93'133 5675:5291 [6,7]p6 [4,7]p4 2019-01-24 11:08:16.249016 2019-01-24 11:08:16.249016 
[root@smithi104 sage]# ceph osd df tree
2019-01-24 12:30:16.774 7fb76c0c7700 -1 WARNING: all dangerous and experimental features are enabled.
2019-01-24 12:30:16.826 7fb76c0c7700 -1 WARNING: all dangerous and experimental features are enabled.
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META     AVAIL   %USE VAR  PGS STATUS TYPE NAME          
-1       0.70312        - 630 GiB  12 GiB 5.5 GiB 16 KiB  7.0 GiB 618 GiB 1.98 1.00   -        root default       
-5       0.35156        - 270 GiB 5.7 GiB 2.7 GiB 16 KiB  3.0 GiB 264 GiB 2.11 1.06   -            host smithi104 
 0   ssd 0.08789        0     0 B     0 B     0 B    0 B      0 B     0 B    0    0   0     up         osd.0      
 1   ssd 0.08789  1.00000  90 GiB 1.1 GiB  68 MiB  8 KiB 1024 MiB  89 GiB 1.19 0.60  33     up         osd.1      
 2   ssd 0.08789  1.00000  90 GiB 1.7 GiB 712 MiB  8 KiB 1024 MiB  88 GiB 1.88 0.95  34     up         osd.2      
 3   ssd 0.08789  1.00000  90 GiB 2.9 GiB 1.9 GiB    0 B    1 GiB  87 GiB 3.25 1.64  39     up         osd.3      
-3       0.35156        - 360 GiB 6.8 GiB 2.8 GiB    0 B    4 GiB 353 GiB 1.89 0.95   -            host smithi131 
 4   ssd 0.08789  1.00000  90 GiB 1.1 GiB  68 MiB    0 B    1 GiB  89 GiB 1.19 0.60  42     up         osd.4      
 5   ssd 0.08789  1.00000  90 GiB 2.3 GiB 1.3 GiB    0 B    1 GiB  88 GiB 2.56 1.29  36     up         osd.5      
 6   ssd 0.08789  1.00000  90 GiB 1.7 GiB 722 MiB    0 B    1 GiB  88 GiB 1.89 0.96  33     up         osd.6      
 7   ssd 0.08789  1.00000  90 GiB 1.7 GiB 731 MiB    0 B    1 GiB  88 GiB 1.90 0.96  19     up         osd.7      
                    TOTAL 630 GiB  12 GiB 5.5 GiB 16 KiB  7.0 GiB 618 GiB 1.98                                    
MIN/MAX VAR: 0.60/1.64  STDDEV: 0.68

/a/sage-2019-01-24_01:57:16-rados-wip-sage2-testing-2019-01-23-1754-distro-basic-smithi/3499356

Related issues

Copied to RADOS - Backport #38341: mimic: pg stuck in backfill_wait with plenty of disk space Resolved

History

#1 Updated by David Zafman 7 months ago

Analysis so far:

Maybe this is a backfill preemption issue. The PG is in the backfill_wait state after receiving RemoteReservationRevoked, so every time PrimaryLogPG::start_recovery_ops() is called it logs "recovery raced and were queued twice, ignoring!" and returns. This went on for 20 minutes.

2019-01-24 11:10:37.827 7f5b22d2f700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfilling mbc={255={}}] do_peering_event: epoch_sent: 3740 epoch_requested: 3740 RemoteReservationRevoked
2019-01-24 11:10:37.827 7f5b22d2f700  5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait+backfilling mbc={255={}}] exit Started/Primary/Active/Backfilling 0.011095 8 0.000173
2019-01-24 11:10:37.827 7f5b22d2f700  5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] enter Started/Primary/Active/WaitLocalBackfillReserved
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats actingset 4,7 upset 6,7 acting_recovery_backfill 4,6,7
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats acting [4,7] up [6,7]
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats shard 4 primary objects 7 missing 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats shard 6 objects 7 missing 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats shard 7 objects 7 missing 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats missing shard 6 missing= 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats missing shard 7 missing= 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats acting shard 4 missing= 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats degraded 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats misplaced 0
2019-01-24 11:10:37.827 7f5b22d2f700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] publish_stats_to_osd reporting purged_snaps []
2019-01-24 11:10:37.827 7f5b22d2f700 15 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] publish_stats_to_osd 3740:3390
2019-01-24 11:10:37.827 7f5b26d37700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] do_peering_event: epoch_sent: 3740 epoch_requested: 3740 LocalBackfillReserved
2019-01-24 11:10:37.827 7f5b26d37700  5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] exit Started/Primary/Active/WaitLocalBackfillReserved 0.000143 1 0.000165
2019-01-24 11:10:37.827 7f5b26d37700  5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] enter Started/Primary/Active/WaitRemoteBackfillReserved
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats actingset 4,7 upset 6,7 acting_recovery_backfill 4,6,7
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats acting [4,7] up [6,7]
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats shard 4 primary objects 7 missing 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats shard 6 objects 7 missing 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats shard 7 objects 7 missing 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats missing shard 6 missing= 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats missing shard 7 missing= 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats acting shard 4 missing= 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats degraded 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _update_calc_stats misplaced 0
2019-01-24 11:10:37.827 7f5b26d37700 20 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] publish_stats_to_osd reporting purged_snaps []
2019-01-24 11:10:37.827 7f5b26d37700 15 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] publish_stats_to_osd 3740:3391
2019-01-24 11:10:37.827 7f5b26d37700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] state<Started/Primary/Active/WaitRemoteBackfillReserved>: react num_bytes 0
2019-01-24 11:10:37.831 7f5b22d2f700 10 osd.4 3740 dequeue_op 0x55bbf3b4ec60 prio 3 cost 0 latency 0.000189 pg_backfill(finish_ack 3.32 e 3740/3740 lb MIN) v3 pg pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}]
2019-01-24 11:10:37.831 7f5b22d2f700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] _handle_message: 0x55bbf3b4ec60
2019-01-24 11:10:37.831 7f5b22d2f700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] do_backfill pg_backfill(finish_ack 3.32 e 3740/3740 lb MIN) v3
2019-01-24 11:10:37.831 7f5b22d2f700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] finish_recovery_op MAX
2019-01-24 11:10:37.831 7f5b22d2f700 10 osd.4 3740 finish_recovery_op pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] MAX dequeue=0 (1/10 rops)
2019-01-24 11:10:37.831 7f5b22d2f700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] queue_recovery -- queuing
2019-01-24 11:10:37.831 7f5b26d37700 10 osd.4 3740 do_recovery starting 10 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}]
2019-01-24 11:10:37.831 7f5b26d37700 10 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] recovery raced and were queued twice, ignoring!
......
2019-01-24 11:30:40.222 7f5b22d2f700 10 osd.4 pg_epoch: 4272 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] recovery raced and were queued twice, ignoring!

#2 Updated by David Zafman 7 months ago

Here are the enter/exit lines on the primary where we entered backfilling and then went to backfill_wait for the last time. It appears we are waiting for remote reservations to come in.

2019-01-24 11:10:37.815 7f5b26d37700  5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={}] enter Started/Primary/Active/WaitRemoteBackfillReserved
2019-01-24 11:10:37.816 7f5b22d2f700  5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={}] exit Started/Primary/Active/WaitRemoteBackfillReserved 0.001149 1 0.000146
2019-01-24 11:10:37.816 7f5b22d2f700  5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={}] enter Started/Primary/Active/Backfilling
2019-01-24 11:10:37.827 7f5b22d2f700  5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait+backfilling mbc={255={}}] exit Started/Primary/Active/Backfilling 0.011095 8 0.000173
2019-01-24 11:10:37.827 7f5b22d2f700  5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] enter Started/Primary/Active/WaitLocalBackfillReserved
2019-01-24 11:10:37.827 7f5b26d37700  5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] exit Started/Primary/Active/WaitLocalBackfillReserved 0.000143 1 0.000165
2019-01-24 11:10:37.827 7f5b26d37700  5 osd.4 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] backfill=[6] r=0 lpr=3739 pi=[3701,3739)/1 rops=1 crt=93'133 mlcod 0'0 active+remapped+backfill_wait mbc={255={}}] enter Started/Primary/Active/WaitRemoteBackfillReserved

#3 Updated by David Zafman 7 months ago

I think this is where things went wrong; we've seen something like this in the past. Here the osd.6 reservation was preempted. It raised a RemoteReservationCanceled event, which sends a REVOKE to the primary. The primary sends a RELEASE to all backfill nodes, then may request the reservation again. Here it appears that the new RequestBackfillPrio arrived before the RecoveryDone was processed. Could the new messaging code cause this out-of-order problem?

2019-01-24 11:10:37.824 7fa158fad700 10 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] do_peering_event: epoch_sent: 3740 epoch_requested: 3740 RemoteBackfillPreempted
2019-01-24 11:10:37.830 7fa158fad700 10 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] do_peering_event: epoch_sent: 3740 epoch_requested: 3740 RemoteReservationCanceled
2019-01-24 11:10:37.830 7fa158fad700  5 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] exit Started/ReplicaActive/RepRecovering 0.014864 2 0.000197
2019-01-24 11:10:37.830 7fa158fad700  5 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] enter Started/ReplicaActive/RepNotRecovering
2019-01-24 11:10:37.830 7fa158fad700 10 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] do_peering_event: epoch_sent: 3740 epoch_requested: 3740 RequestBackfillPrio: priority 100 primary bytes 0 local bytes 0
2019-01-24 11:10:37.830 7fa158fad700 20 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] _tentative_full type backfillfull adjust_used 0KiB
2019-01-24 11:10:37.830 7fa158fad700  5 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] exit Started/ReplicaActive/RepNotRecovering 0.000035 1 0.000013
2019-01-24 11:10:37.830 7fa158fad700  5 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] enter Started/ReplicaActive/RepWaitBackfillReserved
2019-01-24 11:10:37.830 7fa158fad700 10 osd.6 3740 dequeue_op 0x5608771f4580 prio 127 cost 0 latency 0.003988 pg_backfill(finish 3.32 e 3740/3740 lb MAX) v3 pg pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}]
2019-01-24 11:10:37.830 7fa158fad700 10 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] _handle_message: 0x5608771f4580
2019-01-24 11:10:37.830 7fa158fad700 10 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] lb MIN (bitwise) local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] do_backfill pg_backfill(finish 3.32 e 3740/3740 lb MAX) v3
2019-01-24 11:10:37.830 7fa158fad700  0 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=0 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] do_backfill final 0 replaces local 0
2019-01-24 11:10:37.831 7fa158fad700 10 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] do_peering_event: epoch_sent: 3740 epoch_requested: 3740 RecoveryDone
2019-01-24 11:10:37.831 7fa158fad700  5 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] exit Started/ReplicaActive/RepWaitBackfillReserved 0.000201 1 0.000027
2019-01-24 11:10:37.831 7fa158fad700  5 osd.6 pg_epoch: 3740 pg[3.32( v 93'133 (93'133,93'133] local-lis/les=3739/3740 n=7 ec=1769/18 lis/c 3739/3701 les/c/f 3740/3702/0 3738/3739/3739) [6,7]/[4,7] r=-1 lpr=3739 pi=[3701,3739)/1 luod=0'0 crt=93'133 active+remapped mbc={}] enter Started/ReplicaActive/RepNotRecovering

#4 Updated by David Zafman 7 months ago

During preemption, what ensures that the backfill node processes the following messages from the primary in order?

Primary receives preempt (RemoteReservationRevoked)
Releases local reservation
Sends RELEASE to all backfill nodes

      pg->osd->send_message_osd_cluster(
        new MBackfillReserve(
          MBackfillReserve::RELEASE,
          spg_t(pg->info.pgid.pgid, it->shard),
          pg->get_osdmap_epoch()),
        con.get());

Primary re-acquires local reservation
Sends REQUEST to backfill nodes

      pg->osd->send_message_osd_cluster(
        new MBackfillReserve(
          MBackfillReserve::REQUEST,
          spg_t(pg->info.pgid.pgid, backfill_osd_it->shard),
          pg->get_osdmap_epoch(),
          pg->get_backfill_priority(),
          num_bytes,
          pg->peer_bytes[*backfill_osd_it]),
        con.get());

#5 Updated by David Zafman 6 months ago

I created a simple fix for this scenario in https://github.com/ceph/ceph/pull/20933; my change absorbs the RecoveryDone at the replica without crashing. This stuck PG in backfill_wait is probably the result.

The closed pull request https://github.com/ceph/ceph/pull/20837 handles this at the primary. Below is the description from that pull request.

----------
The race series looks like (S = Started, P = Primary, A = Active, R = Replica, RA = ReplicaActive):


                       MOSDPGBackfill::OP_BACKFILL_FINISH
P(S/P/A/Backfilling)   ----------------------------------> **not arrived at replica yet**

                                                               R(S/RA/RepRecovering)
                                                                     |
                                                                     | RemoteBackfillPreempted
                                     MBackfillReserve::REVOKE        V
P(S/P/A/Backfilling)                <------------------------  R(S/RA/RepRecovering)
       |
       | RemoteReservationRevoked
       V                            MBackfillReserve::RELEASE
P(S/P/A/WaitLocalBackfillReserved)  -------------------------> R(S/RA/RepNotRecovering)
       |
       | LocalBackfillReserved
       V                            MBackfillReserve::REQUEST
P(S/P/A/WaitRemoteBackfillReserved) -------------------------> R(S/RA/RepWaitBackfillReserved)
                                                                     | **OP_BACKFILL_FINISH finally processed**
                                                                     | RecoveryDone
                                                                     V
                                                               R(Crashed)

Some key points:

  • Backfill is nearly done; the Primary begins sending OP_BACKFILL_FINISH,
    which has not yet been seen by the Replica
  • The Replica's backfill process is preempted; the backfill revoke is
    triggered by the Replica, and the Primary now restarts the backfill
    process (e.g., because it has not yet received all
    OP_BACKFILL_FINISH_ACKs); the Replica begins waiting for the
    backfill reservation again
  • OP_BACKFILL_FINISH is finally dequeued, processed, and translated into
    a RecoveryDone event, which crashes the Replica process
    (note that OP_BACKFILL_FINISH messages can be arbitrarily delayed
    because peering events are now processed in the same (Op)Queue
    and are of the highest priority)

See also:
http://pulpito.ceph.com/xxg-2018-02-28_09:02:53-rados-wip-fix-upmap-distro-basic-smithi/2235497/

Fix the above problem by rechecking the backfill status each time
a RemoteReservationRevoked message is received, so that the backfill
process can be restarted in a safer, less race-prone way.


#6 Updated by David Zafman 6 months ago

  • Pull request ID set to 26375

#7 Updated by Neha Ojha 6 months ago

  • Status changed from Verified to Need Review

#8 Updated by David Zafman 6 months ago

  • Backport set to mimic

#9 Updated by David Zafman 6 months ago

  • Status changed from Need Review to Pending Backport

#10 Updated by Nathan Cutler 6 months ago

  • Copied to Backport #38341: mimic: pg stuck in backfill_wait with plenty of disk space added

#11 Updated by Nathan Cutler 28 days ago

  • Status changed from Pending Backport to Resolved
