Bug #52448
osd: pg may get stuck in backfill_toofull after backfill is interrupted due to target osd temporarily down
Description
Consider a scenario:
- Data is written to a pool so that one osd X is close to full but still below the nearfull/toofull limits.
- The osd X is marked down.
- All data is deleted from the pool and the same amount of new data is written.
- The osd X is brought back up and backfilling starts.
- The backfilling is interrupted by osd X going down temporarily (in our tests this is emulated with the `ceph osd down osd.X` command).
- The backfilling resumes but gets stuck in the backfill_toofull state.
In the osd logs we can see that when the backfill is started for the first time:
2021-08-30 07:04:09.289 7fa7ab353640 10 osd.4 pg_epoch: 73 pg[4.0s0( v 68'5600 (68'2500,68'5600] local-lis/les=72/73 n=1870 ec=61/61 lis/c 67/61 les/c/f 68/62/0 71/72/61) [4,3,0]/[4,3,2147483647]p4(0) backfill=[0(2)] r=0 lpr=72 pi=[61,72)/1 crt=68'5600 lcod 68'5599 mlcod 0'0 remapped mbc={}] activate peer osd.0(2) 4.0s2( v 63'1865 (0'0,63'1865] local-lis/les=61/62 n=1865 ec=61/61 lis/c 67/61 les/c/f 68/62/0 71/72/61)
2021-08-30 07:04:09.289 7fa7ab353640 0 log_channel(cluster) log [DBG] : 4.0s0 starting backfill to osd.0(2) from (0'0,63'1865] MAX to 68'5600
In the RequestBackfillPrio event that the primary osd.4 sends to the target osd.0, the "local bytes" field is properly set to a non-zero value, and _tentative_full calculates (using this value) that the backfill may be reserved:
2021-08-30 07:04:09.617 7f4598644640 10 osd.0 pg_epoch: 73 pg[4.0s2( v 68'5600 (68'2600,68'5600] lb MIN (bitwise) local-lis/les=72/73 n=0 ec=61/61 lis/c 72/61 les/c/f 73/62/0 71/72/61) [4,3,0]/[4,3,2147483647]p4(0) r=-1 lpr=72 pi=[61,72)/1 luod=0'0 crt=68'5600 lcod 0'0 active+remapped mbc={}] do_peering_event: epoch_sent: 73 epoch_requested: 73 RequestBackfillPrio: priority 151 primary bytes 7839154200 local bytes 7818182680
2021-08-30 07:04:09.617 7f4598644640 10 osd.0 pg_epoch: 73 pg[4.0s2( v 68'5600 (68'2600,68'5600] lb MIN (bitwise) local-lis/les=72/73 n=0 ec=61/61 lis/c 72/61 les/c/f 73/62/0 71/72/61) [4,3,0]/[4,3,2147483647]p4(0) r=-1 lpr=72 pi=[61,72)/1 luod=0'0 crt=68'5600 lcod 0'0 active+remapped mbc={}] state<Started/ReplicaActive/RepNotRecovering>: react primary_num_bytes 3827712KiB local 3817472KiB pending_adjustments 10240KiB
2021-08-30 07:04:09.617 7f4598644640 20 osd.0 pg_epoch: 73 pg[4.0s2( v 68'5600 (68'2600,68'5600] lb MIN (bitwise) local-lis/les=72/73 n=0 ec=61/61 lis/c 72/61 les/c/f 73/62/0 71/72/61) [4,3,0]/[4,3,2147483647]p4(0) r=-1 lpr=72 pi=[61,72)/1 luod=0'0 crt=68'5600 lcod 0'0 active+remapped mbc={}] _tentative_full type backfillfull adjust_used 10240KiB
2021-08-30 07:04:09.617 7f4598644640 20 osd.0 73 compute_adjusted_ratio Before kb_used() 4866844
2021-08-30 07:04:09.617 7f4598644640 20 osd.0 73 compute_adjusted_ratio After kb_used() 4877084
2021-08-30 07:04:09.617 7f4598644640 5 osd.0 pg_epoch: 73 pg[4.0s2( v 68'5600 (68'2600,68'5600] lb MIN (bitwise) local-lis/les=72/73 n=0 ec=61/61 lis/c 72/61 les/c/f 73/62/0 71/72/61) [4,3,0]/[4,3,2147483647]p4(0) r=-1 lpr=72 pi=[61,72)/1 luod=0'0 crt=68'5600 lcod 0'0 active+remapped mbc={}] exit Started/ReplicaActive/RepNotRecovering 0.145555 2 0.057896
2021-08-30 07:04:09.617 7f4598644640 5 osd.0 pg_epoch: 73 pg[4.0s2( v 68'5600 (68'2600,68'5600] lb MIN (bitwise) local-lis/les=72/73 n=0 ec=61/61 lis/c 72/61 les/c/f 73/62/0 71/72/61) [4,3,0]/[4,3,2147483647]p4(0) r=-1 lpr=72 pi=[61,72)/1 luod=0'0 crt=68'5600 lcod 0'0 active+remapped mbc={}] enter Started/ReplicaActive/RepWaitBackfillReserved
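The arithmetic visible in this log can be sketched as follows. This is a hypothetical simplification, not the actual PeeringState helpers: the replica takes the primary's PG size and its own local PG size (both in KiB here, as reported by the `react` line), treats the difference as data still to arrive, and adds it to the OSD's current `kb_used()` before the fullness check:

```cpp
#include <cstdint>

// Illustrative sketch (names are not the real Ceph functions): the amount of
// backfill data still expected, derived from the RequestBackfillPrio fields.
int64_t pending_adjustment_kib(int64_t primary_kib, int64_t local_kib) {
  int64_t pending = primary_kib - local_kib;
  return pending > 0 ? pending : 0;  // never adjust downward
}

// The tentative usage adds that pending amount to the current kb_used()
// before comparing against the backfillfull ratio.
int64_t adjusted_kb_used(int64_t kb_used, int64_t pending_kib) {
  return kb_used + pending_kib;
}
```

With the values from the log above, 3827712KiB - 3817472KiB gives the 10240KiB `pending_adjustments`, and 4866844 + 10240 gives the 4877084 seen in the "After kb_used()" line, a negligible adjustment, so the reservation is granted.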
But when the backfill is restarted (continued):
2021-08-30 07:04:23.308 7fa7ab353640 10 osd.4 pg_epoch: 78 pg[4.0s0( v 68'5600 (68'2500,68'5600] local-lis/les=77/78 n=1870 ec=61/61 lis/c 74/61 les/c/f 75/62/0 76/77/61) [4,3,0]/[4,3,2147483647]p4(0) backfill=[0(2)] r=0 lpr=77 pi=[61,77)/1 crt=68'5600 lcod 68'5599 mlcod 0'0 remapped mbc={}] activate peer osd.0(2) 4.0s2( v 68'5600 (68'2600,68'5600] lb MIN (bitwise) local-lis/les=72/73 n=0 ec=61/61 lis/c 74/61 les/c/f 75/62/0 76/77/61)
2021-08-30 07:04:23.308 7fa7ab353640 0 log_channel(cluster) log [INF] : 4.0s0 continuing backfill to osd.0(2) from (68'2600,68'5600] MIN to 68'5600
2021-08-30 07:04:23.308 7fa7ab353640 10 osd.4 pg_epoch: 78 pg[4.0s0( v 68'5600 (68'2500,68'5600] local-lis/les=77/78 n=1870 ec=61/61 lis/c 74/61 les/c/f 75/62/0 76/77/61) [4,3,0]/[4,3,2147483647]p4(0) backfill=[0(2)] r=0 lpr=77 pi=[61,77)/1 crt=68'5600 lcod 68'5599 mlcod 0'0 remapped mbc={}] activate peer osd.0(2) is up to date, queueing in pending_activators
2021-08-30 07:04:23.308 7fa7ab353640 10 osd.4 pg_epoch: 78 pg[4.0s0( v 68'5600 (68'2500,68'5600] local-lis/les=77/78 n=1870 ec=61/61 lis/c 74/61 les/c/f 75/62/0 76/77/61) [4,3,0]/[4,3,2147483647]p4(0) backfill=[0(2)] r=0 lpr=77 pi=[61,77)/1 crt=68'5600 lcod 68'5599 mlcod 0'0 remapped mbc={}] activate peer osd.0(2) 4.0s2( v 68'5600 (68'2600,68'5600] lb MIN (bitwise) local-lis/les=72/73 n=0 ec=61/61 lis/c 74/61 les/c/f 75/62/0 76/77/61) uptodate
In the RequestBackfillPrio event that the primary osd.4 sends to the target osd.0, the "local bytes" field is incorrectly set to 0 this time, and _tentative_full concludes that the backfill would make the osd too full:
2021-08-30 07:04:23.324 7f459463c640 10 osd.0 pg_epoch: 78 pg[4.0s2( v 68'5600 (68'2600,68'5600] lb MIN (bitwise) local-lis/les=72/73 n=0 ec=61/61 lis/c 77/61 les/c/f 78/62/0 76/77/61) [4,3,0]/[4,3,2147483647]p4(0) r=-1 lpr=77 pi=[61,77)/1 luod=0'0 crt=68'5600 lcod 0'0 active+remapped mbc={}] do_peering_event: epoch_sent: 78 epoch_requested: 78 RequestBackfillPrio: priority 151 primary bytes 7839154200 local bytes 0
2021-08-30 07:04:23.324 7f459463c640 10 osd.0 pg_epoch: 78 pg[4.0s2( v 68'5600 (68'2600,68'5600] lb MIN (bitwise) local-lis/les=72/73 n=0 ec=61/61 lis/c 77/61 les/c/f 78/62/0 76/77/61) [4,3,0]/[4,3,2147483647]p4(0) r=-1 lpr=77 pi=[61,77)/1 luod=0'0 crt=68'5600 lcod 0'0 active+remapped mbc={}] state<Started/ReplicaActive/RepNotRecovering>: react primary_num_bytes 3827712KiB local 0KiB pending_adjustments 3827712KiB
2021-08-30 07:04:23.324 7f459463c640 20 osd.0 pg_epoch: 78 pg[4.0s2( v 68'5600 (68'2600,68'5600] lb MIN (bitwise) local-lis/les=72/73 n=0 ec=61/61 lis/c 77/61 les/c/f 78/62/0 76/77/61) [4,3,0]/[4,3,2147483647]p4(0) r=-1 lpr=77 pi=[61,77)/1 luod=0'0 crt=68'5600 lcod 0'0 active+remapped mbc={}] _tentative_full type backfillfull adjust_used 3827712KiB
2021-08-30 07:04:23.324 7f459463c640 20 osd.0 78 compute_adjusted_ratio Before kb_used() 4866904
2021-08-30 07:04:23.324 7f459463c640 20 osd.0 78 compute_adjusted_ratio After kb_used() 8694616
2021-08-30 07:04:23.324 7f459463c640 10 osd.0 pg_epoch: 78 pg[4.0s2( v 68'5600 (68'2600,68'5600] lb MIN (bitwise) local-lis/les=72/73 n=0 ec=61/61 lis/c 77/61 les/c/f 78/62/0 76/77/61) [4,3,0]/[4,3,2147483647]p4(0) r=-1 lpr=77 pi=[61,77)/1 luod=0'0 crt=68'5600 lcod 0'0 active+remapped mbc={}] _tentative_full tentative usage is 0.0919983
2021-08-30 07:04:23.324 7f459463c640 10 osd.0 pg_epoch: 78 pg[4.0s2( v 68'5600 (68'2600,68'5600] lb MIN (bitwise) local-lis/les=72/73 n=0 ec=61/61 lis/c 77/61 les/c/f 78/62/0 76/77/61) [4,3,0]/[4,3,2147483647]p4(0) r=-1 lpr=77 pi=[61,77)/1 luod=0'0 crt=68'5600 lcod 0'0 active+remapped mbc={}] state<Started/ReplicaActive/RepNotRecovering>: backfill reservation rejected: backfill full
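The contrast between the two attempts can be sketched as a tentative-fullness check. This is an illustrative model, not the real _tentative_full signature; the total capacity (~94508445 KiB, back-derived from the 0.0919983 ratio in the log) and the backfillfull ratio (0.085, presumably lowered in the test environment, since the stock default is far higher) are assumptions for this example:

```cpp
#include <cstdint>

// Hypothetical sketch: would granting the backfill reservation push this OSD
// past its backfillfull ratio? When local_kib is wrongly reported as 0, the
// entire primary PG size is counted as pending even though most of that data
// is already on disk, so it is effectively double-counted.
bool tentative_backfill_full(int64_t kb_used, int64_t primary_kib,
                             int64_t local_kib, int64_t total_kb,
                             double backfillfull_ratio) {
  int64_t pending = primary_kib - local_kib;  // data still expected to land
  if (pending < 0) pending = 0;
  double tentative =
      static_cast<double>(kb_used + pending) / static_cast<double>(total_kb);
  return tentative >= backfillfull_ratio;
}
```

With the cached local size (first attempt) the pending adjustment is only 10240KiB and the reservation is granted; with local reported as 0 (second attempt) the full 3827712KiB is added on top of the 4866904KiB already used, producing the 0.092 tentative usage that trips the rejection.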
In the osd code we can see that, when sending the BackfillReserve request, the primary takes "local bytes" from the peer_bytes array [1], which is populated when PeeringState::activate is called and backfill is starting [2]. But the peer_bytes array may be reset later, when clear_primary_state is called [3]. When PeeringState::activate is called again and backfill is continuing [4], peer_bytes is empty, so "local bytes" is sent as 0 in the BackfillReserve request.
[1] https://github.com/ceph/ceph/blob/0917e204e9367cd31dbf25884fb32c74cb2e775a/src/osd/PeeringState.cc#L5106
[2] https://github.com/ceph/ceph/blob/0917e204e9367cd31dbf25884fb32c74cb2e775a/src/osd/PeeringState.cc#L2792
[3] https://github.com/ceph/ceph/blob/0917e204e9367cd31dbf25884fb32c74cb2e775a/src/osd/PeeringState.cc#L892
[4] https://github.com/ceph/ceph/blob/0917e204e9367cd31dbf25884fb32c74cb2e775a/src/osd/PeeringState.cc#L2745
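The lifecycle described above can be modeled in a few lines. This is a deliberately simplified sketch (the member name mirrors the linked code, but the surrounding structure is invented for illustration):

```cpp
#include <cstdint>
#include <map>

// Minimal model of the primary's per-peer size cache and the bug's lifecycle.
struct PrimaryStateModel {
  std::map<int, int64_t> peer_bytes;  // target shard sizes, cached on activate

  // First activate(): backfill is starting, so the primary records the
  // target's local size and later copies it into the reservation request.
  void activate_starting_backfill(int peer, int64_t local_bytes) {
    peer_bytes[peer] = local_bytes;
  }

  // Interruption (e.g. the target osd flaps): the cache is dropped.
  void clear_primary_state() { peer_bytes.clear(); }

  // Second activate(): the "continuing backfill" path does not repopulate
  // peer_bytes, so the "local bytes" field falls back to 0.
  int64_t local_bytes_for_reserve(int peer) const {
    auto it = peer_bytes.find(peer);
    return it == peer_bytes.end() ? 0 : it->second;
  }
};
```

In this model the first reservation carries the real local size, while the one sent after clear_primary_state carries 0, matching the two RequestBackfillPrio log lines above.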
History
#1 Updated by Mykola Golub over 2 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 42964
#2 Updated by Mykola Golub over 2 years ago
- Backport set to pacific,octopus,nautilus
#3 Updated by Mykola Golub over 2 years ago
- Status changed from Fix Under Review to Pending Backport
#4 Updated by Backport Bot over 2 years ago
- Copied to Backport #52831: pacific: osd: pg may get stuck in backfill_toofull after backfill is interrupted due to target osd temporarily down added
#5 Updated by Backport Bot over 2 years ago
- Copied to Backport #52832: nautilus: osd: pg may get stuck in backfill_toofull after backfill is interrupted due to target osd temporarily down added
#6 Updated by Backport Bot over 2 years ago
- Copied to Backport #52833: octopus: osd: pg may get stuck in backfill_toofull after backfill is interrupted due to target osd temporarily down added
#7 Updated by Backport Bot over 1 year ago
- Tags set to backport_processed
#8 Updated by Konstantin Shalygin over 1 year ago
- Status changed from Pending Backport to Resolved