Bug #9808
closed
PG stuck in active+undersized+degraded+remapped+backfill_toofull
Added by Loïc Dachary over 9 years ago.
Updated over 9 years ago.
Description
Steps to reproduce
- modify vstart.sh with
@@ -337,6 +337,9 @@ if [ "$start_mon" -eq 1 ]; then
osd pg bits = 3
osd pgp bits = 5 ; (invalid, but ceph should cope!)
osd crush chooseleaf type = 0
+ osd pool default size = 2
+ osd_max_backfills = 2
+ osd min pg log entries = 5
osd pool default min size = 1
osd pool default erasure code directory = .libs
osd pool default erasure code profile = plugin=jerasure technique=reed_sol_van k=2 m=1 ruleset-failure-domain=osd
- rm -fr dev out ; mkdir -p dev ; MON=1 OSD=3 ./vstart.sh -d -n -l mon osd
- rados -p rbd bench 1 write -b 4096 --no-cleanup
- ceph tell osd.* injectargs -- --osd_backfill_full_ratio 0
- pkill -f 'ceph-osd -i 0'
- ceph osd out 0
- sleep 30
- ceph tell osd.* injectargs -- --osd_backfill_full_ratio 0.8
- sleep 30
- ceph pg dump
...
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
0.3 9 0 9 9 0 36864 6 6 active+undersized+degraded+remapped 2014-10-17 16:02:23.710060 11'9 16:34 [2,1] 2 [2] 2 0'0 2014-10-17 16:01:48.864991 0'0 2014-10-17 16:01:48.864991
...
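To check which PGs are still in the backfill_toofull state named in the title, the dump can be filtered. This is a minimal sketch: list_toofull_pgs is a hypothetical helper name, and it assumes the PG id is the first column and the state string appears on the same line, as in the output above.

```shell
# Hypothetical helper: read `ceph pg dump` output on stdin and print the
# ids of PGs whose line mentions backfill_toofull.
# Assumption: the first column is the PG id (pool.seed, e.g. 0.3).
list_toofull_pgs() {
    awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ && /backfill_toofull/ { print $1 }'
}
```

Possible usage: ceph pg dump | list_toofull_pgs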
The PGs should notice that osd_backfill_full_ratio has been set back to 0.8 and resume backfilling, but they do not, even though the OSDs report the new value:
$ ceph daemon osd.2 config get osd_backfill_full_ratio
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
{ "osd_backfill_full_ratio": "0.8"}
$ ceph daemon osd.1 config get osd_backfill_full_ratio
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
{ "osd_backfill_full_ratio": "0.8"}
The scheduled RequestBackfill happens as expected (every 10 seconds for each PG), but the PGs stay in backfill_toofull:
16251:2014-10-17 16:05:43.795615 7f0aac930700 10 osd.1 pg_epoch: 16 pg[0.2( v 11'11 (11'5,11'11] local-les=16 n=11 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'5 lcod 11'10 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16287:2014-10-17 16:05:44.155776 7f0aac930700 10 osd.1 pg_epoch: 16 pg[0.6( v 11'8 (11'1,11'8] local-les=16 n=8 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'1 lcod 11'7 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16312:2014-10-17 16:05:44.324996 7f0aac0df700 10 osd.1 pg_epoch: 16 pg[0.7( v 11'16 (11'10,11'16] local-les=16 n=16 ec=1 les/c 16/13 14/15/8) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'10 lcod 11'15 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16471:2014-10-17 16:05:53.797762 7f0aac0df700 10 osd.1 pg_epoch: 16 pg[0.2( v 11'11 (11'5,11'11] local-les=16 n=11 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'5 lcod 11'10 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16507:2014-10-17 16:05:54.157738 7f0aac0df700 10 osd.1 pg_epoch: 16 pg[0.6( v 11'8 (11'1,11'8] local-les=16 n=8 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'1 lcod 11'7 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16532:2014-10-17 16:05:54.328097 7f0aac930700 10 osd.1 pg_epoch: 16 pg[0.7( v 11'16 (11'10,11'16] local-les=16 n=16 ec=1 les/c 16/13 14/15/8) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'10 lcod 11'15 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16710:2014-10-17 16:06:03.801028 7f0aac930700 10 osd.1 pg_epoch: 16 pg[0.2( v 11'11 (11'5,11'11] local-les=16 n=11 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'5 lcod 11'10 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16746:2014-10-17 16:06:04.161021 7f0aac930700 10 osd.1 pg_epoch: 16 pg[0.6( v 11'8 (11'1,11'8] local-les=16 n=8 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'1 lcod 11'7 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16771:2014-10-17 16:06:04.331050 7f0aac0df700 10 osd.1 pg_epoch: 16 pg[0.7( v 11'16 (11'10,11'16] local-les=16 n=16 ec=1 les/c 16/13 14/15/8) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'10 lcod 11'15 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
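The retries in the log above can be counted per PG to confirm that each PG keeps requesting backfill. A minimal sketch, assuming the log line format shown above; count_backfill_requests is a hypothetical helper name, not part of Ceph.

```shell
# Hypothetical helper: read an OSD log on stdin and print
# "<pgid> <number of RequestBackfill events>" for each PG, sorted by pgid.
# Assumption: each event line contains "pg[<pool>.<seed>(" as in the log above.
count_backfill_requests() {
    awk '/RequestBackfill/ && match($0, /pg\[[0-9]+\.[0-9a-f]+/) {
             # skip the leading "pg[" to keep only the PG id
             count[substr($0, RSTART + 3, RLENGTH - 3)]++
         }
         END { for (pg in count) print pg, count[pg] }' | sort
}
```

Possible usage, assuming the OSD log is at out/osd.1.log: count_backfill_requests < out/osd.1.log. On the nine lines quoted above this prints 0.2 3, 0.6 3 and 0.7 3, one pair per line.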
- Status changed from New to Rejected
The disk was 90% full, which is above the osd_backfill_full_ratio of 0.8, hence the block.
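The comparison behind this can be checked by hand before suspecting the OSD. The sketch below is an illustration only: both function names are hypothetical, and the "used fraction >= ratio" test is an assumption about how the OSD decides that a target is too full, not a copy of the Ceph code.

```shell
# Hypothetical helper: succeed (exit 0) when the used percentage reaches
# the backfill full ratio. Assumption: the OSD treats used/total >= ratio
# as too full, so a ratio of 0 always blocks backfill.
backfill_too_full() {
    used_percent=$1   # e.g. 90
    ratio=$2          # e.g. 0.8
    awk -v u="$used_percent" -v r="$ratio" 'BEGIN { exit !(u / 100 >= r) }'
}

# Hypothetical helper: print the used percentage (without the % sign) of
# the filesystem that holds the given path, using POSIX df output.
disk_used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

Possible usage, assuming the OSD data lives under the dev directory created in the steps above: backfill_too_full "$(disk_used_percent dev)" 0.8 && echo "too full for backfill". With the values from this report (90 and 0.8) the check succeeds, which matches the blocked backfill.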