Bug #9808
PG stuck in active+undersized+degraded+remapped+backfill_toofull
Status: Rejected
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Steps to reproduce
- modify vstart.sh with
@@ -337,6 +337,9 @@ if [ "$start_mon" -eq 1 ]; then
         osd pg bits = 3
         osd pgp bits = 5  ; (invalid, but ceph should cope!)
         osd crush chooseleaf type = 0
+        osd pool default size = 2
+        osd_max_backfills = 2
+        osd min pg log entries = 5
         osd pool default min size = 1
         osd pool default erasure code directory = .libs
         osd pool default erasure code profile = plugin=jerasure technique=reed_sol_van k=2 m=1 ruleset-failure-domain=osd
- rm -fr dev out ; mkdir -p dev ; MON=1 OSD=3 ./vstart.sh -d -n -l mon osd
- rados -p rbd bench 1 write -b 4096 --no-cleanup
- ceph tell osd.* injectargs -- --osd_backfill_full_ratio 0
- pkill -f 'ceph-osd -i 0'
- ceph osd out 0
- sleep 30
- ceph tell osd.* injectargs -- --osd_backfill_full_ratio 0.8
- sleep 30
- ceph pg dump
...
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
0.3 9 0 9 9 0 36864 6 6 active+undersized+degraded+remapped 2014-10-17 16:02:23.710060 11'9 16:34 [2,1] 2 [2] 2 0'0 2014-10-17 16:01:48.864991 0'0 2014-10-17 16:01:48.864991
...
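To pick the affected PGs out of a full dump, a small filter can help. This is only a sketch: `pgs_in_state` is a hypothetical helper name, and it assumes the plain-text layout shown above, with one row per line, a header line starting with `pg_stat`, and the `state` column located before any multi-word field.

```shell
#!/bin/sh
# Hypothetical helper: print "<pgid> <state>" for every PG whose state
# contains the given substring. Reads `ceph pg dump`-style text on stdin.
pgs_in_state() {
    awk -v want="$1" '
        # Locate the "state" column from the header line.
        $1 == "pg_stat" {
            for (i = 1; i <= NF; i++) if ($i == "state") col = i
            next
        }
        # PG rows start with an id such as 0.3 (pool.seed).
        col && $1 ~ /^[0-9]+\.[0-9a-f]+$/ && index($col, want) > 0 {
            print $1, $col
        }
    '
}

# Usage (assumes a running cluster):
#   ceph pg dump 2>/dev/null | pgs_in_state backfill_toofull
```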
The cluster should notice that osd_backfill_full_ratio has changed back to 0.8, but for some reason it does not. The OSDs do report the new value:
$ ceph daemon osd.2 config get osd_backfill_full_ratio
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
{ "osd_backfill_full_ratio": "0.8"}
loic@fold:~/software/ceph/ceph/src$ ceph daemon osd.1 config get osd_backfill_full_ratio
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
{ "osd_backfill_full_ratio": "0.8"}
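The same check can be scripted for every OSD. A minimal sketch, assuming the one-line JSON shape shown above with output arriving one item per line; `config_value` is a hypothetical helper that extracts the value and ignores the vstart developer-mode banner.

```shell
#!/bin/sh
# Hypothetical helper: extract the value of option $1 from the output of
# `ceph daemon osd.N config get <option>` read on stdin.
config_value() {
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (assumes a running vstart cluster with osd.1 and osd.2):
#   opt=osd_backfill_full_ratio
#   for id in 1 2; do
#       printf 'osd.%s %s\n' "$id" \
#           "$(ceph daemon "osd.$id" config get "$opt" | config_value "$opt")"
#   done
```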
Updated by Loïc Dachary over 9 years ago
The scheduled RequestBackfill happens as expected
16251:2014-10-17 16:05:43.795615 7f0aac930700 10 osd.1 pg_epoch: 16 pg[0.2( v 11'11 (11'5,11'11] local-les=16 n=11 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'5 lcod 11'10 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16287:2014-10-17 16:05:44.155776 7f0aac930700 10 osd.1 pg_epoch: 16 pg[0.6( v 11'8 (11'1,11'8] local-les=16 n=8 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'1 lcod 11'7 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16312:2014-10-17 16:05:44.324996 7f0aac0df700 10 osd.1 pg_epoch: 16 pg[0.7( v 11'16 (11'10,11'16] local-les=16 n=16 ec=1 les/c 16/13 14/15/8) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'10 lcod 11'15 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16471:2014-10-17 16:05:53.797762 7f0aac0df700 10 osd.1 pg_epoch: 16 pg[0.2( v 11'11 (11'5,11'11] local-les=16 n=11 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'5 lcod 11'10 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16507:2014-10-17 16:05:54.157738 7f0aac0df700 10 osd.1 pg_epoch: 16 pg[0.6( v 11'8 (11'1,11'8] local-les=16 n=8 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'1 lcod 11'7 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16532:2014-10-17 16:05:54.328097 7f0aac930700 10 osd.1 pg_epoch: 16 pg[0.7( v 11'16 (11'10,11'16] local-les=16 n=16 ec=1 les/c 16/13 14/15/8) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'10 lcod 11'15 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16710:2014-10-17 16:06:03.801028 7f0aac930700 10 osd.1 pg_epoch: 16 pg[0.2( v 11'11 (11'5,11'11] local-les=16 n=11 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'5 lcod 11'10 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16746:2014-10-17 16:06:04.161021 7f0aac930700 10 osd.1 pg_epoch: 16 pg[0.6( v 11'8 (11'1,11'8] local-les=16 n=8 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'1 lcod 11'7 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
16771:2014-10-17 16:06:04.331050 7f0aac0df700 10 osd.1 pg_epoch: 16 pg[0.7( v 11'16 (11'10,11'16] local-les=16 n=16 ec=1 les/c 16/13 14/15/8) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'10 lcod 11'15 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
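In this excerpt each of the three PGs retries about every ten seconds. To summarize a longer OSD log, one could count the RequestBackfill events per PG. This is a sketch only: `count_request_backfill` is a hypothetical name, and it assumes one log entry per line with the `pg[<id>(` prefix seen in the excerpt.

```shell
#!/bin/sh
# Hypothetical helper: read OSD log lines on stdin and print
# "<pgid> <count>" for the RequestBackfill peering events of each PG.
count_request_backfill() {
    grep 'RequestBackfill' |
        sed -n 's/.* pg\[\([0-9][0-9]*\.[0-9a-f][0-9a-f]*\)(.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (the log path is an assumption based on the vstart "out" directory):
#   count_request_backfill < out/osd.1.log
```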
Updated by Loïc Dachary over 9 years ago
- Status changed from New to Rejected
The disk was 90% full, above the 0.8 (80%) osd_backfill_full_ratio, hence the block.
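The arithmetic behind this can be sketched as follows. It is an illustration rather than Ceph's actual code: `backfill_state` is a hypothetical helper, and it assumes the check is a plain comparison of used/total against osd_backfill_full_ratio, with usage at or above the ratio counting as too full.

```shell
#!/bin/sh
# Hypothetical helper: $1 = used space, $2 = total space (same unit),
# $3 = osd_backfill_full_ratio. Prints "toofull" or "ok".
backfill_state() {
    awk -v used="$1" -v total="$2" -v ratio="$3" 'BEGIN {
        if (total > 0 && used / total >= ratio) print "toofull"
        else print "ok"
    }'
}

backfill_state 90 100 0.8   # disk 90% full, ratio 0.8
backfill_state 50 100 0.8   # disk 50% full, ratio 0.8
backfill_state 50 100 0     # ratio 0, as injected in the reproduction steps
```

Under this assumption a ratio of 0 marks every OSD as too full, which is how the reproduction steps force the state, and a 90% full disk stays blocked even after the ratio is restored to 0.8.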