Bug #9808

closed

PG stuck in active+undersized+degraded+remapped+backfill_toofull

Added by Loïc Dachary over 9 years ago. Updated over 9 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Steps to reproduce

  • modify vstart.sh with
    @@ -337,6 +337,9 @@ if [ "$start_mon" -eq 1 ]; then
             osd pg bits = 3
             osd pgp bits = 5  ; (invalid, but ceph should cope!)
             osd crush chooseleaf type = 0
    +        osd pool default size = 2
    +        osd_max_backfills = 2
    +        osd min pg log entries = 5
             osd pool default min size = 1
             osd pool default erasure code directory = .libs
             osd pool default erasure code profile = plugin=jerasure technique=reed_sol_van k=2 m=1 ruleset-failure-domain=osd
    
  • rm -fr dev out ; mkdir -p dev ; MON=1 OSD=3 ./vstart.sh -d -n -l mon osd
  • rados -p rbd bench 1 write -b 4096 --no-cleanup
  • ceph tell osd.* injectargs -- --osd_backfill_full_ratio 0
  • pkill -f 'ceph-osd -i 0'
  • ceph osd out 0
  • sleep 30
  • ceph tell osd.* injectargs -- --osd_backfill_full_ratio 0.8
  • sleep 30
  • ceph pg dump
    ...
    pg_stat    objects    mip    degr    misp    unf    bytes    log    disklog    state    state_stamp    v    reported    up    up_primary    acting    acting_primary    last_scrub    scrub_stamp    last_deep_scrub    deep_scrub_stamp
    0.3    9    0    9    9    0    36864    6    6    active+undersized+degraded+remapped    2014-10-17 16:02:23.710060    11'9    16:34    [2,1]    2    [2]    2    0'0    2014-10-17 16:01:48.864991    0'0    2014-10-17 16:01:48.864991
    ...
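
To spot the affected placement groups without reading the whole dump, the output of the last step can be filtered. This is a minimal sketch: the function name `toofull_pgs` is hypothetical, and it assumes the pg id is the first field of each row and that the state is a single field such as active+...+backfill_toofull.

```shell
# Hypothetical helper, not part of Ceph: reads `ceph pg dump` output on stdin
# and prints the id and state of every PG whose state includes backfill_toofull.
# Assumption: the pg id is the first field and the state contains no spaces.
toofull_pgs() {
    awk '{
        for (i = 2; i <= NF; i++)
            if (index($i, "backfill_toofull") > 0) { print $1, $i; break }
    }'
}
```

It would be used as `ceph pg dump | toofull_pgs`; an empty result means no PG is reporting backfill_toofull.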
    

The OSDs should notice that osd_backfill_full_ratio has been set back to 0.8 and resume backfilling, but for some reason they do not. The running configuration does report the new value:

$ ceph daemon osd.2 config get osd_backfill_full_ratio
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
{ "osd_backfill_full_ratio": "0.8"}
$ ceph daemon osd.1 config get osd_backfill_full_ratio
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
{ "osd_backfill_full_ratio": "0.8"}


Files

  • session.txt (27.7 KB): shell session of the above steps with output. Loïc Dachary, 10/17/2014 04:13 PM
  • osd.1.log (61.4 KB): tail of the osd.1 log. Loïc Dachary, 10/17/2014 04:13 PM

Actions #1

Updated by Loïc Dachary over 9 years ago

  • Description updated

Actions #2

Updated by Loïc Dachary over 9 years ago

The scheduled RequestBackfill event is delivered as expected, about every 10 seconds for each PG, yet the PGs stay in backfill_toofull:

  16251:2014-10-17 16:05:43.795615 7f0aac930700 10 osd.1 pg_epoch: 16 pg[0.2( v 11'11 (11'5,11'11] local-les=16 n=11 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'5 lcod 11'10 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
  16287:2014-10-17 16:05:44.155776 7f0aac930700 10 osd.1 pg_epoch: 16 pg[0.6( v 11'8 (11'1,11'8] local-les=16 n=8 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'1 lcod 11'7 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
  16312:2014-10-17 16:05:44.324996 7f0aac0df700 10 osd.1 pg_epoch: 16 pg[0.7( v 11'16 (11'10,11'16] local-les=16 n=16 ec=1 les/c 16/13 14/15/8) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'10 lcod 11'15 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
  16471:2014-10-17 16:05:53.797762 7f0aac0df700 10 osd.1 pg_epoch: 16 pg[0.2( v 11'11 (11'5,11'11] local-les=16 n=11 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'5 lcod 11'10 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
  16507:2014-10-17 16:05:54.157738 7f0aac0df700 10 osd.1 pg_epoch: 16 pg[0.6( v 11'8 (11'1,11'8] local-les=16 n=8 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'1 lcod 11'7 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
  16532:2014-10-17 16:05:54.328097 7f0aac930700 10 osd.1 pg_epoch: 16 pg[0.7( v 11'16 (11'10,11'16] local-les=16 n=16 ec=1 les/c 16/13 14/15/8) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'10 lcod 11'15 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
  16710:2014-10-17 16:06:03.801028 7f0aac930700 10 osd.1 pg_epoch: 16 pg[0.2( v 11'11 (11'5,11'11] local-les=16 n=11 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'5 lcod 11'10 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
  16746:2014-10-17 16:06:04.161021 7f0aac930700 10 osd.1 pg_epoch: 16 pg[0.6( v 11'8 (11'1,11'8] local-les=16 n=8 ec=1 les/c 16/13 14/15/12) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'1 lcod 11'7 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
  16771:2014-10-17 16:06:04.331050 7f0aac0df700 10 osd.1 pg_epoch: 16 pg[0.7( v 11'16 (11'10,11'16] local-les=16 n=16 ec=1 les/c 16/13 14/15/8) [1,2]/[1] r=0 lpr=15 pi=12-14/2 bft=2 crt=11'10 lcod 11'15 mlcod 0'0 active+undersized+degraded+remapped+backfill_toofull] handle_peering_event: epoch_sent: 16 epoch_requested: 16 RequestBackfill
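
The retry cadence in the log excerpt above can be summarised per PG with a short filter. This is only a sketch: the function name `backfill_requests` is hypothetical, and it assumes the log line layout shown above (a time-of-day field followed later by a `pg[<id>(` field).

```shell
# Hypothetical helper, not part of Ceph: reads OSD log lines on stdin and
# prints "<pg id> <time of day>" for each RequestBackfill event.
# Assumption: lines look like the excerpt above, e.g. "... 16:05:43.795615 ... pg[0.2( ...".
backfill_requests() {
    awk '/RequestBackfill/ {
        time = ""; pg = ""
        for (i = 1; i <= NF; i++) {
            # First field shaped like HH:MM:SS.ffffff is the timestamp.
            if (time == "" && $i ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\./) time = $i
            # First field starting with "pg[" carries the pg id before the "(".
            if (pg == "" && $i ~ /^pg\[/) { pg = $i; sub(/^pg\[/, "", pg); sub(/\(.*/, "", pg) }
        }
        if (pg != "") print pg, time
    }'
}
```

Running `backfill_requests < osd.1.log | sort` on the attached log groups the events by PG, which makes the interval between successive requests easy to read off.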

Actions #3

Updated by Loïc Dachary over 9 years ago

  • Status changed from New to Rejected

The disk was 90% full, which is above the osd_backfill_full_ratio of 0.8, hence the block.
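
The arithmetic behind this conclusion can be checked with a small shell sketch. It assumes that an OSD refuses backfill when the used fraction of its data disk is at or above osd_backfill_full_ratio; the function names `backfill_blocked` and `used_percent_of` are hypothetical and not part of Ceph.

```shell
# Hypothetical helper, not part of Ceph: exits 0 when a disk that is
# USED_PERCENT full would be too full for backfill at the given RATIO.
# Assumption: backfill is refused when used/total >= osd_backfill_full_ratio.
backfill_blocked() {
    used_percent=$1   # e.g. 90 for a disk that is 90% full
    ratio=$2          # e.g. 0.8 for osd_backfill_full_ratio
    awk -v u="$used_percent" -v r="$ratio" 'BEGIN { exit !(u / 100 >= r) }'
}

# Prints the percentage of used space on the filesystem holding the given
# path, taken from the fifth column of POSIX `df -P` output.
used_percent_of() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

With the values from this ticket, `backfill_blocked 90 0.8` succeeds, so the PGs are expected to stay in backfill_toofull. A live check could look like `backfill_blocked "$(used_percent_of .)" 0.8 && echo "too full for backfill"`, run from the directory that holds the OSD data.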
