Bug #10272

closed

objects misplaced after reweight

Added by Loïc Dachary over 9 years ago. Updated over 9 years ago.

Status: Rejected
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Steps to reproduce, after compiling from source:

$ ./ceph --version
ceph version 0.89-387-gf4735cf (f4735cfff176b6e8ca96a4cdca946d2066a6e932)
$ MON=1 OSD=10 ./vstart.sh -X -n -l mon osd
$ ./rados -p rbd bench 300 write --run-name backfill2 --no-cleanup
$ ./ceph osd reweight 0 1
$ ./ceph osd reweight 1 0.8
$ ./ceph osd reweight 2 0.7
$ ./ceph osd reweight 3 0.2
$ ./ceph osd reweight 4 0.2
$ ./ceph osd reweight 5 0.2
$ ./ceph osd reweight 6 0.2
$ ./ceph osd reweight 7 0.2
$ ./ceph osd reweight 8 0.2
$ ./ceph osd reweight 9 0.2

Wait a few minutes, then run:
$ ./ceph -s
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
    cluster c4726861-63e0-43e5-9a75-e46f2a3b3a67
     health HEALTH_WARN
            1 pgs stuck unclean
            recovery 144/3432 objects degraded (4.196%)
            recovery 144/3432 objects misplaced (4.196%)
            too few PGs per OSD (2 < min 3)
     monmap e1: 1 mons at {a=127.0.0.1:6789/0}
            election epoch 2, quorum 0 a
     osdmap e43: 10 osds: 10 up, 10 in
      pgmap v324: 8 pgs, 1 pools, 4572 MB data, 1144 objects
            933 GB used, 1281 GB / 2215 GB avail
            144/3432 objects degraded (4.196%)
            144/3432 objects misplaced (4.196%)
                   1 active+remapped
                   7 active+clean
$ ./ceph pg dump
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
dumped all in format plain
version 324
stamp 2014-12-08 10:30:09.614597
last_osdmap_epoch 43
last_pg_scan 1
full_ratio 0.99
nearfull_ratio 0.85
pg_stat objects mip     degr    misp    unf     bytes   log     disklog state   state_stamp     v       reported        up      up_primary      acting  acting_primary  last_scrub      scrub_stamp     last_deep_s$
0.3     129     0       0       0       0       541065216       129     129     active+clean    2014-12-08 10:24:45.367846      31'129  43:475  [2,0,1] 2       [2,0,1] 2       0'0     2014-12-08 10:11:54.774932 $
0.2     150     0       0       0       0       629145600       150     150     active+clean    2014-12-08 10:24:08.414076      31'150  43:630  [3,4,0] 3       [3,4,0] 3       0'0     2014-12-08 10:11:54.774923 $
0.1     134     0       0       0       0       562036736       134     134     active+clean    2014-12-08 10:23:46.593751      31'134  43:478  [8,6,0] 8       [8,6,0] 8       0'0     2014-12-08 10:11:54.774909 $
0.0     144     0       144     144     0       603979776       144     144     active+remapped 2014-12-08 10:25:10.606244      31'144  43:448  [1,0]   1       [1,0,3] 1       0'0     2014-12-08 10:11:54.774889 $
0.7     149     0       0       0       0       624951296       149     149     active+clean    2014-12-08 10:25:50.085911      31'149  43:492  [0,1,2] 0       [0,1,2] 0       0'0     2014-12-08 10:11:54.774967 $
0.6     149     0       0       0       0       624951296       149     149     active+clean    2014-12-08 10:25:22.777091      31'149  43:743  [0,7,2] 0       [0,7,2] 0       0'0     2014-12-08 10:11:54.774959 $
0.5     137     0       0       0       0       574619648       137     137     active+clean    2014-12-08 10:24:53.794016      31'137  43:540  [5,2,1] 5       [5,2,1] 5       0'0     2014-12-08 10:11:54.774949 $
0.4     152     0       0       0       0       633339916       152     152     active+clean    2014-12-08 10:24:28.495426      31'152  43:592  [0,2,1] 0       [0,2,1] 0       0'0     2014-12-08 10:11:54.774941 $
pool 0  1144    0       144     0       4794089484      1144    1144
 sum    1144    0       144     0       4794089484      1144    1144
osdstat kbused  kbavail kb      hb in   hb out
0       97879344        134383748       232279476       [1,2,3,4,5,6,7,8,9]     []
1       97877480        134385612       232279476       [0,2,3,4,5,6,7,8,9]     []
2       97876416        134386676       232279476       [0,1,3,4,5,6,7,8,9]     []
3       97879984        134383108       232279476       [0,1,2,4,5,6,7,8,9]     []
4       97876800        134386292       232279476       [0,1,2,3,5,6,7,8,9]     []
5       97876504        134386588       232279476       [0,1,2,3,4,6,7,8,9]     []
6       97877184        134385908       232279476       [0,1,2,3,4,5,7,8,9]     []
7       97877052        134386040       232279476       [0,1,2,3,4,5,6,8,9]     []
8       97879160        134383932       232279476       [0,1,2,3,4,5,6,7,9]     []
9       97877472        134385620       232279476       [0,1,2,3,4,5,6,7,8]     []
 sum    978777396       1343853524      2322794760

No further progress after that.
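PGs in the state of 0.0 can be spotted by comparing the length of each PG's up set with the pool size, which is 3 here (3432 object copies for 1144 objects). The following is a minimal Python sketch, assuming the plain-text column layout of this particular `ceph pg dump` output; the helper names `parse_osd_list` and `short_up_sets` are hypothetical, and the column layout may differ in other Ceph releases, so a structured output format would be more robust for real scripts.

```python
import re

# Rows copied from the `ceph pg dump` output in this report (the trailing
# "$" marks where the terminal cut the original lines off).
SAMPLE = """\
pg_stat objects mip     degr    misp    unf     bytes   log     disklog state   state_stamp     v       reported        up      up_primary      acting  acting_primary  last_scrub      scrub_stamp     last_deep_s$
0.3     129     0       0       0       0       541065216       129     129     active+clean    2014-12-08 10:24:45.367846      31'129  43:475  [2,0,1] 2       [2,0,1] 2       0'0     2014-12-08 10:11:54.774932 $
0.0     144     0       144     144     0       603979776       144     144     active+remapped 2014-12-08 10:25:10.606244      31'144  43:448  [1,0]   1       [1,0,3] 1       0'0     2014-12-08 10:11:54.774889 $
pool 0  1144    0       144     0       4794089484      1144    1144
"""

PGID = re.compile(r"\d+\.[0-9a-f]+")


def parse_osd_list(token):
    """Turn a token such as "[1,0,3]" into [1, 0, 3]."""
    inner = token.strip("[]")
    return [int(osd) for osd in inner.split(",")] if inner else []


def short_up_sets(dump_text, size=3):
    """Return (pgid, state, up, acting) for PGs whose up set has fewer than size OSDs."""
    short = []
    for line in dump_text.splitlines():
        fields = line.split()
        # Only per-PG rows start with a pgid such as "0.3"; skip header and totals.
        if len(fields) < 17 or not PGID.fullmatch(fields[0]):
            continue
        # state_stamp splits into two tokens (date and time), so after
        # whitespace splitting the up set is token 14 and acting is token 16.
        up = parse_osd_list(fields[14])
        acting = parse_osd_list(fields[16])
        if len(up) < size:
            short.append((fields[0], fields[9], up, acting))
    return short


if __name__ == "__main__":
    for pgid, state, up, acting in short_up_sets(SAMPLE):
        print(f"pg {pgid} ({state}): up={up} acting={acting}")
```

On the sample rows this prints `pg 0.0 (active+remapped): up=[1, 0] acting=[1, 0, 3]`: CRUSH only produced two OSDs for the up set, so osd.3 is kept in the acting set to hold the third copy, which is why the PG is remapped rather than clean.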

Actions #1

Updated by Samuel Just over 9 years ago

This is a problem with the CRUSH rule. CRUSH retried a bunch of times but was unable to get 3 replicas for that PG.
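The retry behaviour can be made concrete with a toy model of CRUSH's first-n replica selection. This is a simplified Python sketch, not Ceph's code: SHA-256 stands in for CRUSH's rjenkins hash, a hash of (x, r) modulo the OSD count stands in for the bucket choice, and `toy_hash`, `is_out` and `choose_firstn` are hypothetical names. The rejection test follows our reading of `is_out()` in Ceph's `crush/mapper.c`: the reweight is a 16.16 fixed-point value, and an OSD is kept only if a hash of the input x and the OSD id, masked to 16 bits, falls below it. Because that hash does not involve the retry counter r, an OSD that rejects a given PG rejects it on every retry, so retrying can only find the OSDs that accept it. `tries` plays the role of the `choose_total_tries` tunable; the value 50 is an assumption.

```python
import hashlib


def toy_hash(*values):
    """Hypothetical stand-in for CRUSH's rjenkins hash: 32 bits of SHA-256."""
    data = ",".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "little")


def is_out(reweight, item, x):
    """Reweight rejection test: depends on (x, item) only, never on the retry."""
    w = int(reweight * 0x10000)  # 16.16 fixed point: 1.0 == 0x10000
    if w >= 0x10000:
        return False  # full weight: never rejected
    if w == 0:
        return True  # zero weight: always rejected
    return (toy_hash(x, item) & 0xFFFF) >= w


def choose_firstn(x, reweights, numrep=3, tries=50):
    """Pick up to numrep distinct OSDs for input x, retrying on reject or collision."""
    chosen = []
    for rep in range(numrep):
        for ftotal in range(tries):
            r = rep + ftotal
            item = toy_hash(x, r) % len(reweights)  # stand-in for the bucket choice
            if item in chosen or is_out(reweights[item], item, x):
                continue  # collision or rejection: retry with the next r
            chosen.append(item)
            break
        # if all tries failed, this replica slot is left empty
    return chosen


if __name__ == "__main__":
    REWEIGHTS = [1.0, 0.8, 0.7] + [0.2] * 7  # osd.0 .. osd.9 as in the report
    for x in range(8):  # hypothetical inputs, not the real PG seeds
        print(x, choose_firstn(x, REWEIGHTS))
```

Raising `tries` does not help an input for which fewer than three OSDs pass `is_out`: the result can never be longer than the set of accepting OSDs.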

Actions #2

Updated by Sage Weil over 9 years ago

  • Status changed from 12 to Rejected

The problem is the (post-CRUSH) reweights: you're rejecting almost all OSDs with 80% probability, so eventually CRUSH will give up.
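The effect of these reweights can be estimated with a back-of-the-envelope model. This minimal Python sketch assumes that each OSD accepts a given PG independently with probability equal to its `ceph osd reweight` value (treating CRUSH's per-PG, per-OSD hash as uniform), and that CRUSH retries long enough to reach every accepting OSD; a finite retry budget can only make things worse. Under those assumptions a PG ends up with fewer than 3 OSDs exactly when fewer than 3 OSDs accept it. `prob_fewer_than` is a hypothetical helper that computes this with a small dynamic program over the number of accepting OSDs.

```python
REWEIGHTS = [1.0, 0.8, 0.7] + [0.2] * 7  # osd.0 .. osd.9, from the reweight commands above


def prob_fewer_than(k, weights):
    """Probability that fewer than k OSDs accept a PG.

    Each OSD is assumed to accept independently with probability equal
    to its reweight.
    """
    dist = [1.0]  # dist[j] = P(exactly j of the OSDs processed so far accept)
    for w in weights:
        nxt = [0.0] * (len(dist) + 1)
        for j, p in enumerate(dist):
            nxt[j] += p * (1 - w)  # this OSD rejects the PG
            nxt[j + 1] += p * w  # this OSD accepts the PG
        dist = nxt
    return sum(dist[:k])


if __name__ == "__main__":
    p = prob_fewer_than(3, REWEIGHTS)
    print(f"P(PG short of 3 OSDs) = {p:.3f}")  # 0.114
    print(f"expected short PGs out of 8: {8 * p:.2f}")  # 0.91
```

With the reweights from the reproduction steps the model gives about 0.114 per PG, or roughly 0.91 of the 8 PGs expected to be left short. That is in line with the single active+remapped PG observed, although one run is far too small a sample to confirm the model. With every reweight at 1.0 the same calculation gives 0.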

Actions #3

Updated by Loïc Dachary over 9 years ago

Of course... thanks for explaining
