Bug #10272
objects misplaced after reweight
Status: Rejected
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Steps to reproduce, after compiling from sources:
$ ./ceph --version
ceph version 0.89-387-gf4735cf (f4735cfff176b6e8ca96a4cdca946d2066a6e932)
$ MON=1 OSD=10 ./vstart.sh -X -n -l mon osd
$ ./rados -p rbd bench 300 write --run-name backfill2 --no-cleanup
$ ./ceph osd reweight 0 1
$ ./ceph osd reweight 1 0.8
$ ./ceph osd reweight 2 0.7
$ ./ceph osd reweight 3 0.2
$ ./ceph osd reweight 4 0.2
$ ./ceph osd reweight 5 0.2
$ ./ceph osd reweight 6 0.2
$ ./ceph osd reweight 7 0.2
$ ./ceph osd reweight 8 0.2
$ ./ceph osd reweight 9 0.2
Wait a few minutes, then check the cluster status and the PG dump:
$ ./ceph -s
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
    cluster c4726861-63e0-43e5-9a75-e46f2a3b3a67
     health HEALTH_WARN 1 pgs stuck unclean
            recovery 144/3432 objects degraded (4.196%)
            recovery 144/3432 objects misplaced (4.196%)
            too few PGs per OSD (2 < min 3)
     monmap e1: 1 mons at {a=127.0.0.1:6789/0}
            election epoch 2, quorum 0 a
     osdmap e43: 10 osds: 10 up, 10 in
      pgmap v324: 8 pgs, 1 pools, 4572 MB data, 1144 objects
            933 GB used, 1281 GB / 2215 GB avail
            144/3432 objects degraded (4.196%)
            144/3432 objects misplaced (4.196%)
                   1 active+remapped
                   7 active+clean
$ ./ceph pg dump
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
dumped all in format plain
version 324
stamp 2014-12-08 10:30:09.614597
last_osdmap_epoch 43
last_pg_scan 1
full_ratio 0.99
nearfull_ratio 0.85
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_s$
0.3 129 0 0 0 0 541065216 129 129 active+clean 2014-12-08 10:24:45.367846 31'129 43:475 [2,0,1] 2 [2,0,1] 2 0'0 2014-12-08 10:11:54.774932 $
0.2 150 0 0 0 0 629145600 150 150 active+clean 2014-12-08 10:24:08.414076 31'150 43:630 [3,4,0] 3 [3,4,0] 3 0'0 2014-12-08 10:11:54.774923 $
0.1 134 0 0 0 0 562036736 134 134 active+clean 2014-12-08 10:23:46.593751 31'134 43:478 [8,6,0] 8 [8,6,0] 8 0'0 2014-12-08 10:11:54.774909 $
0.0 144 0 144 144 0 603979776 144 144 active+remapped 2014-12-08 10:25:10.606244 31'144 43:448 [1,0] 1 [1,0,3] 1 0'0 2014-12-08 10:11:54.774889 $
0.7 149 0 0 0 0 624951296 149 149 active+clean 2014-12-08 10:25:50.085911 31'149 43:492 [0,1,2] 0 [0,1,2] 0 0'0 2014-12-08 10:11:54.774967 $
0.6 149 0 0 0 0 624951296 149 149 active+clean 2014-12-08 10:25:22.777091 31'149 43:743 [0,7,2] 0 [0,7,2] 0 0'0 2014-12-08 10:11:54.774959 $
0.5 137 0 0 0 0 574619648 137 137 active+clean 2014-12-08 10:24:53.794016 31'137 43:540 [5,2,1] 5 [5,2,1] 5 0'0 2014-12-08 10:11:54.774949 $
0.4 152 0 0 0 0 633339916 152 152 active+clean 2014-12-08 10:24:28.495426 31'152 43:592 [0,2,1] 0 [0,2,1] 0 0'0 2014-12-08 10:11:54.774941 $
pool 0 1144 0 144 0 4794089484 1144 1144
sum 1144 0 144 0 4794089484 1144 1144
osdstat kbused kbavail kb hb in hb out
0 97879344 134383748 232279476 [1,2,3,4,5,6,7,8,9] []
1 97877480 134385612 232279476 [0,2,3,4,5,6,7,8,9] []
2 97876416 134386676 232279476 [0,1,3,4,5,6,7,8,9] []
3 97879984 134383108 232279476 [0,1,2,4,5,6,7,8,9] []
4 97876800 134386292 232279476 [0,1,2,3,5,6,7,8,9] []
5 97876504 134386588 232279476 [0,1,2,3,4,6,7,8,9] []
6 97877184 134385908 232279476 [0,1,2,3,4,5,7,8,9] []
7 97877052 134386040 232279476 [0,1,2,3,4,5,6,8,9] []
8 97879160 134383932 232279476 [0,1,2,3,4,5,6,7,9] []
9 97877472 134385620 232279476 [0,1,2,3,4,5,6,7,8] []
sum 978777396 1343853524 2322794760
No further progress after that: PG 0.0 stays active+remapped, with up set [1,0] and acting set [1,0,3].
Updated by Samuel Just over 9 years ago
This is a problem with the CRUSH rule. CRUSH retried a number of times, but was unable to get 3 replicas for that PG.
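The symptom described here can be read straight off the pg dump: the affected PG has an up set shorter than the replica count. The following is a minimal sketch of that check, not part of Ceph. The helper name `short_up_sets` is hypothetical, the two sample rows are copied from the dump in the description, and the replica count of 3 is an assumption taken from this comment and from the three-entry up sets of the healthy PGs.

```python
import re

# Two rows copied from the pg dump in the description
# (the trailing scrub columns are left out here).
ROWS = """\
0.3 129 0 0 0 0 541065216 129 129 active+clean 2014-12-08 10:24:45.367846 31'129 43:475 [2,0,1] 2 [2,0,1] 2
0.0 144 0 144 144 0 603979776 144 144 active+remapped 2014-12-08 10:25:10.606244 31'144 43:448 [1,0] 1 [1,0,3] 1
"""

# Matches bracketed OSD lists such as [1,0,3]; the first match on a row is
# the up set and the second is the acting set.
SET_RE = re.compile(r"\[([\d,]*)\]")


def short_up_sets(rows, replicas=3):
    """Return (pgid, up, acting) for PGs whose up set has fewer than `replicas` OSDs.

    `replicas` is an assumed pool size, not something read from the cluster.
    """
    flagged = []
    for line in rows.splitlines():
        if not line.strip():
            continue
        pgid = line.split()[0]
        sets = [[int(x) for x in m.split(",") if x] for m in SET_RE.findall(line)]
        if len(sets) < 2:
            continue  # not a PG row
        up, acting = sets[0], sets[1]
        if len(up) < replicas:
            flagged.append((pgid, up, acting))
    return flagged


print(short_up_sets(ROWS))  # [('0.0', [1, 0], [1, 0, 3])]
```

For PG 0.0 the up set holds only osd.1 and osd.0, while the acting set still includes osd.3, which matches the active+remapped state in the dump.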
Updated by Sage Weil over 9 years ago
- Status changed from 12 to Rejected
The problem is the (post-CRUSH) reweights. You're rejecting almost all OSDs with 80% probability. Eventually CRUSH will give up.
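To see why a bounded number of retries can come up short with these reweights, here is a toy simulation. It is a sketch under stated assumptions and not the CRUSH algorithm: it draws OSDs uniformly at random, whereas CRUSH uses deterministic hashing over a weighted hierarchy. The names `pick_replicas` and `short_fraction` and the `max_tries` budget are hypothetical and do not correspond to any Ceph setting.

```python
import random

# Reweights from the reproduction steps:
# osd.0 = 1, osd.1 = 0.8, osd.2 = 0.7, osd.3 through osd.9 = 0.2.
REWEIGHTS = [1.0, 0.8, 0.7] + [0.2] * 7


def pick_replicas(reweights, replicas=3, max_tries=50, rng=random):
    """Toy placement: draw an OSD uniformly, keep it with probability equal
    to its reweight, and stop after `max_tries` draws in total.

    Returns the chosen OSD ids, which may be fewer than `replicas`.
    """
    chosen = []
    for _ in range(max_tries):
        osd = rng.randrange(len(reweights))
        if osd in chosen:
            continue  # collision with an OSD that was already picked
        if rng.random() >= reweights[osd]:
            continue  # rejected by the reweight
        chosen.append(osd)
        if len(chosen) == replicas:
            break
    return chosen


def short_fraction(reweights, max_tries, trials=2000, seed=0):
    """Fraction of trials in which fewer than 3 replicas were found."""
    rng = random.Random(seed)
    short = sum(
        len(pick_replicas(reweights, 3, max_tries, rng)) < 3
        for _ in range(trials)
    )
    return short / trials


if __name__ == "__main__":
    # Average chance that a single uniform draw passes the reweight check.
    print("mean acceptance:", sum(REWEIGHTS) / len(REWEIGHTS))
    for budget in (3, 5, 10, 50):
        print(budget, short_fraction(REWEIGHTS, budget))
```

Seven of the ten OSDs accept a draw only 20% of the time, so the average acceptance per draw is (1 + 0.8 + 0.7 + 7 × 0.2) / 10 = 0.39. The smaller the retry budget, the more often the toy model returns fewer than three OSDs.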