Bug #7328
osd: reweight-by-utilization ended up with stuck remapped pgs
Status: Closed
% Done: 0%
Source: Support
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Running ceph osd reweight-by-utilization resulted in stuck pgs.
health HEALTH_WARN 204 pgs stuck unclean; recovery 4136/163619494 objects degraded (0.003%)
monmap e1: 1 mons at {cs-compute03=192.168.181.13:6789/0}, election epoch 1, quorum 0 cs-compute03
osdmap e2996: 120 osds: 120 up, 120 in
pgmap v665045: 6128 pgs, 4 pools, 164 TB data, 79892 kobjects
      329 TB used, 106 TB / 435 TB avail
      4136/163619494 objects degraded (0.003%)
      5917 active+clean
      204 active+remapped
I believe the cause is related to the crush configuration: 2 copies split across 2 rooms. Weighting the OSDs back up to 1 resolves the remapped pgs.
root root2 {
        id -7           # do not change unnecessarily
        # weight 435.600
        alg straw
        hash 0          # rjenkins1
        item room1 weight 217.800
        item room2 weight 217.800
}

rule testrule {
        ruleset 3
        type replicated
        min_size 2
        max_size 4
        step take root2
        step chooseleaf firstn 0 type room
        step emit
}
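For what it's worth, here is a simplified toy model of why this rule has no slack (hypothetical Python, not Ceph code; ROOMS, MAX_TRIES and place_pg are invented for illustration, and OSDs are picked uniformly rather than by crush weight): with only two rooms for two copies, once reweight-by-utilization pushes overrides in one room low enough, a bounded retry budget can run out and a PG never gets a complete new mapping, so it stays remapped.

import random

# Toy model, NOT actual CRUSH code: each replica must land in a distinct room.
# An OSD is picked uniformly within the room and then accepted with probability
# equal to its reweight override (what reweight-by-utilization lowers). A
# bounded retry budget stands in for the choose_total_tries tunable; if every
# retry is rejected, the placement fails and the PG keeps its old mapping.
ROOMS = {
    "room1": [f"osd.{i}" for i in range(60)],
    "room2": [f"osd.{i}" for i in range(60, 120)],
}
MAX_TRIES = 50  # stand-in for choose_total_tries

def place_pg(reweights, rng):
    """Pick one OSD per room, honouring per-OSD reweight overrides."""
    acting = []
    for room, osds in ROOMS.items():
        for _ in range(MAX_TRIES):
            osd = rng.choice(osds)
            if rng.random() < reweights.get(osd, 1.0):
                acting.append(osd)
                break
        else:
            return None  # retries exhausted: no complete mapping for this PG
    return acting

rng = random.Random(7328)
# Pretend reweight-by-utilization pushed most of room1's OSDs down very hard.
reweights = {f"osd.{i}": 0.05 for i in range(55)}
failures = sum(place_pg(reweights, rng) is None for _ in range(6128))
print(f"{failures} of 6128 simulated PGs could not be fully mapped")

In this toy model the failures disappear as soon as the overrides are raised back toward 1.0, which matches weighting the OSDs back up resolving the remapped pgs.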
If this is just caused by the crush configuration, reweight-by-utilization should be smarter about weighting down OSDs.
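One way it could be "smarter" is to bound how far it pushes any override below 1.0 in a single pass. The sketch below is purely hypothetical (the constants MIN_REWEIGHT and MAX_STEP and the helper proposed_reweights are not part of Ceph); it only illustrates the idea of a clamped adjustment:

# Hypothetical policy sketch, not the actual Ceph implementation.
MIN_REWEIGHT = 0.80   # never push an override below this in one pass
MAX_STEP = 0.05       # largest single-pass reduction for any one OSD

def proposed_reweights(utilizations, current, threshold=1.20):
    """Suggest new overrides only for OSDs above threshold * average util."""
    avg = sum(utilizations.values()) / len(utilizations)
    new = {}
    for osd, util in utilizations.items():
        if util > threshold * avg:
            # Scale the current override down toward the overload threshold,
            # but clamp both the step size and the absolute floor.
            cur = current.get(osd, 1.0)
            target = cur * (threshold * avg / util)
            new[osd] = round(max(target, cur - MAX_STEP, MIN_REWEIGHT), 4)
    return new

# Example: osd.12 is roughly 29% above the cluster average utilization,
# so it is nudged down by at most MAX_STEP rather than all the way.
print(proposed_reweights({"osd.12": 0.90, "osd.13": 0.50}, {"osd.12": 1.0}))
# -> {'osd.12': 0.95}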
More cluster information is available in ZD Ticket #928.