Feature #17714
reweight-by-utilization needs a subtree option
0%
Description
ceph osd reweight-by-utilization misbehaves on clusters where the CRUSH rules produce imbalanced OSD utilizations by design. Consider two examples:
1. A three-OSD Ceph cluster with two 6TB OSDs and one 3TB OSD. Create one pool with 3x replication. When the 6TB OSDs are 25% full, the 3TB OSD will be 50% full, by design: each OSD holds one full replica of the same data. However, in this case reweight-by-utilization will repeatedly decrease the weight of the 3TB OSD until it reaches 0.
2. A large Ceph cluster with two roots, e.g. default and objectstore. CRUSH rules assign Cinder-related pools to the default root and radosgw-related pools to the objectstore root. Because the volumes and rgw usage are unequal, the corresponding OSD utilizations are also unequal, by design. In this case, reweight-by-utilization will incorrectly decrease the weights of OSDs in whichever root holds more user data.
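The arithmetic behind example (1) can be sketched as follows. This is an illustration only, not Ceph code: with 3x replication onto all three OSDs, absolute usage is equal everywhere, so percentage utilization necessarily differs with OSD size.

```python
# Illustrative only: three OSDs, one pool, 3x replication, so every
# OSD stores one full copy of the pool's data.
osd_sizes_tb = {"osd.0": 6.0, "osd.1": 6.0, "osd.2": 3.0}

def utilizations(user_data_tb, sizes):
    """Fraction full of each OSD when each holds one full replica."""
    return {osd: user_data_tb / size for osd, size in sizes.items()}

util = utilizations(1.5, osd_sizes_tb)  # 1.5 TB of user data
# osd.0 and osd.1 are 25% full while osd.2 is 50% full -- exactly
# the by-design imbalance reweight-by-utilization tries to "correct".
```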
To solve this, I propose a new optional "bucket" or "subtree" argument for reweight-by-utilization. This would direct the command to operate only on OSDs in the subtree below the specified CRUSH bucket. I've prototyped this here: https://github.com/cernceph/ceph-scripts/commits/master/tools/crush-reweight-by-utilization.py
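The subtree filter could work roughly like this minimal sketch. The node-list shape below mimics the flat list produced by `ceph osd tree -f json` (ids, names, types, and `children` on buckets), but the sample tree and the helper name `osds_in_subtree` are assumptions for illustration, not the prototype's actual code.

```python
def osds_in_subtree(nodes, bucket_name):
    """Return the ids of OSDs below the named CRUSH bucket.

    `nodes` is a flat list of dicts, each with "id", "name", and
    "type"; bucket nodes also carry a "children" list of ids.
    """
    by_id = {n["id"]: n for n in nodes}
    root = next(n for n in nodes if n["name"] == bucket_name)
    osds, stack = [], [root["id"]]
    while stack:
        n = by_id[stack.pop()]
        if n["type"] == "osd":
            osds.append(n["id"])
        else:
            stack.extend(n.get("children", []))
    return sorted(osds)

# Hypothetical two-root map matching example (2) above.
tree = [
    {"id": -1, "name": "default", "type": "root", "children": [-2]},
    {"id": -2, "name": "rack1", "type": "rack", "children": [0, 1]},
    {"id": -3, "name": "objectstore", "type": "root", "children": [-4]},
    {"id": -4, "name": "rack2", "type": "rack", "children": [2]},
    {"id": 0, "name": "osd.0", "type": "osd"},
    {"id": 1, "name": "osd.1", "type": "osd"},
    {"id": 2, "name": "osd.2", "type": "osd"},
]
# With subtree="default", reweighting would consider only osd.0 and
# osd.1, leaving the objectstore root untouched.
```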
Note that reweight-by-pg already solves problem (2) above because it takes a <pool> option, but it does not solve (1).
History
#1 Updated by Greg Farnum almost 7 years ago
- Tracker changed from Bug to Feature