Feature #55169
crush: should validate that a rule outputs OSDs
Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Component(RADOS):
CRUSH
Pull request ID:
Description
In this thread https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/2ZUJN75RLL4YYD4EHAUS5I4IL37A7UUL/ a user suffered a multi-day outage, with down PGs and OSDs crashing with the error "start interval does not contain the required bound".
After a lengthy investigation, the root cause turned out to be that the user had injected a CRUSH rule that used "choose" instead of "chooseleaf", so the rule emitted host buckets rather than OSDs:
rule csd-data-pool {
    id 5
    type erasure
    min_size 3
    max_size 5
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default class big
    step choose indep 0 type host    <--- HERE!
    step emit
}
Can we add better validation to prevent such mistakes?
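To illustrate the kind of check being asked for, here is a minimal sketch in Python. It is not Ceph code and not a proposed patch: the function name find_non_osd_emits and its text-based parsing are hypothetical, and it assumes the device type is named "osd", as in a default CRUSH map. It only flags an "emit" whose working set was last filled by a plain "choose" on a non-device bucket type; a real check would need to work on the compiled map and cover more cases.

```python
import re

# Hypothetical helper for illustration only; not part of Ceph.
# Matches "step choose|chooseleaf firstn|indep <n> type <bucket-type>".
STEP_RE = re.compile(
    r"step\s+(choose|chooseleaf)\s+(firstn|indep)\s+(-?\d+)\s+type\s+(\S+)"
)


def find_non_osd_emits(rule_text, device_type="osd"):
    """Return a message for every emit that would output buckets, not devices.

    Assumption: the device (leaf) type is called "osd". "chooseleaf" always
    descends to devices, so only a plain "choose" on another type is flagged.
    """
    problems = []
    last_choice = None  # most recent choose/chooseleaf step since the last take
    for raw in rule_text.splitlines():
        line = raw.strip()
        m = STEP_RE.match(line)
        if m:
            last_choice = m.groups()
        elif line.startswith("step take"):
            last_choice = None
        elif line.startswith("step emit"):
            if last_choice is not None:
                op, mode, num, bucket_type = last_choice
                if op == "choose" and bucket_type != device_type:
                    problems.append(
                        f"emit after 'choose {mode} {num} type {bucket_type}' "
                        f"yields {bucket_type} buckets, not {device_type} devices"
                    )
            last_choice = None
    return problems


# The rule from this report, one step per line.
BAD_RULE = """\
rule csd-data-pool {
    id 5
    type erasure
    min_size 3
    max_size 5
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default class big
    step choose indep 0 type host
    step emit
}
"""

# Same rule with the intended chooseleaf step.
GOOD_RULE = BAD_RULE.replace("step choose indep", "step chooseleaf indep")
```

Calling find_non_osd_emits(BAD_RULE) reports the offending step, while GOOD_RULE produces no findings. A multi-step rule such as "choose ... type host" followed by "choose ... type osd" also passes, because only the last choice before "emit" is inspected.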
History
#1 Updated by Radoslaw Zarzynski 10 months ago
- Tracker changed from Bug to Feature
- Tags set to low-hanging-fruit
Adding the extra check makes sense, I think. Implementing the patch would be low-hanging fruit, but reviewing it will not be.
#2 Updated by Laura Flores 7 months ago
- Tags set to low-hanging-fruit
- Tags deleted (low-hanging-fruit)