Feature #55169

crush: should validate rule outputs osds

Added by Dan van der Ster 10 months ago. Updated 7 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Component(RADOS): CRUSH
Pull request ID:

Description

In this thread https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/2ZUJN75RLL4YYD4EHAUS5I4IL37A7UUL/ a user suffered a multi-day outage, with PGs down and OSDs crashing with the error "start interval does not contain the required bound".

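After a lengthy investigation, the root cause turned out to be a CRUSH rule, injected by the user, that used "choose" instead of "chooseleaf". As written, the rule selects host buckets and emits them directly, so its output contains hosts rather than OSDs.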

rule csd-data-pool {
         id 5
         type erasure
         min_size 3
         max_size 5
         step set_chooseleaf_tries 5
         step set_choose_tries 100
         step take default class big
         step choose indep 0 type host    # <--- HERE! emits hosts, not OSDs
         step emit
}

Can we add better validation to prevent such mistakes?
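One possible shape for such a check, sketched in Python purely for illustration: walk a rule's steps, track which type of item each step selects, and flag any emit that does not yield OSDs. The step-tuple representation, the function names `rule_emit_problems` and `non_device_items`, and the assumption that the leaf type is named "osd" are all hypothetical; this is not Ceph's CrushWrapper code or API. The second helper relies on CRUSH bucket IDs being negative while device IDs are non-negative, so a test mapping that contains negative IDs means the rule emitted buckets.

```python
# Hypothetical sketch, not Ceph code: statically check that every "emit"
# in a CRUSH rule outputs devices (OSDs) rather than buckets such as hosts.
from typing import List, Optional, Tuple

Step = Tuple[str, ...]  # assumed: one tuple per "step" line, split on whitespace

DEVICE_TYPE = "osd"  # assumed name of the leaf (type 0) in the hierarchy


def rule_emit_problems(steps: List[Step]) -> List[str]:
    """Return one message per emit that does not yield devices; [] means OK."""
    problems: List[str] = []
    current: Optional[str] = None  # type of the items currently selected
    for i, step in enumerate(steps):
        op = step[0]
        if op == "take":
            # Simplification: assume "take" always names a bucket, not an OSD.
            current = "bucket"
        elif op in ("choose", "chooseleaf"):
            # Form: (op, "firstn" | "indep", num, "type", type_name)
            type_name = step[4]
            # chooseleaf descends to a device under each selected item;
            # plain choose stops at the named type.
            current = DEVICE_TYPE if op == "chooseleaf" else type_name
        elif op == "emit":
            if current != DEVICE_TYPE:
                problems.append(
                    f"step {i}: emit outputs '{current}' items, not '{DEVICE_TYPE}'"
                )
            current = None
        # set_* tunable steps do not change what is selected; ignore them.
    return problems


def non_device_items(mapping: List[int]) -> List[int]:
    """Runtime check: CRUSH bucket IDs are negative, device IDs are >= 0.

    CRUSH_ITEM_NONE (0x7fffffff) holes in EC mappings are positive and
    therefore not flagged here.
    """
    return [item for item in mapping if item < 0]


# The rule from the report, and the same rule using "chooseleaf".
bad_rule = [
    ("set_chooseleaf_tries", "5"),
    ("set_choose_tries", "100"),
    ("take", "default", "class", "big"),
    ("choose", "indep", "0", "type", "host"),
    ("emit",),
]
good_rule = [s if s[0] != "choose" else ("chooseleaf",) + s[1:] for s in bad_rule]

print(rule_emit_problems(bad_rule))   # ["step 4: emit outputs 'host' items, not 'osd'"]
print(rule_emit_problems(good_rule))  # []
```

A static check like this would reject the rule above before it reaches the cluster, while the runtime helper could catch anything the static walk misses when mappings are tested. Where such checks would actually run (for example when a compiled map is injected with `ceph osd setcrushmap -i`) is left open here.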

History

#1 Updated by Radoslaw Zarzynski 10 months ago

  • Tracker changed from Bug to Feature
  • Tags set to low-hanging-fruit

Adding the extra check makes sense, I think. Implementing the patch would be low-hanging fruit, but reviewing it will not be.

#2 Updated by Laura Flores 7 months ago

  • Tags set to low-hanging-fruit
  • Tags deleted (low-hanging-fruit)
