Feature #55169

open

crush: should validate rule outputs osds

Added by Dan van der Ster about 2 years ago. Updated about 1 year ago.

Status: In Progress
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Component(RADOS): CRUSH
Pull request ID:

Description

In this thread https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/2ZUJN75RLL4YYD4EHAUS5I4IL37A7UUL/, a user suffered a multi-day outage, with down PGs and OSDs crashing with the error "start interval does not contain the required bound".

After a lengthy investigation, the root cause turned out to be a CRUSH rule the user had injected that used "choose" instead of "chooseleaf", so the rule emitted hosts rather than OSDs:

rule csd-data-pool {
         id 5
         type erasure
         min_size 3
         max_size 5
         step set_chooseleaf_tries 5
         step set_choose_tries 100
         step take default class big
         step choose indep 0 type host    # <--- HERE! should be chooseleaf
         step emit
}

Can we add better validation to prevent such mistakes?
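As a rough illustration of the kind of validation being asked for, here is a hypothetical static check on the rule text. The function validate_rule_outputs and its simplified parsing are not part of Ceph; they are a sketch under stated assumptions. It tracks which type of item each step leaves in the working set and flags any "emit" that does not output leaf devices. It assumes the leaf device type (type 0) is named "osd", that every "take" names a bucket, and it ignores other rule features. A real fix would more likely live in Ceph's own CRUSH compiler or monitor checks.

```python
from typing import List, Optional

# Hypothetical helper, NOT part of Ceph: a purely textual check that every
# "step emit" in a decompiled CRUSH rule outputs leaf devices.
# Assumption: the leaf device type (type id 0) is named "osd", as in default maps.
LEAF_TYPE = "osd"


def validate_rule_outputs(rule_text: str, leaf_type: str = LEAF_TYPE) -> List[str]:
    """Return a list of problems; an empty list means every emit yields leaves."""
    errors: List[str] = []
    current: Optional[str] = None  # type of the items in the working set
    for lineno, raw in enumerate(rule_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        words = line.split()
        if len(words) < 2 or words[0] != "step":
            continue  # rule header, id, type, min_size, closing brace, ...
        op = words[1]
        if op == "take":
            # Simplification: assume "take" always names a bucket (e.g. a root).
            current = "bucket"
        elif op in ("choose", "chooseleaf"):
            # Expected form: step choose|chooseleaf firstn|indep N type T
            i = words.index("type") if "type" in words else -1
            if i < 0 or i + 1 >= len(words):
                errors.append(f"line {lineno}: {op} step has no 'type <name>'")
                continue
            chosen = words[i + 1]
            # chooseleaf descends from each chosen bucket to one leaf device;
            # plain choose leaves the chosen buckets themselves in the working set.
            current = leaf_type if op == "chooseleaf" else chosen
        elif op == "emit":
            if current != leaf_type:
                errors.append(
                    f"line {lineno}: emit outputs items of type "
                    f"'{current}', expected '{leaf_type}'"
                )
            current = None
        # other steps (set_choose_tries, set_chooseleaf_tries, ...) are ignored
    return errors


# The rule from this report, and the same rule with chooseleaf.
BAD_RULE = """\
rule csd-data-pool {
    id 5
    type erasure
    min_size 3
    max_size 5
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default class big
    step choose indep 0 type host
    step emit
}
"""
GOOD_RULE = BAD_RULE.replace("step choose indep", "step chooseleaf indep")

print(validate_rule_outputs(BAD_RULE))
# prints ["line 10: emit outputs items of type 'host', expected 'osd'"]
print(validate_rule_outputs(GOOD_RULE))
# prints []
```

A purely textual check like this cannot catch every bad rule. A complementary approach would be to test the compiled map's mappings and reject any rule whose results contain bucket IDs (which are negative in CRUSH) instead of OSD IDs (which are non-negative).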
