Support #8600

MON crashes on new crushmap injection

Added by Jean-Charles Lopez over 8 years ago. Updated over 2 years ago.



monitor crush segfault


The crush map contains the following rule:

rule ssd {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take ssd
    step chooseleaf firstn 0 type rack
    step chooseleaf firstn 0 type host
    step emit
}

crushtool compiles the map with no warnings or errors.

When the new map is injected into the cluster, it causes the MON to segfault.

Restarting the faulted MON brings the cluster back to normal operation.

The issue can be reproduced at will.

ceph-mon.log - ceph-mon log while injecting the map (89.3 KB) Jean-Charles Lopez, 06/14/2014 01:44 PM

cmbad.txt - map that compiles and contains the above directives (2.67 KB) Jean-Charles Lopez, 06/14/2014 01:44 PM

Related issues

Duplicated by Ceph - Bug #9485: Monitor crash due to wrong crush rule set Resolved 09/15/2014


#1 Updated by Sage Weil over 8 years ago

  • Assignee set to Joao Eduardo Luis
  • Priority changed from Normal to High

#2 Updated by Joao Eduardo Luis over 8 years ago

JC, although we don't have a fix for the crash yet (we shouldn't crash if a crushmap is incorrectly structured), there's an easy way to avoid the crash.

Basically there are two things to note:

1. The chooseleaf steps in rule 'ssd' and rule 'hdd' aren't doing what you think they're doing: they first grab leaves from 'rack' and then grab leaves from 'host'.

2. What you probably want is a 'choose ... rack' followed by a 'chooseleaf ... host'.

Removing the 'chooseleaf ... rack' before the host, or the 'chooseleaf ... host' after the rack will avoid the crash. Changing 'chooseleaf ... rack' to 'choose ... rack' will also avoid the crash.
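As a rough sketch of the last option, the 'ssd' rule with the rack step changed from chooseleaf to choose might look like the fragment below. Everything except that one keyword is carried over unchanged from the original report, including the firstn 0 counts, which this thread does not confirm as the right values for any particular cluster.

```
rule ssd {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take ssd
    # was: step chooseleaf firstn 0 type rack
    step choose firstn 0 type rack
    step chooseleaf firstn 0 type host
    step emit
}
```

With this shape the first step selects rack buckets only, and the descent to leaves happens once, in the host step.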

#3 Updated by Henning Stener over 8 years ago

In addition to the choose vs. chooseleaf issue that Joao mentions here, we have also seen problems when min_size is lower than what a rule actually requires.

rule crashtest {
    step chooseleaf firstn 2 type rack
    step emit
}

This at least causes crushtool --test to segfault; I am not 100% sure whether the MON crashes on this too.

#4 Updated by Patrick Donnelly over 3 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (Monitor)
  • Component(RADOS) Monitor added

#5 Updated by Joao Eduardo Luis over 2 years ago

  • Category set to Correctness/Safety
  • Status changed from New to Closed
  • Assignee deleted (Joao Eduardo Luis)

Closing because no one has complained for 6 years.
