Support #8600

MON crashes on new crushmap injection

Added by Jean-Charles Lopez over 5 years ago. Updated 2 months ago.

Status: New
Priority: High
Category: -
Target version: -
Start date: 06/14/2014
Due date:
% Done: 0%
Tags: monitor crush segfault
Reviewed:
Affected Versions:
Component(RADOS): Monitor
Pull request ID:

Description

The CRUSH map contains the following rule:
rule ssd {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take ssd
    step chooseleaf firstn 0 type rack
    step chooseleaf firstn 0 type host
    step emit
}

crushtool compiles the map without warnings or errors.

When the new map is injected into the cluster, the MON segfaults.

Restarting the crashed MON brings the cluster back to normal operation.

The issue can be reproduced at will.

ceph-mon.log - ceph-mon log while injecting the map (89.3 KB) Jean-Charles Lopez, 06/14/2014 01:44 PM

cmbad.txt - map that compiles and contains the above directives (2.67 KB) Jean-Charles Lopez, 06/14/2014 01:44 PM


Related issues

Duplicated by Ceph - Bug #9485: Monitor crash due to wrong crush rule set Resolved 09/15/2014

History

#1 Updated by Sage Weil about 5 years ago

  • Assignee set to Joao Eduardo Luis
  • Priority changed from Normal to High

#2 Updated by Joao Eduardo Luis about 5 years ago

JC, although we don't have a fix for the crash yet (we shouldn't crash if a crushmap is incorrectly structured), there's an easy way to avoid the crash.

Basically, there are two things to note:

1. The chooseleaf steps in rule 'ssd' and rule 'hdd' aren't doing what you think they're doing: they'll first grab leaves from 'rack' and then they'll grab leaves from 'host'.

2. What you probably want is a 'choose ... rack' followed by a 'chooseleaf ... host'.

Removing either the 'chooseleaf ... rack' step before the host step or the 'chooseleaf ... host' step after the rack step will avoid the crash. Changing 'chooseleaf ... rack' to 'choose ... rack' will also avoid the crash.
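Until the MON rejects such maps itself, the pattern described above could be caught before injection with a small pre-check. The following is only a minimal sketch, not part of Ceph or crushtool: the function name `chooseleaf_followed_by_choose` is hypothetical, and it simply scans the decompiled rule text for a chooseleaf step that is followed by another choose or chooseleaf step before the next emit. The "fixed" variant keeps the ticket's `firstn 0` arguments, which is an assumption, since the comment above leaves them elided.

```python
import re

# Hypothetical helper, not a Ceph API: matches the step lines of a
# decompiled CRUSH rule and captures the operation name.
STEP_RE = re.compile(r"^\s*step\s+(chooseleaf|choose|take|emit)\b")


def chooseleaf_followed_by_choose(rule_text):
    """Return True if a chooseleaf step is followed by another
    choose/chooseleaf step before the next take or emit."""
    seen_chooseleaf = False
    for line in rule_text.splitlines():
        match = STEP_RE.match(line)
        if not match:
            continue  # ruleset, type, min_size, braces, etc.
        op = match.group(1)
        if op in ("take", "emit"):
            seen_chooseleaf = False  # a new selection starts here
        elif seen_chooseleaf:
            return True  # something is chosen after leaves were reached
        elif op == "chooseleaf":
            seen_chooseleaf = True
    return False


# The rule from the ticket description.
TICKET_RULE = """
rule ssd {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take ssd
    step chooseleaf firstn 0 type rack
    step chooseleaf firstn 0 type host
    step emit
}
"""

# Assumed variant following the suggestion: choose racks, then chooseleaf hosts.
SUGGESTED_RULE = TICKET_RULE.replace(
    "step chooseleaf firstn 0 type rack", "step choose firstn 0 type rack"
)

if __name__ == "__main__":
    print(chooseleaf_followed_by_choose(TICKET_RULE))
    print(chooseleaf_followed_by_choose(SUGGESTED_RULE))
```

The check is purely textual and only flags the structure discussed in this ticket; it says nothing about whether a rule is otherwise valid.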

#3 Updated by Henning Stener about 5 years ago

In addition to the choose vs. chooseleaf issue that Joao mentions here, we have also seen problems when min_size is lower than what a rule actually requires.

rule crashtest {
    ...
    min_size 1
    step chooseleaf firstn 2 type rack
    step emit
}

At the very least this causes crushtool --test to segfault; I'm not 100% sure whether the MON crashes on it too.
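One reading of "lower than what a rule actually requires" is that a step asks for a fixed, positive number of items (here `firstn 2`) that exceeds the rule's min_size. Under that assumption, a rough pre-check might look like the sketch below. The function names are hypothetical and not part of Ceph; the check accepts both the `min_size 1` and `min_size=1` spellings and ignores `firstn 0` and negative counts, since those are relative to the pool size.

```python
import re

# Hypothetical helpers, not a Ceph API.
FIRSTN_RE = re.compile(r"^\s*step\s+choose(?:leaf)?\s+firstn\s+(-?\d+)\b", re.M)
MIN_SIZE_RE = re.compile(r"^\s*min_size[\s=]+(\d+)", re.M)


def fixed_firstn_counts(rule_text):
    """Return every positive N found in 'step choose[leaf] firstn N' lines."""
    return [int(n) for n in FIRSTN_RE.findall(rule_text) if int(n) > 0]


def min_size_below_fixed_count(rule_text):
    """Return True if some step requests more items than min_size allows for."""
    match = MIN_SIZE_RE.search(rule_text)
    if match is None:
        return False  # no min_size line found, nothing to compare
    min_size = int(match.group(1))
    return any(n > min_size for n in fixed_firstn_counts(rule_text))


# The fragment from the comment above; the elided lines stay elided.
CRASHTEST_RULE = """
rule crashtest {
...
min_size=1
step chooseleaf firstn 2 type rack
step emit
}
"""

if __name__ == "__main__":
    print(min_size_below_fixed_count(CRASHTEST_RULE))
```

Whether this condition is what actually triggers the segfault is not confirmed in this ticket, so the check should be treated as a heuristic only.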

#4 Updated by Patrick Donnelly 2 months ago

  • Project changed from Ceph to RADOS
  • Category deleted (Monitor)
  • Component(RADOS) Monitor added
