Support #8600

closed

MON crashes on new crushmap injection

Added by Jean-Charles Lopez almost 10 years ago. Updated over 3 years ago.

Status:
Closed
Priority:
High
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:

0%

Tags:
monitor crush segfault
Reviewed:
Affected Versions:
Component(RADOS):
Monitor
Pull request ID:

Description

The crush map contains the following rule:
rule ssd {
ruleset 1
type replicated
min_size 1
max_size 10
step take ssd
step chooseleaf firstn 0 type rack
step chooseleaf firstn 0 type host
step emit
}

crushtool compiles the map with no warnings or errors.

When the new map is injected into the cluster, it causes the MON to segfault.

Restarting the faulted MON brings the cluster back to normal operation.

The issue can be reproduced at will.


Files

ceph-mon.log (89.3 KB): ceph-mon log while injecting the map. Jean-Charles Lopez, 06/14/2014 01:44 PM
cmbad.txt (2.67 KB): map that compiles and contains the above directives. Jean-Charles Lopez, 06/14/2014 01:44 PM

Related issues: 1 (0 open, 1 closed)

Has duplicate Ceph - Bug #9485: Monitor crash due to wrong crush rule set (Resolved, Loïc Dachary, 09/15/2014)

Actions #1

Updated by Sage Weil almost 10 years ago

  • Assignee set to Joao Eduardo Luis
  • Priority changed from Normal to High
Actions #2

Updated by Joao Eduardo Luis almost 10 years ago

JC, although we don't have a fix for the crash yet (we shouldn't crash if a crushmap is incorrectly structured), there's an easy way to avoid the crash.

Basically, there are two things to note:

1. Those chooseleaf steps in rules 'ssd' and 'hdd' aren't doing what you think they're doing: they first grab leaves from 'rack' and then grab leaves from 'host'.

2. What you probably want is a 'choose ... rack' followed by a 'chooseleaf ... host'.

Removing either the 'chooseleaf ... rack' before the host step or the 'chooseleaf ... host' after the rack step will avoid the crash. So will changing 'chooseleaf ... rack' to 'choose ... rack'.
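To make point 1 concrete, here is a toy Python model of how rule steps rewrite the working set. It is a sketch under loud assumptions: the hierarchy and item names are hypothetical, selection simply takes the first matching items instead of CRUSH's hashed, weighted choice, and the output is not capped at the replica count. It only illustrates the data flow: after 'chooseleaf ... rack' the working set already holds OSDs, so a following 'chooseleaf ... host' finds no hosts beneath them.

```python
# Toy model of CRUSH rule steps operating on a "working set" of items.
# Assumptions (NOT real CRUSH): selection is deterministic (first N matches),
# with no hashing, weighting, retries, or result_max cap.

TREE = {  # hypothetical hierarchy: root -> racks -> hosts -> OSDs
    "ssd": ["rack1", "rack2"],
    "rack1": ["host1", "host2"],
    "rack2": ["host3", "host4"],
    "host1": ["osd.0", "osd.1"],
    "host2": ["osd.2", "osd.3"],
    "host3": ["osd.4", "osd.5"],
    "host4": ["osd.6", "osd.7"],
}


def type_of(item):
    # "rack1" -> "rack", "host3" -> "host", "osd.5" -> "osd", "ssd" -> "root"
    if item == "ssd":
        return "root"
    return item.split(".")[0].rstrip("0123456789")


def descendants_of_type(item, wanted):
    """All items of type `wanted` at or below `item`, in tree order."""
    if type_of(item) == wanted:
        return [item]
    found = []
    for child in TREE.get(item, []):  # OSDs have no children
        found.extend(descendants_of_type(child, wanted))
    return found


def count_for(n, num_rep):
    # firstn 0 means num_rep; a negative n means num_rep - |n|.
    return n if n > 0 else num_rep + n


def choose(working, n, bucket_type, num_rep):
    out = []
    for item in working:
        out.extend(descendants_of_type(item, bucket_type)[:count_for(n, num_rep)])
    return out


def chooseleaf(working, n, bucket_type, num_rep):
    # Pick buckets of bucket_type, then descend to one leaf (OSD) under each.
    leaves = []
    for bucket in choose(working, n, bucket_type, num_rep):
        leaves.extend(descendants_of_type(bucket, "osd")[:1])
    return leaves


def run_rule(steps, num_rep):
    working = []
    for op, *args in steps:
        if op == "take":
            working = [args[0]]
        elif op == "choose":
            working = choose(working, *args, num_rep)
        elif op == "chooseleaf":
            working = chooseleaf(working, *args, num_rep)
        elif op == "emit":
            return working
    return working


# The rule from this report: the first chooseleaf leaves OSDs in the
# working set, so the second one has no hosts to descend into.
bad = [("take", "ssd"), ("chooseleaf", 0, "rack"),
       ("chooseleaf", 0, "host"), ("emit",)]
# choose the racks first, then one host (and a leaf under it) per rack.
good = [("take", "ssd"), ("choose", 0, "rack"),
        ("chooseleaf", 1, "host"), ("emit",)]

print(run_rule(bad, num_rep=2))   # []
print(run_rule(good, num_rep=2))  # ['osd.0', 'osd.4']
```

In the toy model the bad rule just yields an empty list; the report above shows the real monitor of that era crashing instead. The 'firstn 1' on the host step is an assumption that the goal is one replica per rack.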

Actions #3

Updated by Henning Stener over 9 years ago

In addition to the choose vs. chooseleaf issue that Joao is mentioning here, we have also seen problems when min_size is lower than what a rule actually requires.

rule crashtest {
...
min_size 1
step chooseleaf firstn 2 type rack
step emit
}

This at least causes crushtool --test to segfault; I'm not 100% sure whether the MON bails on it too.
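A hypothetical lint for the two patterns in this ticket, written as a Python sketch. It is not part of crushtool and nothing like it is claimed to exist in Ceph; it reads one rule in crushtool's text syntax and warns when a choose/chooseleaf step follows a chooseleaf step, or when a fixed positive firstn count is larger than min_size. The second check is only a heuristic for the mismatch described above, not a rule Ceph enforces.

```python
# Hypothetical lint for a single CRUSH rule in crushtool's text syntax.
# NOT part of crushtool; it only flags the two patterns discussed in this
# ticket and knows nothing about the rest of the map.


def lint_rule(text):
    warnings = []
    min_size = None
    after_leaf = False  # True once a chooseleaf step ran since the last take/emit
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "min_size":
            min_size = int(tokens[1])
        if len(tokens) < 2 or tokens[0] != "step":
            continue
        op = tokens[1]
        if op in ("take", "emit"):
            after_leaf = False
        elif op in ("choose", "chooseleaf") and len(tokens) >= 6:
            # e.g. "step chooseleaf firstn 0 type rack"
            mode, count, bucket_type = tokens[2], int(tokens[3]), tokens[5]
            if after_leaf:
                warnings.append(f"'{op} ... {bucket_type}' follows a chooseleaf step")
            if min_size is not None and count > min_size:
                warnings.append(f"'{op} {mode} {count}' picks more items than min_size {min_size}")
            if op == "chooseleaf":
                after_leaf = True
    return warnings


ssd_rule = """
rule ssd {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take ssd
    step chooseleaf firstn 0 type rack
    step chooseleaf firstn 0 type host
    step emit
}
"""

# Trimmed version of the crashtest rule: only the lines the lint reads.
crashtest_rule = """
rule crashtest {
    min_size 1
    step chooseleaf firstn 2 type rack
    step emit
}
"""

print(lint_rule(ssd_rule))
# ["'chooseleaf ... host' follows a chooseleaf step"]
print(lint_rule(crashtest_rule))
# ["'chooseleaf firstn 2' picks more items than min_size 1"]
```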

Actions #4

Updated by Patrick Donnelly almost 5 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (Monitor)
  • Component(RADOS) Monitor added
Actions #5

Updated by Joao Eduardo Luis over 3 years ago

  • Category set to Correctness/Safety
  • Status changed from New to Closed
  • Assignee deleted (Joao Eduardo Luis)

Closing because no one has complained in 6 years.
