Bug #11814

closed

implicit erasure code crush ruleset is not validated

Added by Herve Rousseau almost 9 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Monitor
Target version: -
% Done: 0%
Source: other
Tags:
Backport: hammer
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Context

  • RHEL6
  • Hammer 0.94.1
  • 3 Mons
  • 315 OSDs

Steps to reproduce

$ rm -fr out dev ; MON=1 OSD=3 ./vstart.sh -X -n -l mon osd
$ ceph osd erasure-code-profile set myprofile plugin=lrc mapping=__DD__DD layers='[[ "_cDD_cDD", "" ],[ "cDDD____", "" ],[ "____cDDD", "" ],]' ruleset-steps='[ [ "choose", "datacenter", 3 ], [ "chooseleaf", "osd", 0] ]'
$ ceph osd crush rule create-erasure myrule myprofile
created ruleset myrule at 1
$ ceph osd getcrushmap > /tmp/c
got crush map from osdmap epoch 13
$ ceph osd setcrushmap -i /tmp/c
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
Error EINVAL: Failed to parse crushmap: *** Caught signal (Segmentation fault) **

https://github.com/ceph/ceph/pull/4807
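The `Error EINVAL: Failed to parse crushmap: *** Caught signal (Segmentation fault) **` reply above suggests that the crash happened in a separate validation process rather than in the monitor itself, since the monitor was still able to report it. The Python sketch below illustrates that isolation pattern under stated assumptions: `validate_out_of_process` is a hypothetical helper, not Ceph code, and the validator is any external command (for example something like `crushtool -i /tmp/c --test`; check the exact options for your release). A child that dies from a signal, hangs past the timeout, or cannot be started is reported as an error instead of taking down the caller.

```python
import signal
import subprocess


def validate_out_of_process(cmd, timeout):
    """Run a validator command in a child process and classify the result.

    Hypothetical helper: returns (ok, message). A crash or hang in the
    child becomes an error message instead of killing the caller.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising on timeout.
        return False, f"validator timed out after {timeout}s"
    except OSError as e:
        # The validator binary is missing or not executable.
        return False, f"cannot run validator: {e}"
    if proc.returncode < 0:
        # On POSIX a negative return code means the child died from a signal.
        name = signal.Signals(-proc.returncode).name
        return False, f"Failed to parse crushmap: validator caught signal {name}"
    if proc.returncode != 0:
        return False, proc.stderr.strip() or f"validator exited with status {proc.returncode}"
    return True, ""
```

A caller would save the candidate map first (for example with `ceph osd getcrushmap > /tmp/c`, as in the steps above) and only accept it when the helper returns `(True, "")`.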


Files

crush.txt (16.9 KB): Crush map. Herve Rousseau, 05/29/2015 02:36 PM
ceph-mon.2.log (9.32 KB): Part of the log when the mon is crashing. Herve Rousseau, 05/29/2015 02:36 PM

Related issues: 3 (0 open, 3 closed)

Related to Ceph - Feature #11815: mon: allow injecting new crushmap (Resolved, Kefu Chai, 05/29/2015)
Related to Ceph - Bug #12419: TEST_crush_rule_create_erasure consistently fails on i386 builder (Resolved, Loïc Dachary, 07/21/2015)
Copied to Ceph - Backport #11824: implicit erasure code crush ruleset is not validated (Resolved, Loïc Dachary, 05/29/2015)

#1

Updated by Loïc Dachary almost 9 years ago

  • Subject changed from New EC pool crashed the mons to implicit erasure code crush ruleset is not validated
  • Status changed from New to 12
  • Assignee set to Loïc Dachary
  • Priority changed from Normal to High
  • Backport set to hammer

The crush ruleset created as a side effect of an erasure coded pool creation is not validated via crushtool, but it should be, in the same way that a new crushmap injected via ceph osd crush currently is.
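One way to read this proposal is to route the implicitly created ruleset through a validation step before it is committed, just like an injected map. The Python sketch below models that with a toy, dictionary-based crush map; the map layout and the names `validate_rules`, `add_implicit_ec_rule` and `commit_with_validation` are all hypothetical, and the checks are far simpler than the crushtool-based validation the ticket refers to. The defaults for `ruleset-root` and `ruleset-failure-domain` are assumptions for illustration. The design choice worth noting is that the rule is added to a copy, and the copy is only returned once it validates, so a rejected rule never reaches the live map.

```python
import copy


class CrushValidationError(Exception):
    """Raised when a candidate crush map fails validation."""


def validate_rules(crush):
    """Return a list of problems found in the rules of a toy crush map."""
    problems = []
    for name, rule in crush["rules"].items():
        if rule["root"] not in crush["buckets"]:
            problems.append(f"rule {name}: root item {rule['root']} does not exist")
        if rule["failure_domain"] not in crush["types"]:
            problems.append(f"rule {name}: unknown type {rule['failure_domain']}")
    return problems


def add_implicit_ec_rule(name, profile):
    """Build a mutation that adds the rule an erasure code profile implies."""
    def mutate(crush):
        crush["rules"][name] = {
            # Assumed defaults when the profile does not set these keys.
            "root": profile.get("ruleset-root", "default"),
            "failure_domain": profile.get("ruleset-failure-domain", "host"),
        }
    return mutate


def commit_with_validation(crush, mutate, validate):
    """Apply mutate to a copy of crush; return the copy only if it validates.

    The live map is never modified, so a rejected rule leaves no trace.
    """
    candidate = copy.deepcopy(crush)
    mutate(candidate)
    problems = validate(candidate)
    if problems:
        raise CrushValidationError("; ".join(problems))
    return candidate
```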

#2

Updated by Loïc Dachary almost 9 years ago

  • Description updated (diff)
#3

Updated by Loïc Dachary almost 9 years ago

  • Description updated (diff)
#4

Updated by Loïc Dachary almost 9 years ago

  • Status changed from 12 to Fix Under Review
#5

Updated by Loïc Dachary almost 9 years ago

  • Priority changed from High to Urgent
#6

Updated by Loïc Dachary almost 9 years ago

I believe this is fixed in hammer v0.94.2 with https://github.com/ceph/ceph/pull/4936 and various other patches that make it impossible to run into this specific situation. There were a few windows of opportunity prior to v0.94.2.

The steps to reproduce the issue listed in the description do not actually work on v0.94.1:

loic@fold:~/software/ceph/ceph/src$ profile=k8m4isa
loic@fold:~/software/ceph/ceph/src$ ceph osd erasure-code-profile set k8m4isa plugin=isa k=8 m=4 technique=reed_sol_van ruleset-root=bigbang ruleset-failure-domain=host
loic@fold:~/software/ceph/ceph/src$ pool=castor-ec-isa
loic@fold:~/software/ceph/ceph/src$ ceph osd pool create $pool 4096 4096 erasure k8m4isa castor-ec-isa
Error ENOENT: specified ruleset castor-ec-isa doesn't exist
loic@fold:~/software/ceph/ceph/src$ ceph --version
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)

I suspect the situation was created with a different combination of commands, but it's difficult to figure out which.

#7

Updated by Dan van der Ster almost 9 years ago

Hi Loic,
The reproducing steps were something like:

ceph osd crush add-bucket bigbang datacenter
ceph osd erasure-code-profile set k8m4isa plugin=isa k=8 m=4 technique=reed_sol_van ruleset-root=bigbang ruleset-failure-domain=host
ceph osd crush rm bigbang
ceph osd pool create castor-ec-isa 4096 4096 erasure k8m4isa

The mon should crash after that last pool create.

Cheers, Dan

#8

Updated by Loïc Dachary almost 9 years ago

  • Description updated (diff)
#9

Updated by Loïc Dachary almost 9 years ago

That can't happen (on master). But I updated the description with another scenario that fails, and I think the right fix is to verify the ruleset right before associating it with the pool.

loic@fold:~/software/ceph/ceph/src$ ceph osd crush add-bucket bigbang datacenter
added bucket bigbang type datacenter to crush map
loic@fold:~/software/ceph/ceph/src$ ceph osd erasure-code-profile set k8m4isa plugin=isa k=8 m=4 technique=reed_sol_van ruleset-root=bigbang ruleset-failure-domain=host
loic@fold:~/software/ceph/ceph/src$ ceph osd crush rm bigbang
removed item id -3 name 'bigbang' from crush map
loic@fold:~/software/ceph/ceph/src$ ceph osd pool create castor-ec-isa 4096 4096 erasure k8m4isa
Error ENOENT: root item bigbang does not exist
loic@fold:~/software/ceph/ceph/src$ ceph --version
ceph version 9.0.1-1494-g8fc0496 (8fc049664bc798432e1750da86b1f216f85a842d)
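The scenario above works because the profile records `ruleset-root` as a plain name: removing the bucket afterwards neither updates nor invalidates the profile, so the root has to be re-checked when the pool (and its implicit ruleset) is created. A minimal Python sketch of that check, assuming a toy `buckets` dictionary and a hypothetical `check_profile_root` helper (not the monitor's actual code); the negative-errno return mimics the convention common in Ceph's C++ code, and the message string is copied from the transcript above.

```python
import errno


def check_profile_root(buckets, profile):
    """Re-check, at pool creation time, that the profile's root still exists.

    Returns (rc, message) with rc == 0 on success or a negative errno.
    """
    root = profile.get("ruleset-root", "default")  # assumed default
    if root not in buckets:
        return -errno.ENOENT, f"root item {root} does not exist"
    return 0, ""


# Replay of the scenario: add-bucket, profile set, crush rm, pool create.
buckets = {"default": "root"}
buckets["bigbang"] = "datacenter"
profile = {"plugin": "isa", "k": "8", "m": "4", "technique": "reed_sol_van",
           "ruleset-root": "bigbang", "ruleset-failure-domain": "host"}
del buckets["bigbang"]
print(check_profile_root(buckets, profile))
```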

#10

Updated by Loïc Dachary almost 9 years ago

  • Description updated (diff)
#11

Updated by Loïc Dachary almost 9 years ago

  • Description updated (diff)
#12

Updated by Kefu Chai almost 9 years ago

  • Status changed from Fix Under Review to Pending Backport
#13

Updated by Loïc Dachary over 8 years ago

  • Status changed from Pending Backport to Resolved
#14

Updated by Loïc Dachary over 8 years ago

Kefu added the script src/tools/ceph-monstore-update-crush.sh, which is packaged with ceph-test, to recover a monitor with a buggy crushmap.

#15

Updated by Ken Dreyer over 8 years ago

Is src/tools/ceph-monstore-update-crush.sh something that only developers would run, or something that we ever expect users to run?

#16

Updated by Kefu Chai about 8 years ago

Ken, I expect users to use this tool.
