Bug #11814
closed
implicit erasure code crush ruleset is not validated
0%
Description
Context
- RHEL6
- Hammer 0.94.1
- 3 Mons
- 315 OSDs
Steps to reproduce
$ rm -fr out dev ; MON=1 OSD=3 ./vstart.sh -X -n -l mon osd
$ ceph osd erasure-code-profile set myprofile plugin=lrc mapping=__DD__DD layers='[[ "_cDD_cDD", "" ],[ "cDDD____", "" ],[ "____cDDD", "" ],]' ruleset-steps='[ [ "choose", "datacenter", 3 ], [ "chooseleaf", "osd", 0] ]'
$ ceph osd crush rule create-erasure myrule myprofile
created ruleset myrule at 1
$ ceph osd getcrushmap > /tmp/c
got crush map from osdmap epoch 13
$ ceph osd setcrushmap -i /tmp/c
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
Error EINVAL: Failed to parse crushmap: *** Caught signal (Segmentation fault) **
Updated by Loïc Dachary almost 9 years ago
- Subject changed from New EC pool crashed the mons to implicit erasure code crush ruleset is not validated
- Status changed from New to 12
- Assignee set to Loïc Dachary
- Priority changed from Normal to High
- Backport set to hammer
The crush ruleset created as a side effect of creating an erasure coded pool is not validated via crushtool, but it should be, in the same way that a new crushmap injected via ceph osd crush currently is.
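For illustration only, the kind of check described above can be sketched as a small shell helper that runs a compiled crushmap through crushtool's test mode before injecting it. The function name check_then_inject and the CEPH / CRUSHTOOL override variables are hypothetical, introduced for this sketch. Whether crushtool's test mode catches every malformed ruleset is an assumption, and this is not the code path the monitor itself uses.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ceph: only inject a crushmap that
# crushtool can run through its test mode without failing.
# CEPH and CRUSHTOOL may be overridden, e.g. for a dry run.
CEPH="${CEPH:-ceph}"
CRUSHTOOL="${CRUSHTOOL:-crushtool}"

check_then_inject() {
    map="$1"
    # A parse failure or a crash in crushtool gives a non-zero exit status.
    if ! "$CRUSHTOOL" -i "$map" --test >/dev/null 2>&1; then
        echo "refusing to inject $map: crushtool test failed" >&2
        return 1
    fi
    # The map passed the test run, so hand it to the cluster.
    "$CEPH" osd setcrushmap -i "$map"
}
```

A dry run that only prints the injection command could look like CEPH=echo check_then_inject /tmp/c, assuming /tmp/c was written by ceph osd getcrushmap as in the description.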
Updated by Loïc Dachary almost 9 years ago
- Status changed from 12 to Fix Under Review
Updated by Loïc Dachary almost 9 years ago
I believe this is fixed in hammer v0.94.2 with https://github.com/ceph/ceph/pull/4936 and various other patches that make it impossible to run into this specific situation. There were a few windows of opportunity prior to v0.94.2.
The steps to reproduce the issue listed in the description do not actually work on v0.94.1:
loic@fold:~/software/ceph/ceph/src$ profile=k8m4isa
loic@fold:~/software/ceph/ceph/src$ ceph osd erasure-code-profile set k8m4isa plugin=isa k=8 m=4 technique=reed_sol_van ruleset-root=bigbang ruleset-failure-domain=host
loic@fold:~/software/ceph/ceph/src$ pool=castor-ec-isa
loic@fold:~/software/ceph/ceph/src$ ceph osd pool create $pool 4096 4096 erasure k8m4isa castor-ec-isa
Error ENOENT: specified ruleset castor-ec-isa doesn't exist
loic@fold:~/software/ceph/ceph/src$ ceph --version
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
I suspect the situation was created with a different combination of commands, but it is difficult to figure out which one.
Updated by Dan van der Ster almost 9 years ago
Hi Loic,
The reproducing steps were something like:
ceph osd crush add-bucket bigbang
ceph osd erasure-code-profile set k8m4isa plugin=isa k=8 m=4 technique=reed_sol_van ruleset-root=bigbang ruleset-failure-domain=host
ceph osd crush rm bigbang
ceph osd pool create castor-ec-isa 4096 4096 erasure k8m4isa
The mon should crash after that last pool create.
Cheers, Dan
Updated by Loïc Dachary almost 9 years ago
That can't happen (on master). But I updated the description with another scenario that fails, and I think the right fix is to verify the ruleset right before associating it with the pool.
loic@fold:~/software/ceph/ceph/src$ ceph osd crush add-bucket bigbang datacenter
added bucket bigbang type datacenter to crush map
loic@fold:~/software/ceph/ceph/src$ ceph osd erasure-code-profile set k8m4isa plugin=isa k=8 m=4 technique=reed_sol_van ruleset-root=bigbang ruleset-failure-domain=host
loic@fold:~/software/ceph/ceph/src$ ceph osd crush rm bigbang
removed item id -3 name 'bigbang' from crush map
loic@fold:~/software/ceph/ceph/src$ ceph osd pool create castor-ec-isa 4096 4096 erasure k8m4isa
Error ENOENT: root item bigbang does not exist
loic@fold:~/software/ceph/ceph/src$ ceph --version
ceph version 9.0.1-1494-g8fc0496 (8fc049664bc798432e1750da86b1f216f85a842d)
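As a rough operator-side analogue of the "root item bigbang does not exist" check, the sketch below verifies that the ruleset-root named in a profile is still present among the known crush buckets before a pool is created. The function names profile_root, bucket_exists and preflight are hypothetical. The sketch assumes that the profile is available as key=value lines (as saved from ceph osd erasure-code-profile get) and that bucket names are listed one per line in a file; how that list is produced is left open.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of Ceph: refuse to go ahead when
# the profile's ruleset-root no longer exists in the crush map.

# Print the ruleset-root value found in "key=value" profile lines on stdin.
profile_root() {
    sed -n 's/^ruleset-root=//p'
}

# Succeed if "$1" matches a whole line among the bucket names on stdin.
bucket_exists() {
    grep -qxF -- "$1"
}

# preflight PROFILE_FILE BUCKETS_FILE
#   PROFILE_FILE: saved profile, one key=value pair per line (assumed format)
#   BUCKETS_FILE: one crush bucket name per line
preflight() {
    root=$(profile_root < "$1")
    # No explicit root in the profile: nothing to verify here.
    [ -n "$root" ] || return 0
    if ! bucket_exists "$root" < "$2"; then
        echo "root item $root does not exist" >&2
        return 1
    fi
}
```

With a profile file containing ruleset-root=bigbang and a bucket list that lacks bigbang, preflight returns a non-zero status, so a wrapper script could skip the pool creation in that case.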
Updated by Kefu Chai almost 9 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Loïc Dachary over 8 years ago
- Status changed from Pending Backport to Resolved
Updated by Loïc Dachary over 8 years ago
Kefu added the script src/tools/ceph-monstore-update-crush.sh, which is packaged with ceph-test, to recover a monitor with a buggy crushmap.
Updated by Ken Dreyer over 8 years ago
Is src/tools/ceph-monstore-update-crush.sh something that only developers would run, or something that we ever expect users to run?
Updated by Kefu Chai about 8 years ago
Ken, I expect users to use this tool,