https://tracker.ceph.com/https://tracker.ceph.com/favicon.ico2015-05-29T14:41:50ZCeph Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=526972015-05-29T14:41:50ZLoïc Dacharyloic@dachary.org
<ul><li><strong>Subject</strong> changed from <i>New EC pool crashed the mons</i> to <i>implicit erasure code crush ruleset is not validated</i></li><li><strong>Status</strong> changed from <i>New</i> to <i>12</i></li><li><strong>Assignee</strong> set to <i>Loïc Dachary</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li><li><strong>Backport</strong> set to <i>hammer</i></li></ul><p>The crush ruleset created as a side effect of creating an erasure coded pool is not validated via crushtool, but it should be, in the same way that a new crushmap injected via <strong>ceph osd crush</strong> currently is.</p> Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=526982015-05-29T14:42:55ZLoïc Dacharyloic@dachary.org
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/52698/diff?detail_id=51366">diff</a>)</li></ul> Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=527012015-05-29T16:33:03ZLoïc Dacharyloic@dachary.org
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/52701/diff?detail_id=51368">diff</a>)</li></ul> Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=528082015-05-29T23:17:03ZLoïc Dacharyloic@dachary.org
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>Fix Under Review</i></li></ul><ul>
<li>master <a class="external" href="https://github.com/ceph/ceph/pull/4807">https://github.com/ceph/ceph/pull/4807</a></li>
</ul> Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=528092015-05-29T23:19:16ZLoïc Dacharyloic@dachary.org
<ul><li><strong>Priority</strong> changed from <i>High</i> to <i>Urgent</i></li></ul> Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=550672015-07-16T14:32:05ZLoïc Dacharyloic@dachary.org
<ul></ul><p>I believe this is fixed in hammer v0.94.2 with <a class="external" href="https://github.com/ceph/ceph/pull/4936">https://github.com/ceph/ceph/pull/4936</a> and various other patches that make it impossible to run into this specific situation. There were a few windows of opportunity prior to v0.94.2.</p>
<p>The steps to reproduce the issue listed in the description do not actually work on v0.94.1:<br /><pre>
loic@fold:~/software/ceph/ceph/src$ profile=k8m4isa
loic@fold:~/software/ceph/ceph/src$ ceph osd erasure-code-profile set k8m4isa plugin=isa k=8 m=4 technique=reed_sol_van ruleset-root=bigbang ruleset-failure-domain=host
loic@fold:~/software/ceph/ceph/src$ pool=castor-ec-isa
loic@fold:~/software/ceph/ceph/src$ ceph osd pool create $pool 4096 4096 erasure k8m4isa castor-ec-isa
Error ENOENT: specified ruleset castor-ec-isa doesn't exist
loic@fold:~/software/ceph/ceph/src$ ceph --version
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
</pre></p>
<p>I suspect the situation was created with a different combination of commands, but it is difficult to work out which one.</p> Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=550792015-07-16T15:31:48ZDan van der Ster
<ul></ul><p>Hi Loic,<br />The steps to reproduce were something like:</p>
<pre>
ceph osd crush add-bucket bigbang
ceph osd erasure-code-profile set k8m4isa plugin=isa k=8 m=4 technique=reed_sol_van ruleset-root=bigbang ruleset-failure-domain=host
ceph osd crush rm bigbang
ceph osd pool create castor-ec-isa 4096 4096 erasure k8m4isa
</pre>
<p>The mon should crash after that last pool create.</p>
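<p>Until the monitor validates the implicit ruleset itself, an operator-side pre-check for the sequence above could be sketched as follows. This is not the monitor-side fix discussed in this issue. The helper name <code>crush_root_exists</code> is hypothetical, and the list of bucket names is hard-coded for illustration; on a real cluster it would have to be extracted from the current crush map.</p>

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): succeed only if the bucket name
# given as $1 appears as a whole line among the names read from stdin.
crush_root_exists() {
    grep -Fxq -- "$1"
}

# Example with a hard-coded list of bucket names (an assumption made for
# illustration). The root named in the erasure code profile, bigbang, has
# been removed from the crush map, so the check fails.
if printf 'default\nhost-a\n' | crush_root_exists bigbang; then
    echo "root bigbang exists"
else
    echo "root bigbang is missing, do not create the pool"
fi
```

<p>The <code>-F</code> and <code>-x</code> options make grep compare the name literally and against the whole line, so a bucket called <code>bigbangs</code> would not be mistaken for <code>bigbang</code>.</p>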
<p>Cheers, Dan</p> Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=550842015-07-16T15:44:10ZLoïc Dacharyloic@dachary.org
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/55084/diff?detail_id=53534">diff</a>)</li></ul> Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=550852015-07-16T15:48:48ZLoïc Dacharyloic@dachary.org
<ul></ul><p>That can't happen (on master). But I updated the description with another scenario that fails, and I think the right fix is to verify the ruleset right before associating it with the pool.<br /><pre>
loic@fold:~/software/ceph/ceph/src$ ceph osd crush add-bucket bigbang datacenter
added bucket bigbang type datacenter to crush map
loic@fold:~/software/ceph/ceph/src$ ceph osd erasure-code-profile set k8m4isa plugin=isa k=8 m=4 technique=reed_sol_van ruleset-root=bigbang ruleset-failure-domain=host
loic@fold:~/software/ceph/ceph/src$ ceph osd crush rm bigbang
removed item id -3 name 'bigbang' from crush map
loic@fold:~/software/ceph/ceph/src$ ceph osd pool create castor-ec-isa 4096 4096 erasure k8m4isa
Error ENOENT: root item bigbang does not exist
loic@fold:~/software/ceph/ceph/src$ ceph --version
ceph version 9.0.1-1494-g8fc0496 (8fc049664bc798432e1750da86b1f216f85a842d)
</pre></p> Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=550892015-07-16T16:22:08ZLoïc Dacharyloic@dachary.org
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/55089/diff?detail_id=53539">diff</a>)</li></ul> Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=550902015-07-16T16:23:02ZLoïc Dacharyloic@dachary.org
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/55090/diff?detail_id=53540">diff</a>)</li></ul> Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=551292015-07-17T10:27:08ZKefu Chaitchaikov@gmail.com
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Pending Backport</i></li></ul> Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=579122015-09-06T21:18:23ZLoïc Dacharyloic@dachary.org
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul> Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=606562015-10-26T12:14:55ZLoïc Dacharyloic@dachary.org
<ul></ul><p>Kefu added the script src/tools/ceph-monstore-update-crush.sh, which is packaged with ceph-test, to recover a monitor with a buggy crushmap.</p> Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=608052015-10-27T23:22:51ZKen Dreyerkdreyer@redhat.com
<ul></ul><p>Is <code>src/tools/ceph-monstore-update-crush.sh</code> something that only developers would run, or something that we ever expect users to run?</p> Ceph - Bug #11814: implicit erasure code crush ruleset is not validatedhttps://tracker.ceph.com/issues/11814?journal_id=676692016-03-17T04:25:10ZKefu Chaitchaikov@gmail.com
<ul></ul><p>Ken, I expect users to use this tool,</p>