Ceph https://tracker.ceph.com/ 2017-07-12T15:43:52Z
RADOS - Bug #20416: "FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster
https://tracker.ceph.com/issues/20416?journal_id=94999 2017-07-12T15:43:52Z Josh Durgin
<ul></ul><p>Since this flag is now always set, it (and the require_x_osds flags) isn't shown by default. Does it appear in the output of 'ceph osd dump --format json-pretty | grep flags'?</p>
https://tracker.ceph.com/issues/20416?journal_id=95000 2017-07-12T15:45:42Z Greg Farnum gfarnum@redhat.com
<ul><li><strong>Project</strong> changed from <i>Ceph</i> to <i>RADOS</i></li><li><strong>Category</strong> deleted (<del><i>OSD</i></del>)</li></ul>
https://tracker.ceph.com/issues/20416?journal_id=96366 2017-08-02T15:31:12Z Josh Durgin
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Need More Info</i></li></ul>
https://tracker.ceph.com/issues/20416?journal_id=96977 2017-08-15T15:40:56Z Hey Paspas@zomg.hu
<ul></ul><p>Hello,</p>
<p>Sorry for the delay.</p>
<p>Yes, it appears under flags.</p>
<pre><code class="json syntaxhl"><span class="CodeRay">{
<span class="key"><span class="delimiter">"</span><span class="content">epoch</span><span class="delimiter">"</span></span>: <span class="integer">542</span>,
<span class="key"><span class="delimiter">"</span><span class="content">fsid</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx</span><span class="delimiter">"</span></span>,
<span class="key"><span class="delimiter">"</span><span class="content">created</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">2015-09-01 18:27:53.253076</span><span class="delimiter">"</span></span>,
<span class="key"><span class="delimiter">"</span><span class="content">modified</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">2017-07-17 19:10:14.234962</span><span class="delimiter">"</span></span>,
<span class="key"><span class="delimiter">"</span><span class="content">flags</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">sortbitwise,require_jewel_osds</span><span class="delimiter">"</span></span>,
<span class="key"><span class="delimiter">"</span><span class="content">crush_version</span><span class="delimiter">"</span></span>: <span class="integer">1</span>,
<span class="key"><span class="delimiter">"</span><span class="content">full_ratio</span><span class="delimiter">"</span></span>: <span class="float">0.000000</span>,
<span class="key"><span class="delimiter">"</span><span class="content">backfillfull_ratio</span><span class="delimiter">"</span></span>: <span class="float">0.000000</span>,
<span class="key"><span class="delimiter">"</span><span class="content">nearfull_ratio</span><span class="delimiter">"</span></span>: <span class="float">0.000000</span>,
<span class="key"><span class="delimiter">"</span><span class="content">cluster_snapshot</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="delimiter">"</span></span>,
<span class="key"><span class="delimiter">"</span><span class="content">pool_max</span><span class="delimiter">"</span></span>: <span class="integer">18</span>,
<span class="key"><span class="delimiter">"</span><span class="content">max_osd</span><span class="delimiter">"</span></span>: <span class="integer">4</span>,
<span class="key"><span class="delimiter">"</span><span class="content">require_min_compat_client</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">unknown</span><span class="delimiter">"</span></span>,
<span class="key"><span class="delimiter">"</span><span class="content">min_compat_client</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">jewel</span><span class="delimiter">"</span></span>,
<span class="key"><span class="delimiter">"</span><span class="content">require_osd_release</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">jewel</span><span class="delimiter">"</span></span>,
<span class="key"><span class="delimiter">"</span><span class="content">pools</span><span class="delimiter">"</span></span>: [ <span class="key"><span class="delimiter">"</span><span class="content">omitted</span><span class="delimiter">"</span></span> : <span class="string"><span class="delimiter">"</span><span class="content">omitted</span><span class="delimiter">"</span></span> ]
}
</span></code></pre>
https://tracker.ceph.com/issues/20416?journal_id=99609 2017-09-21T20:08:53Z Greg Farnum gfarnum@redhat.com
<ul><li><strong>Subject</strong> changed from <i>Cannot set sortbitwise flag</i> to <i>"FAILED assert(osdmap->test_flag((1<<15)))" (sortbitwise) on upgraded cluster</i></li><li><strong>Status</strong> changed from <i>Need More Info</i> to <i>12</i></li><li><strong>Assignee</strong> set to <i>Greg Farnum</i></li></ul><p>Got a report of this happening in downstream Red Hat packages at <a class="external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1494238">https://bugzilla.redhat.com/show_bug.cgi?id=1494238</a></p>
<p>I went through the code a bit and there is a possible issue:<br />1) run without sortbitwise set<br />2) set sortbitwise<br />3) upgrade to Luminous before all OSDs have processed the OSDMap which sets sortbitwise<br />4) assert horribly because the PG gets set up with a pre-sortbitwise map but the assert is still checked</p>
<p>But the cluster in that Bugzilla report has apparently been running with sortbitwise for months, so this sequence doesn't seem likely to be the cause on its own. I'm wondering if maybe there are "dead" PGs that haven't been updated in a while or something.</p>
https://tracker.ceph.com/issues/20416?journal_id=99853 2017-09-26T00:10:41Z Greg Farnum gfarnum@redhat.com
<ul></ul><p>Okay, the one I'm looking at is crashing on pg 126.b7, at epoch 5350. Pool 126 does not presently exist. Epoch 5350 (modified 2017-09-19 15:48:16.743313) really does not have sortbitwise set, and neither does 5447 (modified 2017-09-19 15:55:28.991958), which is the newest map the OSD has on disk. The cluster is currently at 6019 (modified 2017-09-23 22:28:16.690407), and that map does have sortbitwise set.</p>
<p>Looks like sortbitwise was set in epoch 5784 (modified 2017-09-19 16:02:56.161941); I didn't bother to track down when the pool was deleted. (It was much later.)</p>
<p>Still pondering how to let this situation resolve itself in code...</p>
https://tracker.ceph.com/issues/20416?journal_id=99854 2017-09-26T00:24:25Z Greg Farnum gfarnum@redhat.com
<ul></ul><p>Okay. Assuming sortbitwise is just a messaging scheme (I think it is), we should be safe to change the assert to require either that sortbitwise is set or that we (the OSD) are down during this map.</p>
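<p>A minimal sketch of that relaxed condition, written as a standalone predicate. The function and parameter names are hypothetical, and this does not reproduce the actual change in the OSD code; it only restates the rule described above.</p>

```python
# Hypothetical sketch of the relaxed check; not the actual Ceph change.
FLAG_SORTBITWISE = 1 << 15  # bit tested by the failing assert in the bug title


def map_is_acceptable(map_flags: int, osd_is_up_in_map: bool) -> bool:
    """Accept a map if sortbitwise is set, or if this OSD was down in it.

    The reasoning from the comment above: if sortbitwise only affects
    messaging, a map in which this OSD was down cannot have involved it
    in any exchange, so the missing flag should be harmless.
    """
    has_sortbitwise = (map_flags & FLAG_SORTBITWISE) != 0
    return has_sortbitwise or not osd_is_up_in_map


# Old map without the flag, processed while the OSD was down: tolerated.
old_map_while_down = map_is_acceptable(0, osd_is_up_in_map=False)

# Map without the flag while the OSD is up: still treated as an error.
old_map_while_up = map_is_acceptable(0, osd_is_up_in_map=True)
```

<p>The strict form corresponds to dropping the second operand, which is what rejects the stale maps replayed after the upgrade.</p>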
<p>I also kind of want to remove that assert from the per-PG per-map processing anyway; will look and see if there's a better place to put it.</p>
https://tracker.ceph.com/issues/20416?journal_id=100060 2017-09-29T22:36:50Z Greg Farnum gfarnum@redhat.com
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>4</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/18047">https://github.com/ceph/ceph/pull/18047</a> for the fix. I'll backport it to Luminous if that looks good.</p>
https://tracker.ceph.com/issues/20416?journal_id=100312 2017-10-05T18:01:56Z Greg Farnum gfarnum@redhat.com
<ul><li><strong>Status</strong> changed from <i>4</i> to <i>7</i></li><li><strong>Backport</strong> set to <i>luminous</i></li></ul><p>Yuri's testing it (it will pass), so I went ahead and created a backport PR: <a class="external" href="https://github.com/ceph/ceph/pull/18132">https://github.com/ceph/ceph/pull/18132</a></p>
https://tracker.ceph.com/issues/20416?journal_id=100339 2017-10-06T02:01:16Z Sage Weil sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>7</i> to <i>Resolved</i></li></ul>
https://tracker.ceph.com/issues/20416?journal_id=100408 2017-10-06T19:49:51Z Yuri Weinstein yweinste@redhat.com
<ul></ul><p>Greg Farnum wrote:</p>
<blockquote>
<p><a class="external" href="https://github.com/ceph/ceph/pull/18047">https://github.com/ceph/ceph/pull/18047</a> for the fix. I'll backport it to Luminous if that looks good.</p>
</blockquote>
<p>merged</p>
https://tracker.ceph.com/issues/20416?journal_id=100412 2017-10-06T20:18:08Z Nathan Cutler ncutler@suse.cz
<ul><li><strong>Backport</strong> deleted (<del><i>luminous</i></del>)</li></ul><p>fast-tracking the backport, since it's already open</p>