Ceph - Bug #19185: osd crashes during hit_set_trim and hit_set_remove_all if hit set object doesn't exist
https://tracker.ceph.com/issues/19185

Updated by Gerben Meijer <infernix@infernix.net> on 2017-03-04 01:50:08
So my take on this is that, for some reason, either the new Jewel OSDs or the old Hammer ones started creating GMT-timestamped hitset archives as RADOS objects, which the other OSDs did not find and/or could not replicate, and that in turn triggered asserts somewhere.
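For context, the hitset archive objects are named from the hitset's start/end timestamps (as seen in the OSD log later in this thread). A rough sketch of why a localtime/GMT naming mismatch would make one OSD look for an object that was never written under that name — this is not the actual Ceph code, and the helper names below are made up for illustration:

<pre>
// Illustrative sketch only (NOT Ceph code): archive objects are named roughly
// "hit_set_<pgid>_archive_<start>_<end>".  If one OSD renders the timestamps
// with localtime() and another with gmtime(), they compute different names
// for the same hitset, and one of them ends up looking for an object that
// was never created under that name.
#include <ctime>
#include <iostream>
#include <string>

// Hypothetical helper: format an epoch timestamp either as local or GMT time.
static std::string format_stamp(time_t t, bool use_gmt) {
    struct tm tm_buf;
    if (use_gmt)
        gmtime_r(&t, &tm_buf);
    else
        localtime_r(&t, &tm_buf);
    char buf[64];
    strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &tm_buf);
    return buf;
}

// Hypothetical helper mirroring the archive-name scheme seen in the OSD log.
static std::string archive_object_name(const std::string &pgid,
                                       time_t start, time_t end, bool use_gmt) {
    return "hit_set_" + pgid + "_archive_" +
           format_stamp(start, use_gmt) + "_" + format_stamp(end, use_gmt);
}

int main() {
    // An arbitrary example instant (2017-03-03 03:44:39 UTC).
    time_t start = 1488512679, end = 1488512679;
    std::cout << archive_object_name("19.643", start, end, false) << "\n";
    std::cout << archive_object_name("19.643", start, end, true)  << "\n";
    // On a host in the Mountain timezone these two names differ by 7 hours.
    return 0;
}
</pre>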
All boxes were NTP time-synced and in the MDT timezone.

Updated by Kefu Chai <tchaikov@gmail.com> on 2017-03-06 11:02:58
Created https://github.com/ceph/ceph/pull/13805 to exercise I/O against the cache tier when upgrading from hammer to jewel.

Updated by Kefu Chai <tchaikov@gmail.com> on 2017-03-07 06:31:53
- Status changed from New to Fix Under Review
- Assignee set to Kefu Chai

The use-gmt flag is unset for pools created before upgrading, so I don't think that's the case.
https://github.com/ceph/ceph/pull/13821 was created to skip the archive hitset object when we try to remove it and it does not exist.
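In other words, the idea is to tolerate a missing archive object instead of crashing on it. A minimal sketch of that pattern follows — this is not the code from the PR itself; ObjectContext, get_object_context() and hit_set_trim_one() below are simplified stand-ins for the real OSD types and functions:

<pre>
// Minimal sketch of the "skip a missing hitset archive object" pattern.
#include <iostream>
#include <map>
#include <memory>
#include <string>

struct ObjectContext {};
using ObjectContextRef = std::shared_ptr<ObjectContext>;

// Toy object store standing in for the PG's object lookup: only objects
// present in the map "exist".
std::map<std::string, ObjectContextRef> store;

ObjectContextRef get_object_context(const std::string &oid) {
    auto it = store.find(oid);
    return it == store.end() ? nullptr : it->second;
}

void hit_set_trim_one(const std::string &oid) {
    ObjectContextRef obc = get_object_context(oid);
    if (!obc) {
        // Before the fix the OSD effectively crashed here when the archive
        // object was missing.  Logging and skipping lets trimming carry on
        // with the remaining hitsets.
        std::cerr << "hit_set_trim: " << oid << " not found, skipping\n";
        return;
    }
    std::cout << "hit_set_trim: removing " << oid << "\n";
    store.erase(oid);
}

int main() {
    store["hit_set_19.643_archive_A_B"] = std::make_shared<ObjectContext>();
    hit_set_trim_one("hit_set_19.643_archive_A_B");  // exists: removed
    hit_set_trim_one("hit_set_19.643_archive_C_D");  // missing: skipped, no crash
    return 0;
}
</pre>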
Updated by Gerben Meijer <infernix@infernix.net> on 2017-03-07 08:13:09

Kefu Chai wrote:

> The use-gmt flag is unset for pools created before upgrading, so I don't think that's the case.
The very first OSD that crashed logged this:
<pre>
Mar 02 20:46:30 ceph2.silo2.uspro1.unifiedlayer.com ceph-osd[85196]: 2017-03-02 20:46:30.973974 7f3f9651f700 -1 osd.254 pg_epoch: 262185 pg[19.643( v 262185'80274196 lc 262179'80273645 (262173'80270577,262185'80274196] local-les=262181 n=379 ec=83762 les/c/f 262181/262179/0 262180/262180/262180) [254,248,168] r=0 lpr=262180 pi=262170-262179/2 crt=262179'80273683 mlcod 262179'80273645 active+recovery_wait+degraded NIBBLEWISE m=1] agent_load_hit_sets: could not load hitset 19:c2600000:.ceph-internal::hit_set_19.643_archive_2017-03-03 03%3a44%3a39.027534Z_2017-03-03 03%3a44%3a39.653995Z:head
</pre>
It crashed at 20:46 local time (MDT) on March 2nd, yet it is looking for a hitset object timestamped 7 hours in the future: MDT + 7 hours == GMT. At this point, 6 cache-tier OSDs were on Jewel and all the others were still on Hammer. We had not changed any flags anywhere, and we definitely weren't using use-gmt.
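A quick sanity check of that 7-hour offset — a standalone snippet, nothing Ceph-specific, with the epoch value hand-computed from the log timestamp above:

<pre>
// Verify that 2017-03-02 20:46:30 Mountain time (UTC-7 on that date) is the
// same instant as 2017-03-03 03:46:30 GMT, i.e. the crash time and the
// "future" archive timestamp are 7 hours apart.
#include <cstdio>
#include <ctime>

int main() {
    time_t crash_utc = 1488512790;                 // 2017-03-03 03:46:30 UTC
    time_t crash_mountain = crash_utc - 7 * 3600;  // wall clock, offset applied by hand

    char local_buf[64], gmt_buf[64];
    struct tm tm_local, tm_gmt;
    gmtime_r(&crash_mountain, &tm_local);  // the wall-clock time the admin saw
    gmtime_r(&crash_utc, &tm_gmt);         // the same instant rendered in GMT

    strftime(local_buf, sizeof(local_buf), "%Y-%m-%d %H:%M:%S", &tm_local);
    strftime(gmt_buf, sizeof(gmt_buf), "%Y-%m-%d %H:%M:%S", &tm_gmt);

    std::printf("Mountain wall clock: %s\n", local_buf);  // 2017-03-02 20:46:30
    std::printf("GMT:                 %s\n", gmt_buf);    // 2017-03-03 03:46:30
    return 0;
}
</pre>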
Here's the cache tier pool XML that I dumped at 21:03 with ceph report:
<pre>
<pool>
  <pool>19</pool>
  <pool_name>ssd-tier</pool_name>
  <flags>9</flags>
  <flags_names>hashpspool,incomplete_clones</flags_names>
  <type>1</type>
  <size>3</size>
  <min_size>2</min_size>
  <crush_ruleset>1</crush_ruleset>
  <object_hash>2</object_hash>
  <pg_num>2048</pg_num>
  <pg_placement_num>2048</pg_placement_num>
  <crash_replay_interval>0</crash_replay_interval>
  <last_change>262131</last_change>
  <last_force_op_resend>0</last_force_op_resend>
  <auid>0</auid>
  <snap_mode>selfmanaged</snap_mode>
  <snap_seq>2</snap_seq>
  <snap_epoch>262046</snap_epoch>
  <pool_snaps />
  <removed_snaps>[1~1]</removed_snaps>
  <quota_max_bytes>0</quota_max_bytes>
  <quota_max_objects>0</quota_max_objects>
  <tiers />
  <tier_of>17</tier_of>
  <read_tier>-1</read_tier>
  <write_tier>-1</write_tier>
  <cache_mode>forward</cache_mode>
  <target_max_bytes>10995116277760</target_max_bytes>
  <target_max_objects>0</target_max_objects>
  <cache_target_dirty_ratio_micro>400000</cache_target_dirty_ratio_micro>
  <cache_target_dirty_high_ratio_micro>600000</cache_target_dirty_high_ratio_micro>
  <cache_target_full_ratio_micro>400000</cache_target_full_ratio_micro>
  <cache_min_flush_age>0</cache_min_flush_age>
  <cache_min_evict_age>0</cache_min_evict_age>
  <erasure_code_profile>replicated_ruleset_ssd</erasure_code_profile>
  <hit_set_params>
    <type>bloom</type>
    <false_positive_probability>0.05</false_positive_probability>
    <target_size>0</target_size>
    <seed>0</seed>
  </hit_set_params>
  <hit_set_period>600</hit_set_period>
  <hit_set_count>3</hit_set_count>
  <use_gmt_hitset>false</use_gmt_hitset>
  <min_read_recency_for_promote>1</min_read_recency_for_promote>
  <min_write_recency_for_promote>1</min_write_recency_for_promote>
  <hit_set_grade_decay_rate>0</hit_set_grade_decay_rate>
  <hit_set_search_last_n>1</hit_set_search_last_n>
  <grade_table>
    <value>1000000</value>
    <value>1000000</value>
    <value>1000000</value>
  </grade_table>
  <stripe_width>0</stripe_width>
  <expected_num_objects>0</expected_num_objects>
  <fast_read>false</fast_read>
  <options />
</pool>
</pre>
Unfortunately the log rotated for that OSD, so I don't have an earlier stack trace. Either way: why is an OSD looking for a hitset 7 hours in the future? That doesn't seem right, or does it?

Updated by Gerben Meijer <infernix@infernix.net> on 2017-03-07 08:27:20
Just to emphasize again: use_gmt_hitset was false at the time of the first "could not load hitset" error for an object timestamped 7 hours in the future.

Updated by Kefu Chai <tchaikov@gmail.com> on 2017-03-07 09:49:15
Fix for hammer: https://github.com/ceph/ceph/pull/13826
Fix for jewel: https://github.com/ceph/ceph/pull/13827

Updated by Nathan Cutler <ncutler@suse.cz> on 2017-03-08 08:39:32
- Status changed from Fix Under Review to Pending Backport
- Backport set to hammer, jewel

Apparently, this is a special-case bug that is only present in hammer and jewel, not in master. I will stage backport tracker issues as usual (even though the master PR is "missing"), since we need to track the hammer and jewel fixes separately.

Updated by Nathan Cutler <ncutler@suse.cz> on 2017-03-08 08:39:57
- Copied to Backport #19222: hammer: osd crashes during hit_set_trim and hit_set_remove_all if hit set object doesn't exist

Updated by Nathan Cutler <ncutler@suse.cz> on 2017-03-08 08:40:00
- Copied to Backport #19223: jewel: osd crashes during hit_set_trim and hit_set_remove_all if hit set object doesn't exist

Updated by Nathan Cutler <ncutler@suse.cz> on 2017-09-03 19:08:11
- Status changed from Pending Backport to Resolved

Updated by Brad Hubbard <bhubbard@redhat.com> on 2019-06-19 00:05:54
- Related to Bug #40388: Mimic: osd crashes during hit_set_trim and hit_set_remove_all if hit set object doesn't exist