Ceph https://tracker.ceph.com/ 2020-02-19T19:18:04Z
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=159095 2020-02-19T19:18:04Z Wido den Hollander wido@42on.com
<ul></ul><p>On the Ceph users list there are multiple reports of people experiencing this:</p>
<p>- <a class="external" href="https://www.spinics.net/lists/ceph-users/msg54910.html">https://www.spinics.net/lists/ceph-users/msg54910.html</a><br />- <a class="external" href="https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/QSFGP5H7KJAW7Y3MZR7B2UWCHUQCHSLK/?sort=thread">https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/QSFGP5H7KJAW7Y3MZR7B2UWCHUQCHSLK/?sort=thread</a></p>
<p>This seems to be Nautilus related.</p>
<p>I'll see if I can reproduce it somehow, but that doesn't seem to be very simple. None of my test systems experience this.</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=159099 2020-02-19T22:08:09Z Neha Ojha nojha@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Need More Info</i></li></ul><p>Hi Wido,</p>
<p>I did come across something like this while investigating <a class="external" href="https://tracker.ceph.com/issues/43048">https://tracker.ceph.com/issues/43048</a>. It was also on Nautilus, but the slow op osd_pg_create messages eventually went away once the PGs were able to fetch the required osdmaps for the corresponding epochs. In your case, do you observe the same? In other words, how long do these slow ops persist? If you happen to have OSD logs, please attach them.</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=159150 2020-02-20T12:20:41Z Wido den Hollander wido@42on.com
<ul></ul><p>Neha Ojha wrote:</p>
<blockquote>
<p>Hi Wido,</p>
<p>I did come across something like this while investigating <a class="external" href="https://tracker.ceph.com/issues/43048">https://tracker.ceph.com/issues/43048</a>. It was also on Nautilus, but the slow op osd_pg_create messages eventually went away once the PGs were able to fetch the required osdmaps for the corresponding epochs. In your case, do you observe the same? In other words, how long do these slow ops persist? If you happen to have OSD logs, please attach them.</p>
</blockquote>
<p>The logs we have are very minimal. But the slow ops never went away. What I have from the logs I posted at the beginning of this issue.</p>
<p>All PGs are active+clean.</p>
<p>This started around 07:00 in the morning (when the balancer became active) and the slow ops remained until ~11:30, when we removed the newly created pool.</p>
<p>I <strong>think</strong> I can reproduce this easily on this production cluster, but with a few PB online I simply can't test this again and see what happens.</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=159667 2020-02-26T22:11:47Z Neha Ojha nojha@redhat.com
<ul><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li></ul>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=160162 2020-03-03T14:48:09Z Igor Fedotov igor.fedotov@croit.io
<ul></ul><p>We've got a similar case with plenty of slow op indications, many of them osd_op_create ones,<br />which eventually go away and reappear after a while.</p>
<p>Nautilus v14.2.5</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=160739 2020-03-10T15:15:54Z Jan Fajerski lists@fajerski.name
<ul></ul><p>Hi everyone, we were able to collect some debug logs that seem to exhibit this case. The log is fairly large; you can find a compressed version of it under <a class="external" href="https://framadrop.org/r/zMEeWuFJF7#X6kbAAfodjjvIR9K5h3AdeZfaS2Jf9AuHeiJtiuU7wM=">https://framadrop.org/r/zMEeWuFJF7#X6kbAAfodjjvIR9K5h3AdeZfaS2Jf9AuHeiJtiuU7wM=</a> (will self delete Thursday, April 9, 2020 5:13 PM)</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=160819 2020-03-11T14:29:18Z Paul Emmerich paul.emmerich@oocero.de
<ul></ul><p>I've encountered another cluster in the wild that hit this bug. It seems to be triggered somewhat reliably a few hours <strong>after</strong> creating new pools. Deleting the new pools resolves it. Keeping the pools causes random problems when anything in the osdmap changes.</p>
<p>The new pool works without any problems until the problem occurs; it only happens after a few hours.</p>
<p>It might be related to scrubbing or snapshot operations, but this may be a coincidence. It triggered at around the same time as scrubbing and a snapshot cron job started. The balancer is disabled on this cluster.</p>
<p>And yeah, unfortunately I can't debug it on this system either because it's a somewhat important production cluster :(</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=160822 2020-03-11T14:31:58Z Paul Emmerich paul.emmerich@oocero.de
<ul></ul><p>Oh, running mostly 14.2.4 and some OSDs on 14.2.5 on Ubuntu 16.04. Small cluster with only 127 OSDs.</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=160823 2020-03-11T14:35:45Z Paul Emmerich paul.emmerich@oocero.de
<ul></ul><p>Forgot to mention: the cluster has been running since Jewel 10.2.4. I think upgrading from Jewel or older seems to be a prerequisite for this to happen?</p>
<p>I've seen this on two setups; one has been upgraded from Jewel, one from Firefly. Both have been fully migrated to BlueStore, so no mixed setup like in the original description.</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=160888 2020-03-12T14:16:02Z Wido den Hollander wido@42on.com
<ul></ul><p>So this message came along on the users mailing list:</p>
<pre>
so I can confirm that at least in my case, the problem is caused
by old osd maps not being pruned for some reason, and thus not fitting
into cache. When I increased osd map cache to 5000 the problem is gone.
The question is why they're not being pruned, even though the cluster is in
healthy state. But you can try checking:
ceph daemon osd.X status to see how many maps are your OSDs storing
and ceph daemon osd.X perf dump | grep osd_map_cache_miss
to see if you're experiencing similar problem..
so I'm going to debug further..
</pre>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=160965 2020-03-13T12:53:13Z Jan Fajerski lists@fajerski.name
<ul></ul><p>Can confirm seeing issues with osd map pruning ("oldest_map": 41985 vs "newest_map": 83376) and large osd_map_cache_miss counter values.</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=160966 2020-03-13T12:57:58Z Dan van der Ster
<ul></ul><p>Jan Fajerski wrote:</p>
<blockquote>
<p>Can confirm seeing issues with osd map pruning ("oldest_map": 41985 vs "newest_map": 83376) and large osd_map_cache_miss counter values.</p>
</blockquote>
<p>See <a class="external" href="https://tracker.ceph.com/issues/37875">https://tracker.ceph.com/issues/37875</a></p>
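<p>To put the map range quoted above into perspective, here is a minimal Python sketch. It assumes the JSON printed by <code>ceph daemon osd.X status</code> carries <code>oldest_map</code> and <code>newest_map</code> fields as quoted in this thread; the helper names and the inclusive count are illustrative assumptions, not part of Ceph.</p>

```python
import json


def stored_map_count(status_json):
    """Number of osdmap epochs an OSD holds, counting both ends of the range."""
    status = json.loads(status_json)
    return status["newest_map"] - status["oldest_map"] + 1


def exceeds_cache(status_json, cache_size):
    """True when the stored range cannot fit into a cache of cache_size maps."""
    return stored_map_count(status_json) > cache_size


# Values reported earlier in this thread.
sample = '{"oldest_map": 41985, "newest_map": 83376}'
print(stored_map_count(sample))     # 41392
print(exceeds_cache(sample, 5000))  # True
```

<p>With these numbers, even a cache of 5000 maps (the value mentioned on the mailing list) would cover only a fraction of the stored range.</p>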
<p>With `ceph pg dump -f json | jq .osd_epochs` you can see which OSD is needing osdmap 41985. What's the status of that osd?</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=160976 2020-03-13T16:11:17Z Andrew Mitroshin
<ul></ul><p>Could you please submit output for the command</p>
<pre>
ceph osd dump | grep require
</pre>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=161037 2020-03-16T14:52:43Z Wout van Heeswijk wout@42on.com
<ul></ul><p>Another customer of ours has reported this behaviour. This cluster was, most likely, installed with Hammer and is now running Nautilus (14.2.4/14.2.5).</p>
<p>created 2016-05-25<br />require_min_compat_client firefly<br />min_compat_client firefly</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=161223 2020-03-18T08:09:48Z Jan Fajerski lists@fajerski.name
<ul></ul><p>Dan van der Ster wrote:</p>
<blockquote>
<p>See <a class="external" href="https://tracker.ceph.com/issues/37875">https://tracker.ceph.com/issues/37875</a></p>
<p>With `ceph pg dump -f json | jq .osd_epochs` you can see which OSD is needing osdmap 41985. What's the status of that osd?</p>
</blockquote>
<p>This is no longer part of the pg dump in Nautilus, unfortunately.</p>
<p>We also tried the workaround mentioned in <a class="external" href="https://tracker.ceph.com/issues/37875">https://tracker.ceph.com/issues/37875</a>, i.e. removing all down or out OSDs from the crushmap and then restarting the MON daemons. This did not result in OSD maps being pruned.</p>
<p>Still seeing<br /><pre><code class="text syntaxhl"><span class="CodeRay"># ceph report | jq '.osdmap_manifest.pinned_maps | length'
4162
</span></code></pre></p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=161224 2020-03-18T08:15:09Z Jan Fajerski lists@fajerski.name
<ul></ul><p>Andrew Mitroshin wrote:</p>
<blockquote>
<p>Could you please submit output for the command</p>
<p>[...]</p>
</blockquote>
<p>% ceph osd dump | grep require :(<br />require_min_compat_client jewel<br />require_osd_release luminous</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=161380 2020-03-20T08:50:42Z Andrew Mitroshin
<ul></ul><p>Jan Fajerski wrote:</p>
<blockquote>
<p>Andrew Mitroshin wrote:</p>
<blockquote>
<p>Could you please submit output for the command</p>
<p>[...]</p>
</blockquote>
<p>% ceph osd dump | grep require :(<br />require_min_compat_client jewel<br />require_osd_release luminous</p>
</blockquote>
<p>When upgrading to Nautilus you should issue the command:<br /><pre>
ceph osd require-osd-release nautilus
</pre></p>
<p>As mentioned in <a class="external" href="https://docs.ceph.com/docs/master/releases/nautilus/#upgrading-from-mimic-or-luminous">https://docs.ceph.com/docs/master/releases/nautilus/#upgrading-from-mimic-or-luminous</a> <br />p. 10</p>
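<p>The check requested above can also be scripted. The following is a minimal Python sketch, assuming the two-column <code>require_*</code> lines that <code>ceph osd dump | grep require</code> printed in this thread; the function names and the abbreviated release list are illustrative, not part of any Ceph tooling.</p>

```python
# Release names in chronological order (abbreviated list, for comparison only).
RELEASE_ORDER = ["jewel", "kraken", "luminous", "mimic", "nautilus", "octopus"]


def parse_require_lines(dump_text):
    """Collect 'require_*' settings from 'ceph osd dump' style text."""
    settings = {}
    for line in dump_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("require_"):
            settings[parts[0]] = parts[1]
    return settings


def release_lags(dump_text, running_release):
    """True when require_osd_release is missing or older than running_release.

    Raises ValueError for release names outside RELEASE_ORDER.
    """
    required = parse_require_lines(dump_text).get("require_osd_release")
    if required is None:
        return True
    return RELEASE_ORDER.index(required) < RELEASE_ORDER.index(running_release)


# Output posted earlier in this thread.
posted = "require_min_compat_client jewel\nrequire_osd_release luminous"
print(release_lags(posted, "nautilus"))  # True
```

<p>The caller supplies <code>running_release</code>; the sketch does not try to detect which version the daemons actually run.</p>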
<p>This will eliminate slow_ops and osd_map_cache_misses</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=161497 2020-03-23T15:39:15Z Nikola Ciprich nikola.ciprich@linuxbox.cz
<ul><li><strong>File</strong> <a href="/attachments/download/4763/ceph-13.2.8-debug-creating-pgs.diff">ceph-13.2.8-debug-creating-pgs.diff</a> <a class="icon-only icon-magnifier" title="View" href="/attachments/4763/ceph-13.2.8-debug-creating-pgs.diff">View</a> added</li></ul><p>Hi, I'm dealing with a similar (not sure whether the same) problem on 13.2.8. I've narrowed it down to osdmaps not being pruned, even though the cluster is in a healthy state and there are no down OSDs. Digging further, I got to the OSDMonitor::get_trim_to function, namely this part:</p>
<pre><code>std::lock_guard&lt;std::mutex&gt; l(creating_pgs_lock);
if (!creating_pgs.pgs.empty()) {
  return 0;
}</code></pre>
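<p>The effect of the guard quoted above can be modelled in a few lines of Python. This is only a sketch: the excerpt does not show what the function returns when nothing is being created, so <code>candidate_epoch</code> is a stand-in, and the PG id in the example is made up.</p>

```python
def get_trim_to(creating_pgs, candidate_epoch):
    """Model of the quoted guard: any PG still listed as creating blocks trimming."""
    if creating_pgs:
        return 0  # nothing may be trimmed
    return candidate_epoch  # stand-in for the rest of the real function


print(get_trim_to({"12.3f"}, 83000))  # 0
print(get_trim_to(set(), 83000))      # 83000
```

<p>A single stale entry is therefore enough to keep every old osdmap around.</p>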
<p>Indeed, the problem in this cluster is that there is a bunch of PGs in the monitors' DB which the monitors believe are in creating state, even though they are in fact active+clean. I'm attaching a 13.2.8 patch to get a dump of those, as I wasn't able to get the list using ceph-kvstore-tool. So if anybody is interested, you can give it a try. Now the question is why this happened and how I can get rid of those.</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=161831 2020-03-27T03:37:53Z hoan nv
<ul></ul><p>Andrew Mitroshin wrote:</p>
<blockquote>
<p>Jan Fajerski wrote:</p>
<blockquote>
<p>Andrew Mitroshin wrote:</p>
<blockquote>
<p>Could you please submit output for the command</p>
<p>[...]</p>
</blockquote>
<p>% ceph osd dump | grep require :(<br />require_min_compat_client jewel<br />require_osd_release luminous</p>
</blockquote>
<p>When upgrading to nautilus you should issue command:<br />[...]</p>
<p>As mentioned in <a class="external" href="https://docs.ceph.com/docs/master/releases/nautilus/#upgrading-from-mimic-or-luminous">https://docs.ceph.com/docs/master/releases/nautilus/#upgrading-from-mimic-or-luminous</a> <br />p. 10</p>
<p>This will eliminate slow_ops and osd_map_cache_misses</p>
</blockquote>
<p>Are you sure?</p>
<p>My cluster has 2 pools. Pool1 has all of its OSDs on Nautilus; pool2 has all of its OSDs on Mimic.<br />Now I can upgrade pool2 to Nautilus and then run the command 'ceph osd require-osd-release nautilus'. After that step I can create a new pool without slow_ops.</p>
<p>Thanks.</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=161911 2020-03-29T13:37:20Z Wido den Hollander wido@42on.com
<ul></ul><p>When searching through the code I found this in src/mon/OSDMonitor.cc</p>
<pre>
// update creating pgs first so that we can remove the created pgid and
// process the pool flag removal below in the same osdmap epoch.
auto pending_creatings = update_pending_pgs(pending_inc, tmp);
bufferlist creatings_bl;
encode(pending_creatings, creatings_bl);
t->put(OSD_PG_CREATING_PREFIX, "creating", creatings_bl);
// remove any old (or incompat) POOL_CREATING flags
for (auto& i : tmp.get_pools()) {
  if (tmp.require_osd_release < CEPH_RELEASE_NAUTILUS) {
    // pre-nautilus OSDMaps shouldn't get this flag.
    if (pending_inc.new_pools.count(i.first)) {
      pending_inc.new_pools[i.first].flags &= ~pg_pool_t::FLAG_CREATING;
    }
  }
  if (i.second.has_flag(pg_pool_t::FLAG_CREATING) &&
      !pending_creatings.still_creating_pool(i.first)) {
    dout(10) << __func__ << " done creating pool " << i.first
             << ", clearing CREATING flag" << dendl;
    if (pending_inc.new_pools.count(i.first) == 0) {
      pending_inc.new_pools[i.first] = i.second;
    }
    pending_inc.new_pools[i.first].flags &= ~pg_pool_t::FLAG_CREATING;
  }
}
</pre>
<p>So if the require_osd_release is <strong>NOT</strong> set to <em>nautilus</em> the code handles OSDMap pruning differently when PGs are in creating state.</p>
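<p>To make the two branches easier to compare, here is a small Python model of the loop quoted above. It is a simplification under stated assumptions: pools are reduced to plain integer flag words, <code>FLAG_CREATING</code> is a placeholder bit rather than the real <code>pg_pool_t</code> value, and releases are compared by major version number. It only models the flag handling visible in the excerpt, not map trimming itself.</p>

```python
NAUTILUS = 14           # major version, used only for ordering
FLAG_CREATING = 1 << 0  # placeholder bit, not the real pg_pool_t value


def clear_creating_flags(require_osd_release, pool_flags, pending_new_pools,
                         still_creating):
    """Mirror the quoted loop; pending_new_pools is updated in place.

    pool_flags:        pool id -> flags in the map being built ('tmp')
    pending_new_pools: pool id -> flags in the pending incremental
    still_creating:    set of pool ids that still have PGs being created
    """
    for pool_id, flags in pool_flags.items():
        if require_osd_release < NAUTILUS:
            # Pre-nautilus branch: strip the flag from any pending pool entry.
            if pool_id in pending_new_pools:
                pending_new_pools[pool_id] &= ~FLAG_CREATING
        if flags & FLAG_CREATING and pool_id not in still_creating:
            # Creation finished: clear the flag in the pending incremental.
            if pool_id not in pending_new_pools:
                pending_new_pools[pool_id] = flags
            pending_new_pools[pool_id] &= ~FLAG_CREATING
    return pending_new_pools


creating = {7: FLAG_CREATING}
print(clear_creating_flags(12, creating, dict(creating), {7}))  # {7: 0}
print(clear_creating_flags(14, creating, dict(creating), {7}))  # {7: 1}
```

<p>In this model a pool that is still creating loses its pending CREATING flag when require_osd_release is below nautilus, and keeps it otherwise.</p>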
<p>Now, I find it very weird that the clusters I've seen this on do not have this flag set to nautilus.</p>
<p>Forgetting this once is a possibility. But having this happen on multiple clusters is very weird. It's a step which I always take (and other people do). It's clearly stated in the Release Notes and such.</p>
<p>Still, this issue shouldn't happen, but it might explain why it happens looking at the code I just posted.</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=161970 2020-03-30T12:55:52Z Jan Fajerski lists@fajerski.name
<ul></ul><p>Fwiw, on the cluster I'm seeing this on, I did set this flag after tidying the osd map (removed a couple of destroyed OSDs). Behaviour stayed the same and the pinned_maps count is still just as high.</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=204181 2021-10-12T06:47:26Z Ist Gab
<ul></ul><p>Wido den Hollander wrote:</p>
<blockquote>
<p>On a cluster with 1405 OSDs I've run into a situation for the second time now where a pool creation resulted in massive slow ops on the OSDs serving that pool.</p>
</blockquote>
<p>Was there any solution to this issue? I'm facing it in my object store multisite environment in 2 out of the 3 clusters, which also takes RGW down and causes an outage from the users' point of view.</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=204384 2021-10-15T11:58:09Z Wido den Hollander wido@42on.com
<ul></ul><p>Ist Gab wrote:</p>
<blockquote>
<p>Wido den Hollander wrote:</p>
<blockquote>
<p>On a cluster with 1405 OSDs I've run into a situation for the second time now where a pool creation resulted in massive slow ops on the OSDs serving that pool.</p>
</blockquote>
<p>Was there any solution to this issue? I'm facing it in my object store multisite environment in 2 out of the 3 clusters, which also takes RGW down and causes an outage from the users' point of view.</p>
</blockquote>
<p>Are you sure recovery_deletes is set in the OSDMap?</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=204386 2021-10-15T12:47:42Z Ist Gab
<ul></ul><blockquote>
<p>Are you sure recovery_deletes is set in the OSDMap?</p>
</blockquote>
<p>yeah, these are the flags:<br />flags noout,sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=204440 2021-10-17T11:27:26Z Wido den Hollander wido@42on.com
<ul></ul><p>Ist Gab wrote:</p>
<blockquote><blockquote>
<p>Are you sure recovery_deletes is set in the OSDMap?</p>
</blockquote>
<p>yeah, these are the flags:<br />flags noout,sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit</p>
</blockquote>
<p>I have never encountered this with that flag enabled. In that case I wouldn't know.</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=204830 2021-10-25T20:50:39Z Neha Ojha nojha@redhat.com
<ul></ul><p>Ist Gab wrote:</p>
<blockquote>
<p>Wido den Hollander wrote:</p>
<blockquote>
<p>On a cluster with 1405 OSDs I've run into a situation for the second time now where a pool creation resulted in massive slow ops on the OSDs serving that pool.</p>
</blockquote>
<p>Was there any solution to this issue? I'm facing it in my object store multisite environment in 2 out of the 3 clusters, which also takes RGW down and causes an outage from the users' point of view.</p>
</blockquote>
<p>Which version are you using?</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=204833 2021-10-25T21:06:59Z Ist Gab
<ul></ul><p>Neha Ojha wrote:</p>
<blockquote>
<p>Which version are you using?</p>
</blockquote>
<p>Octopus 15.2.14</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=209124 2022-01-24T22:44:47Z Neha Ojha nojha@redhat.com
<ul><li><strong>Priority</strong> changed from <i>High</i> to <i>Normal</i></li></ul><p>Ist Gab wrote:</p>
<blockquote>
<p>Neha Ojha wrote:</p>
<blockquote>
<p>Which version are you using?</p>
</blockquote>
<p>Octopus 15.2.14</p>
</blockquote>
<p>Are you still seeing this problem? Will you be able to provide debug data around this issue?</p>
RADOS - Bug #44184: Slow / Hanging Ops after pool creation https://tracker.ceph.com/issues/44184?journal_id=209221 2022-01-26T12:04:07Z Ist Gab
<ul></ul><p>Neha Ojha wrote:</p>
<blockquote>
<p>Are you still seeing this problem? Will you be able to provide debug data around this issue?</p>
</blockquote>
<p>Hi,</p>
<p>The last one was 2 weeks ago; so far this month it has happened 2 times:</p>
<pre><code class="c syntaxhl"><span class="CodeRay">{
<span class="string"><span class="delimiter">"</span><span class="content">archived</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">2022-01-14 13:22:24.494710</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">backtrace</span><span class="delimiter">"</span></span>: [
<span class="string"><span class="delimiter">"</span><span class="content">(()+0x12b20) [0x7fd3ca4f3b20]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(ceph::buffer::v15_2_0::ptr::ptr(ceph::buffer::v15_2_0::ptr const&)+0x1b) [0x7fd3d4a60d7b]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(ceph::buffer::v15_2_0::ptr_node::cloner::operator()(ceph::buffer::v15_2_0::ptr_node const&)+0x2e) [0x7fd3d4a640ce]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> >* std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >::_M_copy<std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >::_Reuse_or_alloc_node>(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > const*, std::_Rb_tree_node_base*, std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, 
ceph::buffer::v15_2_0::list> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >::_Reuse_or_alloc_node&)+0x4c3) [0x7fd3d5263593]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >::operator=(std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > > const&)+0x93) [0x7fd3d5263843]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(ObjectCache::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ObjectCacheInfo&, unsigned int, rgw_cache_entry_info*)+0xe39) [0x7fd3d53790a9]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(RGWSI_SysObj_Cache::raw_stat(rgw_raw_obj const&, unsigned long*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, unsigned long*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, ceph::buffer::v15_2_0::list*, RGWObjVersionTracker*, optional_yield)+0x207) [0x7fd3d5727e07]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(RGWSI_SysObj_Core::get_system_obj_state_impl(RGWSysObjectCtxBase*, rgw_raw_obj const&, RGWSysObjState**, RGWObjVersionTracker*, optional_yield)+0x4ec) [0x7fd3d52ec0fc]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(RGWSI_SysObj_Core::get_system_obj_state(RGWSysObjectCtxBase*, rgw_raw_obj const&, RGWSysObjState**, RGWObjVersionTracker*, optional_yield)+0x4a) [0x7fd3d52ec87a]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(RGWSI_SysObj_Core::stat(RGWSysObjectCtxBase&, RGWSI_SysObj_Obj_GetObjState&, rgw_raw_obj const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, bool, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, unsigned long*, RGWObjVersionTracker*, optional_yield)+0x72) [0x7fd3d52f09c2]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(RGWSI_SysObj::Obj::ROp::stat(optional_yield)+0x5f) [0x7fd3d52e890f]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(rgw_get_system_obj(RGWSysObjectCtx&, rgw_pool const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list&, RGWObjVersionTracker*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, optional_yield, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, rgw_cache_entry_info*, boost::optional<obj_version>)+0x150) [0x7fd3d565db50]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(RGWSI_MetaBackend_SObj::get_entry(RGWSI_MetaBackend::Context*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWSI_MetaBackend::GetParams&, RGWObjVersionTracker*, optional_yield)+0xe7) [0x7fd3d5717c77]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(RGWSI_Bucket_SObj::read_bucket_entrypoint_info(ptr_wrapper<RGWSI_MetaBackend::Context, 3>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWBucketEntryPoint*, RGWObjVersionTracker*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, optional_yield, rgw_cache_entry_info*, boost::optional<obj_version>)+0x143) [0x7fd3d56fc1f3]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(()+0x5f3175) [0x7fd3d5338175]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(std::_Function_handler<int (RGWSI_MetaBackend_Handler::Op*), RGWBucketMetadataHandler::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int (ptr_wrapper<RGWSI_MetaBackend::Context, 3>&)>)::{lambda(RGWSI_MetaBackend_Handler::Op*)#1}>::_M_invoke(std::_Any_data const&, RGWSI_MetaBackend_Handler::Op*&&)+0x36) [0x7fd3d5354d16]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(()+0x9d069e) [0x7fd3d571569e]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(RGWSI_MetaBackend_SObj::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int (RGWSI_MetaBackend::Context*)>)+0x9e) [0x7fd3d571855e]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(RGWSI_MetaBackend_Handler::call(std::optional<std::variant<RGWSI_MetaBackend_CtxParams_SObj> >, std::function<int (RGWSI_MetaBackend_Handler::Op*)>)+0x5f) [0x7fd3d57154cf]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(RGWBucketCtl::read_bucket_entrypoint_info(rgw_bucket const&, RGWBucketEntryPoint*, optional_yield, RGWBucketCtl::Bucket::GetParams const&)+0xf1) [0x7fd3d5337551]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(RGWBucketCtl::read_bucket_info(rgw_bucket const&, RGWBucketInfo*, optional_yield, RGWBucketCtl::BucketInstance::GetParams const&, RGWObjVersionTracker*)+0xf4) [0x7fd3d5339744]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(rgw_build_bucket_policies(rgw::sal::RGWRadosStore*, req_state*)+0xe61) [0x7fd3d55571e1]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(RGWHandler::do_init_permissions()+0x32) [0x7fd3d5558832]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(RGWHandler_REST::init_permissions(RGWOp*)+0x178) [0x7fd3d55ed8d8]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool)+0x194) [0x7fd3d5211e64]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(process_request(rgw::sal::RGWRadosStore*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, int*)+0x27dc) [0x7fd3d5216eac]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(()+0x423066) [0x7fd3d5168066]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(()+0x424992) [0x7fd3d5169992]</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">(make_fcontext()+0x2f) [0x7fd3d580f95f]</span><span class="delimiter">"</span></span>
],
<span class="string"><span class="delimiter">"</span><span class="content">ceph_version</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">15.2.14</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">crash_id</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">2022-01-14T11:57:07.699354Z_98e41ecc-7fbc-4aca-b6c4-c5750ddc7cf9</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">entity_name</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">client.rgw.hk-cephmon-2s03.rgw0</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">os_id</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">centos</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">os_name</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">CentOS Linux</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">os_version</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">8</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">os_version_id</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">8</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">process_name</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">radosgw</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">stack_sig</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">241d488fddce2f4c5524b40a9aa8e809c71e08a717c8d29cae2bb6cb88aba2cf</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">timestamp</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">2022-01-14T11:57:07.699354Z</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">utsname_hostname</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">hk-cephmon-2s03</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">utsname_machine</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">x86_64</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">utsname_release</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">4.18.0-305.19.1.el8_4.x86_64</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">utsname_sysname</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">Linux</span><span class="delimiter">"</span></span>,
<span class="string"><span class="delimiter">"</span><span class="content">utsname_version</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">#1 SMP Wed Sep 15 15:39:39 UTC 2021</span><span class="delimiter">"</span></span>
}
</span></code></pre> RADOS - Bug #44184: Slow / Hanging Ops after pool creationhttps://tracker.ceph.com/issues/44184?journal_id=2095552022-01-31T23:59:18ZNeha Ojhanojha@redhat.com
<ul></ul><p>Ist Gab wrote:</p>
<blockquote>
<p>Neha Ojha wrote:</p>
<blockquote>
<p>Are you still seeing this problem? Will you be able to provide debug data around this issue?</p>
</blockquote>
<p>Hi,</p>
<p>The last one was 2 weeks ago; so far this month it has happened 2 times:</p>
<p>[...]</p>
</blockquote>
<p>So on both occasions this crash was a side effect of new pool creation? Can you provide the output of "ceph -s"?</p> RADOS - Bug #44184: Slow / Hanging Ops after pool creationhttps://tracker.ceph.com/issues/44184?journal_id=2095782022-02-01T02:14:08ZIst Gab
<ul></ul><p>Neha Ojha wrote:</p>
<blockquote>
<p>So on both occasions this crash was a side effect of new pool creation? Can you provide the output of "ceph -s"?</p>
</blockquote>
<p>No, it's an object store, so no new pool was created on my side; the error is just the same as the one in this ticket.</p>