https://tracker.ceph.com/
https://tracker.ceph.com/favicon.ico
2018-05-09T21:02:50Z
Ceph
RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed.
https://tracker.ceph.com/issues/24037?journal_id=112941
2018-05-09T21:02:50Z
Josh Durgin
<ul></ul><p>Sounds like a use-after-free of some sort, unrelated to other crashes we've seen.</p>
RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed.
https://tracker.ceph.com/issues/24037?journal_id=113008
2018-05-10T20:25:40Z
Patrick Donnelly
pdonnell@redhat.com
<ul><li><strong>Severity</strong> changed from <i>3 - minor</i> to <i>2 - major</i></li></ul><p>Here's another that looks related:</p>
<pre>
Crash: "2018-05-08 18:53:24.101339 mon.a mon.0 172.21.15.17:6789/0 334 : cluster [WRN] MDS health message (mds.0): 1 slow requests are blocked > 30 sec" in cluster log
ceph version 13.1.0-37-g5e16dd6 (5e16dd6cf52c7659501738658195ccd76e4d2cad) mimic (dev)
1: (()+0x9159e0) [0x55d4533b59e0]
2: (()+0x11390) [0x7fdf68475390]
3: (gsignal()+0x38) [0x7fdf67bc2428]
4: (abort()+0x16a) [0x7fdf67bc402a]
5: (()+0x2dbd7) [0x7fdf67bbabd7]
6: (()+0x2dc82) [0x7fdf67bbac82]
7: (OpTracker::unregister_inflight_op(TrackedOp*)+0x42d) [0x55d4530dd6ed]
8: (TrackedOp::put()+0x33e) [0x55d452ea0b6e]
9: (ECBackend::Op::~Op()+0x8b) [0x55d45317de3b]
10: (std::_Rb_tree<unsigned long, std::pair<unsigned long const, ECBackend::Op>, std::_Select1st<std::pair<unsigned long const, ECBackend::Op> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, ECBackend::Op> > >::erase(unsigned long const&)+0x122) [0x55d45317e642]
11: (ECBackend::try_finish_rmw()+0x182) [0x55d45315ff02]
12: (ECBackend::check_ops()+0x28) [0x55d453166038]
13: (ECBackend::handle_sub_write_reply(pg_shard_t, ECSubWriteReply const&, ZTracer::Trace const&)+0xd2) [0x55d453166112]
14: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x1cd) [0x55d45316be3d]
15: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0x55d4530578b7]
16: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x6ed) [0x55d4530062dd]
17: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1b8) [0x55d452e61058]
18: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x55d4530d9bb2]
19: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x590) [0x55d452e807b0]
20: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x46e) [0x7fdf699c70ae]
21: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fdf699c9130]
22: (()+0x76ba) [0x7fdf6846b6ba]
23: (clone()+0x6d) [0x7fdf67c9441d]
1 jobs: ['2501825']
suites: ['clusters/9-mds.yaml', 'frag_enable.yaml', 'fuse-default-perm-no.yaml}', 'inline/yes.yaml', 'mount/kclient.yaml', 'multimds/basic/{begin.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'overrides/{basic/{debug.yaml', 'q_check_counter/check_counter.yaml', 'tasks/cfuse_workunit_suites_ffsb.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
</pre>
<p>From: <a class="external" href="http://pulpito.ceph.com/pdonnell-2018-05-08_18:15:43-multimds-mimic-testing-basic-smithi/2501825">http://pulpito.ceph.com/pdonnell-2018-05-08_18:15:43-multimds-mimic-testing-basic-smithi/2501825</a></p>
RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed.
https://tracker.ceph.com/issues/24037?journal_id=113433
2018-05-17T13:58:06Z
Radoslaw Zarzynski
rzarzyns@redhat.com
<ul><li><strong>Assignee</strong> set to <i>Radoslaw Zarzynski</i></li></ul>
RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed.
https://tracker.ceph.com/issues/24037?journal_id=113434
2018-05-17T14:00:21Z
Radoslaw Zarzynski
rzarzyns@redhat.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-9 priority-5 priority-high3 closed" href="/issues/23892">Bug #23892</a>: luminous->mimic: mon segv in ~MonOpRequest from OpHistoryServiceThread</i> added</li></ul>
RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed.
https://tracker.ceph.com/issues/24037?journal_id=113505
2018-05-18T13:15:50Z
Radoslaw Zarzynski
rzarzyns@redhat.com
<ul></ul><p>Scenario I can see after static analysis:</p>
<p>1. An instance of `TrackedOp` in `STATE_LIVE` is being dereferenced - the control flows to `TrackedOp::put`.<br />2. Reference counter hits `0` and `OpTracker::unregister_inflight_op(TrackedOp *i)` is being called.<br />3. There is a time window between zeroing the ref-counter and locking the shard.</p>
<pre><code class="cpp syntaxhl"><span class="CodeRay"><span class="keyword">class</span> <span class="class">TrackedOp</span> {
<span class="comment">// ...</span>
<span class="directive">void</span> put() {
<span class="keyword">if</span> (--nref == <span class="integer">0</span>) {
<span class="keyword">switch</span> (state.load()) {
<span class="comment">// ...</span>
<span class="keyword">case</span> STATE_LIVE:
mark_event(<span class="string"><span class="delimiter">"</span><span class="content">done</span><span class="delimiter">"</span></span>);
tracker->unregister_inflight_op(<span class="local-variable">this</span>);
<span class="keyword">break</span>;
<span class="comment">// ...</span>
}
}
}
}
<span class="comment">// ...</span>
<span class="directive">void</span> OpTracker::unregister_inflight_op(TrackedOp *i)
{
<span class="comment">// caller checks;</span>
assert(i->state);
uint32_t shard_index = i->seq % num_optracker_shards;
ShardedTrackingData* sdata = sharded_in_flight_list[shard_index];
assert(<span class="predefined-constant">NULL</span> != sdata);
{
Mutex::Locker locker(sdata->ops_in_flight_lock_sharded);
<span class="error">`</span>
<span class="directive">auto</span> p = sdata->ops_in_flight_sharded.iterator_to(*i);
sdata->ops_in_flight_sharded.erase(p);
}
</span></code></pre>
<p>4. `OSD::get_health_metrics` jumps in between the update and the locking.<br /><pre><code class="cpp syntaxhl"><span class="CodeRay">vector<DaemonHealthMetric> OSD::get_health_metrics()
{
vector<DaemonHealthMetric> metrics;
{
<span class="comment">// ...</span>
TrackedOpRef oldest_op;
<span class="directive">auto</span> count_slow_ops = [&](TrackedOp& op) {
<span class="comment">// ...</span>
oldest_op = &op;
<span class="comment">// ...</span>
};
<span class="keyword">if</span> (op_tracker.visit_ops_in_flight(&oldest_secs, count_slow_ops)) {
<span class="comment">// ...</span>
}
}
</span></code></pre></p>
<p>5. Assigning to `oldest_op` triggers the `TrackedOp::get()`/`TrackedOp::put()` pair, and thus `OpTracker::unregister_inflight_op` can be executed second time for same op <strong>while another call is in progress</strong>. This is dangerous as `boost::intrusive::list::iterator_to` must not be called on an object that doesn't belong to a given list instance (because e.g. it was erased earlier):</p>
<blockquote>
<p>Requires: value must be a reference to a value inserted in a list.</p>
</blockquote>
RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed.
https://tracker.ceph.com/issues/24037?journal_id=113506
2018-05-18T13:26:01Z
Radoslaw Zarzynski
rzarzyns@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul>
RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed.
https://tracker.ceph.com/issues/24037?journal_id=113552
2018-05-18T19:09:17Z
Sage Weil
sage@newdream.net
<ul></ul><p>related?<br /><pre>
2018-05-18T03:29:21.427 INFO:tasks.ceph.mon.a.ovh098.stderr:*** Caught signal (Bus error) **
2018-05-18T03:29:21.427 INFO:tasks.ceph.mon.a.ovh098.stderr: in thread 7f60d12de700 thread_name:OpHistorySvc
2018-05-18T03:29:21.427 INFO:tasks.ceph.mon.a.ovh098.stderr: ceph version 13.1.0-129-gd3a3371 (d3a3371798df3316c173c967bc5300119f20cd22) mimic (rc)
2018-05-18T03:29:21.427 INFO:tasks.ceph.mon.a.ovh098.stderr: 1: (()+0x4b4120) [0x5619e499a120]
2018-05-18T03:29:21.427 INFO:tasks.ceph.mon.a.ovh098.stderr: 2: (()+0x11390) [0x7f60d8900390]
2018-05-18T03:29:21.428 INFO:tasks.ceph.mon.a.ovh098.stderr: 3: (RefCountedObject::put() const+0xb8) [0x5619e4791828]
2018-05-18T03:29:21.428 INFO:tasks.ceph.mon.a.ovh098.stderr: 4: (MonOpRequest::~MonOpRequest()+0x32) [0x5619e4791b22]
2018-05-18T03:29:21.428 INFO:tasks.ceph.mon.a.ovh098.stderr: 5: (std::_Rb_tree<std::pair<double, boost::intrusive_ptr<TrackedOp> >, std::pair<double, boost::intrusive_ptr<TrackedOp> >, std::_Identity<std::pair<double, boost::intrusive_ptr<TrackedOp> > >, std::less<std::pair<double, boost::intrusive_ptr<TrackedOp> > >, std::allocator<std::pair<double, boost::intrusive_ptr<TrackedOp> > > >::_M_erase_aux(std::_Rb_tree_const_iterator<std::pair<double, boost::intrusive_ptr<TrackedOp> > >)+0x83) [0x7f60d9014f83]
2018-05-18T03:29:21.428 INFO:tasks.ceph.mon.a.ovh098.stderr: 6: (OpHistory::cleanup(utime_t)+0x2b4) [0x7f60d9011c04]
2018-05-18T03:29:21.428 INFO:tasks.ceph.mon.a.ovh098.stderr: 7: (OpHistory::_insert_delayed(utime_t const&, boost::intrusive_ptr<TrackedOp>)+0x20c) [0x7f60d901258c]
2018-05-18T03:29:21.428 INFO:tasks.ceph.mon.a.ovh098.stderr: 8: (OpHistoryServiceThread::entry()+0xe9) [0x7f60d9012969]
2018-05-18T03:29:21.428 INFO:tasks.ceph.mon.a.ovh098.stderr: 9: (()+0x76ba) [0x7f60d88f66ba]
2018-05-18T03:29:21.428 INFO:tasks.ceph.mon.a.ovh098.stderr: 10: (clone()+0x6d) [0x7f60d7c5441d]
2018-05-18T03:29:21.428 INFO:tasks.ceph.mon.a.ovh098.stderr:2018-05-18 03:29:21.404 7f60d12de700 -1 *** Caught signal (Bus error) **
2018-05-18T03:29:21.428 INFO:tasks.ceph.mon.a.ovh098.stderr: in thread 7f60d12de700 thread_name:OpHistorySvc
</pre><br />/a/teuthology-2018-05-18_02:25:02-upgrade:luminous-x-mimic-distro-basic-ovh/2545544</p>
RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed.
https://tracker.ceph.com/issues/24037?journal_id=113771
2018-05-22T15:48:48Z
Radoslaw Zarzynski
rzarzyns@redhat.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Fix Under Review</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/22156">https://github.com/ceph/ceph/pull/22156</a></p>
RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed.
https://tracker.ceph.com/issues/24037?journal_id=113777
2018-05-22T18:41:50Z
Sage Weil
sage@newdream.net
<ul><li><strong>Backport</strong> set to <i>mimic</i></li></ul>
RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed.
https://tracker.ceph.com/issues/24037?journal_id=113814
2018-05-23T01:21:43Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Pending Backport</i></li></ul>
RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed.
https://tracker.ceph.com/issues/24037?journal_id=113893
2018-05-23T18:31:57Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/24256">Backport #24256</a>: mimic: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed.</i> added</li></ul>
RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed.
https://tracker.ceph.com/issues/24037?journal_id=113956
2018-05-24T13:44:52Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul>
RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed.
https://tracker.ceph.com/issues/24037?journal_id=115966
2018-07-02T06:08:57Z
Brad Hubbard
bhubbard@redhat.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-5 priority-high3 closed" href="/issues/23352">Bug #23352</a>: osd: segfaults under normal operation</i> added</li></ul>
RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed.
https://tracker.ceph.com/issues/24037?journal_id=116295
2018-07-05T21:34:06Z
Brad Hubbard
bhubbard@redhat.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-6 priority-high2 closed" href="/issues/24664">Bug #24664</a>: osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics</i> added</li></ul>