Ceph - Bug #11558: assert(0 == "hit suicide timeout") in HeartbeatMap due to dead lock in rocksdbhttps://tracker.ceph.com/issues/11558?journal_id=514482015-05-07T09:57:31ZHaomai Wanghaomaiwang@gmail.com
<ul><li><strong>Category</strong> set to <i>OSD</i></li><li><strong>Target version</strong> set to <i>v0.94</i></li></ul> Ceph - Bug #11558: assert(0 == "hit suicide timeout") in HeartbeatMap due to dead lock in rocksdbhttps://tracker.ceph.com/issues/11558?journal_id=515172015-05-08T02:14:37ZKefu Chaitchaikov@gmail.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/51517/diff?detail_id=50301">diff</a>)</li></ul> Ceph - Bug #11558: assert(0 == "hit suicide timeout") in HeartbeatMap due to dead lock in rocksdbhttps://tracker.ceph.com/issues/11558?journal_id=515202015-05-08T02:34:18ZXinze Chixmdxcxz@gmail.com
<ul></ul><p>common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")</p>
<pre><code>ceph version 0.94-17-gc341f91 (c341f91ed1be851f60ceec37ad5789c4ddd4122e)<br /> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x78) [0xbbf8d8]<br /> 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2a9) [0xafcdc9]<br /> 3: (ceph::HeartbeatMap::is_healthy()+0xce) [0xafd64e]<br /> 4: (ceph::HeartbeatMap::check_touch_file()+0x17) [0xafdd27]<br /> 5: (CephContextServiceThread::entry()+0x14b) [0xbcf62b]<br /> 6: (()+0x7df3) [0x7f3922606df3]<br /> 7: (clone()+0x6d) [0x7f3920ed53dd]<br /> NOTE: a copy of the executable, or `objdump -rdS &lt;executable&gt;` is needed to interpret this.</code></pre>
<p>This is the log from the Ceph OSD:</p>
<p>2015-05-07 15:05:49.627738 7f391fd65700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f39090b1700' had timed out after 15<br />2015-05-07 15:05:49.627740 7f391fd65700 1 heartbeat_map is_healthy 'KeyValueStore::op_tp thread 0x7f3917a39700' had timed out after 60<br />2015-05-07 15:05:49.627742 7f391fd65700 1 heartbeat_map is_healthy 'KeyValueStore::op_tp thread 0x7f391823a700' had timed out after 60<br />2015-05-07 15:05:54.627869 7f391fd65700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f39080af700' had timed out after 15<br />2015-05-07 15:05:54.627886 7f391fd65700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f39090b1700' had timed out after 15<br />2015-05-07 15:05:54.627889 7f391fd65700 1 heartbeat_map is_healthy 'KeyValueStore::op_tp thread 0x7f3917a39700' had timed out after 60<br />2015-05-07 15:05:54.627891 7f391fd65700 1 heartbeat_map is_healthy 'KeyValueStore::op_tp thread 0x7f391823a700' had timed out after 60<br />2015-05-07 15:05:59.628012 7f391fd65700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f39080af700' had timed out after 15<br />2015-05-07 15:05:59.628029 7f391fd65700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f39090b1700' had timed out after 15<br />2015-05-07 15:05:59.628035 7f391fd65700 1 heartbeat_map is_healthy 'KeyValueStore::op_tp thread 0x7f3917a39700' had timed out after 60<br />2015-05-07 15:05:59.628036 7f391fd65700 1 heartbeat_map is_healthy 'KeyValueStore::op_tp thread 0x7f391823a700' had timed out after 60<br />2015-05-07 15:06:04.628159 7f391fd65700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f39080af700' had timed out after 15<br />2015-05-07 15:06:04.628176 7f391fd65700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f39090b1700' had timed out after 15<br />2015-05-07 15:06:04.628178 7f391fd65700 1 heartbeat_map is_healthy 'KeyValueStore::op_tp thread 0x7f3917a39700' had timed out after 60<br />2015-05-07 15:06:04.628180 7f391fd65700 1 heartbeat_map is_healthy 
'KeyValueStore::op_tp thread 0x7f391823a700' had timed out after 60<br />2015-05-07 15:06:09.628244 7f391fd65700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f39080af700' had timed out after 15<br />2015-05-07 15:06:09.628262 7f391fd65700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f39090b1700' had timed out after 15<br />2015-05-07 15:06:09.628264 7f391fd65700 1 heartbeat_map is_healthy 'KeyValueStore::op_tp thread 0x7f3917a39700' had timed out after 60<br />2015-05-07 15:06:09.628265 7f391fd65700 1 heartbeat_map is_healthy 'KeyValueStore::op_tp thread 0x7f391823a700' had timed out after 60</p> Ceph - Bug #11558: assert(0 == "hit suicide timeout") in HeartbeatMap due to dead lock in rocksdbhttps://tracker.ceph.com/issues/11558?journal_id=515752015-05-09T16:03:09ZKefu Chaitchaikov@gmail.com
<ul><li><strong>Subject</strong> changed from <i>rocksdb bug</i> to <i>assert(0 == "hit suicide timeout") in HeartbeatMap due to dead lock in rocksdb</i></li></ul> Ceph - Bug #11558: assert(0 == "hit suicide timeout") in HeartbeatMap due to dead lock in rocksdbhttps://tracker.ceph.com/issues/11558?journal_id=516022015-05-11T04:56:07ZXinze Chixmdxcxz@gmail.com
<ul></ul><p>Commit 0bd767fb7ecca78033dc9d99f221e88ad0c4b289 uses 5000 as the default rocksdb::max_open_files.<br />But the system's maximum number of open files is 1024.</p>
<p>So I think 5000 should probably not be the default value of rocksdb::max_open_files in ceph.conf.</p> Ceph - Bug #11558: assert(0 == "hit suicide timeout") in HeartbeatMap due to dead lock in rocksdbhttps://tracker.ceph.com/issues/11558?journal_id=516032015-05-11T07:25:31ZKefu Chaitchaikov@gmail.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Won't Fix</i></li></ul><p>Since the KV store is still an experimental feature and its settings are subject to change, I am closing this as "won't fix". I am also asking Xiaoxi for his opinion at <a class="external" href="https://github.com/ceph/ceph/commit/0bd767fb7ecca78033dc9d99f221e88ad0c4b289">https://github.com/ceph/ceph/commit/0bd767fb7ecca78033dc9d99f221e88ad0c4b289</a>.</p>
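<p>For anyone hitting the same mismatch between rocksdb::max_open_files (5000) and the system open-file limit (1024) reported above, here is a rough sketch of a pre-flight check. It is not part of Ceph or RocksDB. The function names and the <code>reserved_fds</code> headroom value are hypothetical, chosen only for illustration. It uses Python's standard <code>resource</code> module, so it only runs on Unix-like systems.</p>

```python
import resource


def fits_within_limit(configured_max_open_files, soft_limit, reserved_fds=64):
    """Return True if the configured value fits under the soft fd limit.

    reserved_fds is a hypothetical headroom for sockets, logs and other
    descriptors the process needs; it is an assumption, not a Ceph setting.
    """
    if soft_limit == resource.RLIM_INFINITY:
        # No per-process limit, so any configured value fits.
        return True
    usable = max(soft_limit - reserved_fds, 0)
    return configured_max_open_files <= usable


def check_current_process(configured_max_open_files):
    """Check the configured value against this process's RLIMIT_NOFILE."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    return fits_within_limit(configured_max_open_files, soft_limit), soft_limit


if __name__ == "__main__":
    ok, soft = check_current_process(5000)
    print("soft limit:", soft, "- 5000 fits:", ok)
```

<p>With the numbers from this report, <code>fits_within_limit(5000, 1024)</code> is false, so either the limit would need to be raised for the OSD process or the configured value lowered.</p>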