https://tracker.ceph.com/
https://tracker.ceph.com/favicon.ico
2017-06-07T05:54:33Z
Ceph
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=92296
2017-06-07T05:54:33Z
David Zafman
dzafman@redhat.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/92296/diff?detail_id=89131">diff</a>)</li></ul>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=92304
2017-06-07T08:32:17Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>12</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>Urgent</i></li></ul><p>Upgrading to urgent because it breaks some suites on master.</p>
<p><a class="external" href="http://pulpito.ceph.com/loic-2017-06-07_08:03:59-ceph-disk-loic-master-distro-basic-vps/">http://pulpito.ceph.com/loic-2017-06-07_08:03:59-ceph-disk-loic-master-distro-basic-vps/</a></p>
<pre>
2017-06-06T21:30:15.046 INFO:tasks.ceph.mgr.x.ovh084.stderr: ceph version 12.0.2-2357-gdd1f471 (dd1f47126deb239f70a10183867ef6a20d611d69) luminous (dev)
2017-06-06T21:30:15.046 INFO:tasks.ceph.mgr.x.ovh084.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x557c34084262]
2017-06-06T21:30:15.046 INFO:tasks.ceph.mgr.x.ovh084.stderr: 2: (PGMap::apply_incremental(CephContext*, PGMap::Incremental const&)+0x15b7) [0x557c33ede957]
2017-06-06T21:30:15.047 INFO:tasks.ceph.mgr.x.ovh084.stderr: 3: (ClusterState::update_delta_stats()+0x1f7) [0x557c33f1fec7]
2017-06-06T21:30:15.047 INFO:tasks.ceph.mgr.x.ovh084.stderr: 4: (DaemonServer::send_report()+0xdf) [0x557c33f106cf]
2017-06-06T21:30:15.047 INFO:tasks.ceph.mgr.x.ovh084.stderr: 5: (Mgr::tick()+0xfd) [0x557c33f4896d]
2017-06-06T21:30:15.047 INFO:tasks.ceph.mgr.x.ovh084.stderr: 6: (MgrStandby::tick()+0x121) [0x557c33f3f911]
2017-06-06T21:30:15.047 INFO:tasks.ceph.mgr.x.ovh084.stderr: 7: (FunctionContext::finish(int)+0x2a) [0x557c33f1b66a]
2017-06-06T21:30:15.047 INFO:tasks.ceph.mgr.x.ovh084.stderr: 8: (Context::complete(int)+0x9) [0x557c33f18159]
2017-06-06T21:30:15.047 INFO:tasks.ceph.mgr.x.ovh084.stderr: 9: (SafeTimer::timer_thread()+0x452) [0x557c34080fa2]
2017-06-06T21:30:15.047 INFO:tasks.ceph.mgr.x.ovh084.stderr: 10: (SafeTimerThread::entry()+0xd) [0x557c340823ad]
2017-06-06T21:30:15.047 INFO:tasks.ceph.mgr.x.ovh084.stderr: 11: (()+0x76ba) [0x7fc73c7356ba]
2017-06-06T21:30:15.047 INFO:tasks.ceph.mgr.x.ovh084.stderr: 12: (clone()+0x6d) [0x7fc73b7a182d]
</pre>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=92400
2017-06-07T21:33:34Z
Sage Weil
sage@newdream.net
<ul><li><strong>Priority</strong> changed from <i>Urgent</i> to <i>Immediate</i></li></ul>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=92409
2017-06-08T02:51:33Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>Sage Weil</i></li></ul><p>I think there are two crashes.. the second one,<br /><pre>
src/mon/PGMap.cc: 1085: FAILED assert(osd_stat.size() == osd_epochs.size())
</pre><br />is I think what Loic saw (it's what I'm getting from the ceph-disk suite). It should be fixed by<br /><pre>
diff --git a/src/mon/PGMap.cc b/src/mon/PGMap.cc
index a623e20..b39b6a3 100644
--- a/src/mon/PGMap.cc
+++ b/src/mon/PGMap.cc
@@ -1130,6 +1130,7 @@ void PGMap::apply_incremental(CephContext *cct, const Incremental& inc)
if (t != osd_stat.end()) {
stat_osd_sub(t->first, t->second);
osd_stat.erase(t);
+ osd_epochs.erase(*p);
}
// remove these old osds from full/nearfull set(s), too
</pre><br />testing, and still working on the first one.</p>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=92412
2017-06-08T07:47:37Z
Loïc Dachary
loic@dachary.org
<ul></ul><p>I'm testing the proposed patch as well.</p>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=92424
2017-06-08T13:21:16Z
Sage Weil
sage@newdream.net
<ul></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/15573">https://github.com/ceph/ceph/pull/15573</a> for the second instance that loic saw; haven't figured out the first instance yet.</p>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=92500
2017-06-09T16:00:35Z
Yuri Weinstein
yweinste@redhat.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Fix Under Review</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/15573">https://github.com/ceph/ceph/pull/15573</a> merged</p>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=92623
2017-06-12T18:21:21Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Resolved</i></li></ul>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=92816
2017-06-14T06:27:41Z
Greg Farnum
gfarnum@redhat.com
<ul><li><strong>Category</strong> changed from <i>106</i> to <i>110</i></li></ul>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=93001
2017-06-16T13:38:29Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>Resolved</i> to <i>12</i></li></ul><p>first instance happened again:<br /><pre>
2017-06-16T02:34:24.509 INFO:tasks.ceph.mgr.x.smithi198.stderr:/build/ceph-12.0.3-1748-g62eaa13/src/mon/PGMap.cc: In function 'void PGMap::stat_pg_sub(const pg_t&, const pg_stat_t&, bool)' thread 7f8cd328b700 time 2017-06-16 02:34:24.509213
2017-06-16T02:34:24.509 INFO:tasks.ceph.mgr.x.smithi198.stderr:/build/ceph-12.0.3-1748-g62eaa13/src/mon/PGMap.cc: 1370: FAILED assert(end >= 0)
2017-06-16T02:34:24.525 INFO:tasks.ceph.mgr.x.smithi198.stderr: ceph version 12.0.3-1748-g62eaa13 (62eaa139b3add8abb500dc17b522ce9bbfb9fac3) luminous (dev)
2017-06-16T02:34:24.525 INFO:tasks.ceph.mgr.x.smithi198.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5601ff0cb0d2]
2017-06-16T02:34:24.525 INFO:tasks.ceph.mgr.x.smithi198.stderr: 2: (()+0x2446d9) [0x5601fef246d9]
2017-06-16T02:34:24.525 INFO:tasks.ceph.mgr.x.smithi198.stderr: 3: (PGMap::apply_incremental(CephContext*, PGMap::Incremental const&)+0x2b0) [0x5601fef24990]
2017-06-16T02:34:24.525 INFO:tasks.ceph.mgr.x.smithi198.stderr: 4: (ClusterState::notify_osdmap(OSDMap const&)+0x675) [0x5601fef6b345]
2017-06-16T02:34:24.525 INFO:tasks.ceph.mgr.x.smithi198.stderr: 5: (()+0x2b19a2) [0x5601fef919a2]
2017-06-16T02:34:24.526 INFO:tasks.ceph.mgr.x.smithi198.stderr: 6: (Mgr::handle_osd_map()+0xb3) [0x5601fef91f43]
2017-06-16T02:34:24.526 INFO:tasks.ceph.mgr.x.smithi198.stderr: 7: (Mgr::ms_dispatch(Message*)+0x164) [0x5601fef93144]
2017-06-16T02:34:24.526 INFO:tasks.ceph.mgr.x.smithi198.stderr: 8: (MgrStandby::ms_dispatch(Message*)+0xb4) [0x5601fef89304]
2017-06-16T02:34:24.526 INFO:tasks.ceph.mgr.x.smithi198.stderr: 9: (DispatchQueue::entry()+0xf4a) [0x5601ff32acfa]
2017-06-16T02:34:24.526 INFO:tasks.ceph.mgr.x.smithi198.stderr: 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x5601ff17857d]
2017-06-16T02:34:24.526 INFO:tasks.ceph.mgr.x.smithi198.stderr: 11: (()+0x76ba) [0x7f8cd9f416ba]
2017-06-16T02:34:24.526 INFO:tasks.ceph.mgr.x.smithi198.stderr: 12: (clone()+0x6d) [0x7f8cd8fad82d]
2017-06-16T02:34:24.526 INFO:tasks.ceph.mgr.x.smithi198.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2017-06-16T02:34:24.527 INFO:tasks.ceph.mgr.x.smithi198.stderr:2017-06-16 02:34:24.526256 7f8cd328b700 -1 /build/ceph-12.0.3-1748-g62eaa13/src/mon/PGMap.cc: In function 'void PGMap::stat_pg_sub(const pg_t&, const pg_stat_t&, bool)' thread 7f8cd328b700 time 2017-06-16 02:34:24.509213
</pre><br />/a/sage-2017-06-16_00:46:50-rados-wip-sage-testing-distro-basic-smithi/1292390</p>
<p>unfortunately the core file seems to be bad. :(</p>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=93017
2017-06-16T19:40:48Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>Need More Info</i></li></ul>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=93018
2017-06-16T19:40:56Z
Sage Weil
sage@newdream.net
<ul><li><strong>Priority</strong> changed from <i>Immediate</i> to <i>Urgent</i></li></ul>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=93214
2017-06-19T15:54:26Z
Greg Farnum
gfarnum@redhat.com
<ul><li><strong>Project</strong> changed from <i>Ceph</i> to <i>mgr</i></li><li><strong>Category</strong> deleted (<del><i>110</i></del>)</li></ul>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=93538
2017-06-21T15:23:54Z
Sage Weil
sage@newdream.net
<ul></ul><p>/a/sage-2017-06-21_02:01:04-rados-wip-sage-testing2-distro-basic-smithi/1308406</p>
<p>core file again doesn't work. wth..<br /><pre>
Missing separate debuginfo for the main executable file
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/3e/0788b32f603e75fb8594bb211d83af6d28e3e5
Core was generated by `ceph-mgr -f --cluster ceph -i y'.
Program terminated with signal 6, Aborted.
#0 0x00007f57da86a23b in ?? ()
"/home/ubuntu/1498021974.23889.core" is a core file.
Please specify an executable to debug.
(gdb) q
[ubuntu@smithi148 ~]$ ls /usr/lib/debug/.build-id/3e/0788b32f603e75fb8594bb211d83af6d28e3e5
ls: cannot access /usr/lib/debug/.build-id/3e/0788b32f603e75fb8594bb211d83af6d28e3e5: No such file or directory
[ubuntu@smithi148 ~]$ ls /usr/lib/debug/.build-id/3e/
a14aebdd22407fb52918a54f3b4a8b7a70ca28 a14aebdd22407fb52918a54f3b4a8b7a70ca28.debug
[ubuntu@smithi148 ~]$ rpm -qa | grep debuginfo
ceph-debuginfo-12.0.3-1957.gcbf8433.el7.x86_64
</pre></p>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=93559
2017-06-21T20:59:45Z
Sage Weil
sage@newdream.net
<ul></ul><p>/a/sage-2017-06-21_17:39:30-rados-wip-bluestore-test-fsck-distro-basic-smithi/1312108</p>
<p>core finally works!</p>
<p>a and b have inc and pg_map dumps</p>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=93914
2017-06-28T15:24:16Z
Greg Farnum
gfarnum@redhat.com
<ul></ul><p>Waiting to reproduce with newly-set higher debug log levels.</p>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=94359
2017-07-05T16:13:45Z
Sage Weil
sage@newdream.net
<ul></ul><p>revert teuthology.git 31156d94c621e868c1fbb546015fced0bb56a873 when this is found (it raises debug mgr = 30)</p>
<p>Another instance, but without logs:</p>
<p>/a/sage-2017-07-03_15:41:59-rados-wip-sage-testing-distro-basic-smithi/1356383</p>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=94693
2017-07-07T15:33:04Z
Sage Weil
sage@newdream.net
<ul></ul><p>Another case, without logs. Another hint, though..<br /><pre>
-198> 2017-07-07 02:18:26.111735 7f313c656700 10 mgr notify_osdmap v135
-197> 2017-07-07 02:18:26.111739 7f313c656700 10 check_osd_map pool 90 gone, removing pgs
-196> 2017-07-07 02:18:26.111754 7f313c656700 10 check_osd_map pool 83 gone, removing pgs
-195> 2017-07-07 02:18:26.111762 7f313c656700 10 check_osd_map pool 15 pg_num 11 != my pg_num 8
-194> 2017-07-07 02:18:26.111782 7f313c656700 20 check_osd_map adding 15.8
-193> 2017-07-07 02:18:26.111785 7f313c656700 20 check_osd_map adding 15.9
-192> 2017-07-07 02:18:26.111787 7f313c656700 20 check_osd_map adding 15.a
-191> 2017-07-07 02:18:26.111902 7f313c656700 20 deleted pool 83
-190> 2017-07-07 02:18:26.111924 7f313c656700 20 deleted pool 90
...
-147> 2017-07-07 02:18:26.403722 7f3138b4c700 10 mgr update_delta_stats v136
...
-11> 2017-07-07 02:18:27.204029 7f313c656700 10 mgr notify_osdmap v137
-10> 2017-07-07 02:18:27.204035 7f313c656700 10 check_osd_map pool 15 pg_num 11 != my pg_num 8
-9> 2017-07-07 02:18:27.204038 7f313c656700 10 check_osd_map pool 91 pg_num 8 != my pg_num 0
-8> 2017-07-07 02:18:27.204040 7f313c656700 20 check_osd_map adding 91.0
-7> 2017-07-07 02:18:27.204044 7f313c656700 20 check_osd_map adding 91.1
-6> 2017-07-07 02:18:27.204045 7f313c656700 20 check_osd_map adding 91.2
-5> 2017-07-07 02:18:27.204047 7f313c656700 20 check_osd_map adding 91.3
-4> 2017-07-07 02:18:27.204048 7f313c656700 20 check_osd_map adding 91.4
-3> 2017-07-07 02:18:27.204050 7f313c656700 20 check_osd_map adding 91.5
-2> 2017-07-07 02:18:27.204051 7f313c656700 20 check_osd_map adding 91.6
-1> 2017-07-07 02:18:27.204052 7f313c656700 20 check_osd_map adding 91.7
</pre><br />not that the pool 15 count didn't match twice! also, the second time they weren't printed, which means they were already in pending_inc. probably an update from the osd?</p>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=94873
2017-07-11T15:36:20Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>Need More Info</i> to <i>Fix Under Review</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/16262">https://github.com/ceph/ceph/pull/16262</a></p>
mgr - Bug #20208: ceph-mgr: mon/PGMap.cc: 1367: FAILED assert(end >= 0)
https://tracker.ceph.com/issues/20208?journal_id=94970
2017-07-12T15:11:33Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Resolved</i></li></ul>