https://tracker.ceph.com/
2017-05-17T00:31:34Z
Ceph
Ceph - Bug #19900: osd crash during shutdown
https://tracker.ceph.com/issues/19900?journal_id=91110
2017-05-17T00:31:34Z
Josh Durgin
<ul><li><strong>Subject</strong> changed from <i>osd leak of ExpiringCrpytoKey etc</i> to <i>osd crash during shutdown</i></li></ul><p>This was actually a crash, from teuthology.log:<br /><pre>
2017-05-10T04:50:27.383 DEBUG:tasks.ceph.osd.4:waiting for process to exit
2017-05-10T04:50:27.383 INFO:teuthology.orchestra.run:waiting for 300
2017-05-10T04:50:27.409 INFO:tasks.ceph.osd.4.smithi130.stderr:2017-05-10 04:50:27.387550 323e7700 -1 received signal: Terminated from PID: 31142 task name: /usr/bin/python /bin/daemon-helper term valgrind --trace-children=no --child-silent-after-fork=yes --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.4.log --time-stamp=yes --tool=memcheck ceph-osd -f --cluster ceph -i 4 UID: 0
2017-05-10T04:50:27.409 INFO:tasks.ceph.osd.4.smithi130.stderr:2017-05-10 04:50:27.389154 323e7700 -1 osd.4 231 *** Got signal Terminated ***
2017-05-10T04:50:27.571 INFO:tasks.ceph.osd.4.smithi130.stderr:2017-05-10 04:50:27.448222 323e7700 -1 osd.4 231 shutdown
2017-05-10T04:50:30.668 INFO:tasks.ceph.osd.4.smithi130.stderr:/mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.1-2198-g8760432/rpm/el7/BUILD/ceph-12.0.1-2198-g8760432/src/mon/MonClient.cc: In function 'int MonClient::wait_auth_rotating(double)' thread 158af700 time 2017-05-10 04:50:30.668855
2017-05-10T04:50:30.733 INFO:tasks.ceph.osd.4.smithi130.stderr:/mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.1-2198-g8760432/rpm/el7/BUILD/ceph-12.0.1-2198-g8760432/src/mon/MonClient.cc: 901: FAILED assert(auth != nullptr)
2017-05-10T04:50:30.759 INFO:tasks.ceph.osd.4.smithi130.stderr: ceph version 12.0.1-2198-g8760432 (87604323e6d350d533c826fdb91f2052a11d3de9)
2017-05-10T04:50:30.759 INFO:tasks.ceph.osd.4.smithi130.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0xae6630]
2017-05-10T04:50:30.759 INFO:tasks.ceph.osd.4.smithi130.stderr: 2: (MonClient::wait_auth_rotating(double)+0x63c) [0xaf746c]
2017-05-10T04:50:30.759 INFO:tasks.ceph.osd.4.smithi130.stderr: 3: (OSD::ms_get_authorizer(int, AuthAuthorizer**, bool)+0xbc) [0x5771ec]
2017-05-10T04:50:30.759 INFO:tasks.ceph.osd.4.smithi130.stderr: 4: (SimpleMessenger::get_authorizer(int, bool)+0x5b) [0xb5e54b]
2017-05-10T04:50:30.759 INFO:tasks.ceph.osd.4.smithi130.stderr: 5: (Pipe::connect()+0x26f9) [0xcd92e9]
2017-05-10T04:50:30.760 INFO:tasks.ceph.osd.4.smithi130.stderr: 6: (Pipe::writer()+0x531) [0xcdbee1]
2017-05-10T04:50:30.760 INFO:tasks.ceph.osd.4.smithi130.stderr: 7: (Pipe::Writer::entry()+0xd) [0xcde9ad]
2017-05-10T04:50:30.760 INFO:tasks.ceph.osd.4.smithi130.stderr: 8: (()+0x7dc5) [0xc729dc5]
2017-05-10T04:50:30.760 INFO:tasks.ceph.osd.4.smithi130.stderr: 9: (clone()+0x6d) [0xd87573d]
2017-05-10T04:50:30.760 INFO:tasks.ceph.osd.4.smithi130.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
</pre></p>
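<p>The backtrace boils down to an ordering problem: the OSD's shutdown path tears down its MonClient auth state, but a messenger thread can still request an authorizer afterwards and hit <code>assert(auth != nullptr)</code> in <code>wait_auth_rotating()</code>. A minimal sketch of that pattern (hypothetical types, not the real MonClient):</p>

```cpp
#include <cassert>
#include <memory>

// Simplified sketch of the shutdown ordering in the backtrace above:
// shutdown() releases the auth state, but a Pipe::writer() thread can
// still ask for an authorizer afterwards, reaching a code path that
// asserts the auth state is still present.
struct AuthState {};

struct MiniMonClient {
  std::unique_ptr<AuthState> auth = std::make_unique<AuthState>();

  void shutdown() { auth.reset(); }  // "monclient: shutdown"

  // Returns true when the real code's assert(auth != nullptr) would fire.
  bool wait_auth_rotating_would_assert() const { return auth == nullptr; }
};
```

<p>Before <code>shutdown()</code> the check passes; after it, any late caller trips the assert, which matches the crash timing in the log.</p>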
Ceph - Bug #19900: osd crash during shutdown
https://tracker.ceph.com/issues/19900?journal_id=91111
2017-05-17T00:35:47Z
Brad Hubbard
bhubbard@redhat.com
<ul><li><strong>Assignee</strong> set to <i>Brad Hubbard</i></li></ul>
Ceph - Bug #19900: osd crash during shutdown
https://tracker.ceph.com/issues/19900?journal_id=91119
2017-05-17T06:15:34Z
Kefu Chai
tchaikov@gmail.com
<ul></ul><pre>
-1927> 2017-05-10 04:50:27.436359 1ebc0700 20 osd.4 231 OSD::ms_dispatch: MOSDMarkMeDown(request_ack=0, target_osd=osd.4 172.21.15.130:6801/31147, fsid=db4a4e71-b01d-4cf1-bba3-b9b29f51cd70) v2
-1926> 2017-05-10 04:50:27.438216 1ebc0700 0 osd.4 231 got_stop_ack starting shutdown
-1925> 2017-05-10 04:50:27.443950 323e7700 0 osd.4 231 prepare_to_stop starting shutdown
-1924> 2017-05-10 04:50:27.448222 323e7700 -1 osd.4 231 shutdown
..
-1440> 2017-05-10 04:50:27.835363 1ebc0700 1 -- 172.21.15.130:6801/31147 <== mon.2 172.21.15.19:6790/0 4 ==== mgrmap(e 4) v1 ==== 90+0+0 (1486229013 0 0) 0xf685aa0 con 0x3a461540
<< got a mgrmap via con 0x3a461540
-1439> 2017-05-10 04:50:27.835574 1ebc0700 4 mgrc handle_mgr_map Got map version 4
-1438> 2017-05-10 04:50:27.835621 1ebc0700 4 mgrc handle_mgr_map Active mgr is now 172.21.15.130:6800/12188
-1437> 2017-05-10 04:50:27.836396 1ebc0700 4 mgrc reconnect Starting new session with 172.21.15.130:6800/12188
..
-1411> 2017-05-10 04:50:27.843961 970d700 10 -- 172.21.15.130:6801/31147 >> 172.21.15.130:6800/12188 pipe(0x1846d570 sd=34 :58180 s=2 pgs=266 cs=1 l=1 c=0x1846e890).reader got message 2 0xf68b3a0 mgrconfigure() v1
..
-1403> 2017-05-10 04:50:27.845439 1ebc0700 4 mgrc handle_mgr_configure stats_period=5
..
-112> 2017-05-10 04:50:30.199612 323e7700 1 bluefs umount
..
-95> 2017-05-10 04:50:30.494950 323e7700 10 monclient: shutdown
-94> 2017-05-10 04:50:30.496994 323e7700 1 -- 172.21.15.130:6801/31147 mark_down 0x3a461540 -- 0x192e7fc0
<< we mark con 0x3a461540 now.
..
-54> 2017-05-10 04:50:30.625266 1ebc0700 4 mgrc ms_handle_reset ms_handle_reset con 0x1846e890
-53> 2017-05-10 04:50:30.627750 1ebc0700 4 mgrc reconnect Terminating session with 172.21.15.130:6800/12188
-52> 2017-05-10 04:50:30.629387 1ebc0700 1 -- 172.21.15.130:6801/31147 mark_down 0x1846e890 -- pipe dne
-51> 2017-05-10 04:50:30.631857 1ebc0700 4 mgrc reconnect Starting new session with 172.21.15.130:6800/12188
..
-15> 2017-05-10 04:50:30.665362 158af700 0 -- 172.21.15.130:6801/31147 >> 172.21.15.130:6800/12188 pipe(0x3a056d30 sd=21 :60649 s=1 pgs=0 cs=0 l=1 c=0xe2375d0).connect got BADAUTHORIZER
-14> 2017-05-10 04:50:30.667752 158af700 10 osd.4 0 OSD::ms_get_authorizer type=mgr
-13> 2017-05-10 04:50:30.752065 158af700 -1 /mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.1-2198-g8760432/rpm/el7/BUILD/ceph-12.0.1-2198-g8760432/src/mon/MonClient.cc: In function 'int MonClient::wait_auth_rotating(double)' thread 158af700 time 2017-05-10 04:50:30.668855
</pre>
<p>So even with <a class="external" href="https://github.com/ceph/ceph/commit/611572a6103e20d3e3abf3530ea33e539b9db371">https://github.com/ceph/ceph/commit/611572a6103e20d3e3abf3530ea33e539b9db371</a>, MgrClient can still try to (re)connect to the mgr after <code>MgrClient::shutdown()</code>, when it handles a mgrmap after its mgr session has been reset.</p>
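<p>The window described above can be sketched as a guard on the reset handler (field and method names are illustrative, not the real MgrClient): without a stopping check, a reset that arrives after <code>shutdown()</code> still starts a new session, and that reconnect then needs an authorizer from the already-shut-down MonClient.</p>

```cpp
#include <cassert>

// Hedged sketch of the reconnect-after-shutdown race: ms_handle_reset()
// must not start a new mgr session once shutdown() has run.
struct MiniMgrClient {
  bool stopping = false;
  int sessions_started = 0;

  void shutdown() { stopping = true; }
  void reconnect() { ++sessions_started; }

  // Guarded version: a reset arriving during shutdown is ignored.
  void ms_handle_reset() {
    if (stopping)
      return;
    reconnect();
  }
};
```

<p>With the guard, resets before shutdown still reconnect, while resets after shutdown are dropped instead of racing the MonClient teardown.</p>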
Ceph - Bug #19900: osd crash during shutdown
https://tracker.ceph.com/issues/19900?journal_id=91287
2017-05-22T02:30:40Z
Brad Hubbard
bhubbard@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/15192">https://github.com/ceph/ceph/pull/15192</a></p>
Ceph - Bug #19900: osd crash during shutdown
https://tracker.ceph.com/issues/19900?journal_id=91769
2017-05-28T03:04:25Z
Sage Weil
sage@newdream.net
<ul></ul><p>/a/sage-2017-05-27_03:43:09-rados-wip-sage-testing2---basic-smithi/1235367</p>
<p>The branch contained eb5c02d from <a class="external" href="https://github.com/ceph/ceph/pull/15192">https://github.com/ceph/ceph/pull/15192</a>, so this is not fixed yet!</p>
Ceph - Bug #19900: osd crash during shutdown
https://tracker.ceph.com/issues/19900?journal_id=91776
2017-05-29T00:54:01Z
Brad Hubbard
bhubbard@redhat.com
<ul></ul><p>OMG.</p>
<p>How did we all (especially me) not realise there was a missing return statement in <a class="external" href="https://github.com/ceph/ceph/pull/15192/commits/eb5c02df634b34ca45ecbf1eaf4440ccf806845f">https://github.com/ceph/ceph/pull/15192/commits/eb5c02df634b34ca45ecbf1eaf4440ccf806845f</a> ?</p>
<p><a class="external" href="https://github.com/ceph/ceph/pull/15345">https://github.com/ceph/ceph/pull/15345</a></p>
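<p>An illustrative reconstruction of the failure mode (not the actual patch): when the guard tests the condition but lacks a <code>return</code>, control falls through and the reconnect happens anyway, so the fix is a single early return.</p>

```cpp
#include <cassert>

// Buggy shape: the stopping condition is checked, perhaps logged,
// but nothing stops the fall-through, so the session count still grows.
inline int handle_reset_buggy(bool stopping, int sessions) {
  if (stopping) {
    // log "ignoring reset during shutdown" ... but no return here
  }
  return sessions + 1;  // still reached when stopping == true
}

// Fixed shape: bail out before reconnecting.
inline int handle_reset_fixed(bool stopping, int sessions) {
  if (stopping)
    return sessions;
  return sessions + 1;
}
```

<p>The buggy variant reconnects even while stopping; the fixed variant only reconnects when the client is live.</p>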
Ceph - Bug #19900: osd crash during shutdown
https://tracker.ceph.com/issues/19900?journal_id=91779
2017-05-29T08:26:30Z
Brad Hubbard
bhubbard@redhat.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul>