Bug #19204
closed
MDS assert failed when shutting down
Description
We encountered a failed assertion while shutting down an MDS. Here is a snippet of the log:
-14> 2017-01-22 14:13:46.833804 7fd210c58700 2 -- 192.168.36.11:6801/2188363 >> 192.168.36.48:6800/42546 pipe(0x558ff3803400 sd=17 :52412 s=4 pgs=227 cs=1 l=1 c=0x558ff3758900).fault (0) Success
-13> 2017-01-22 14:13:46.833802 7fd2092e6700 2 -- 192.168.36.11:6801/2188363 >> 192.168.36.12:6813/4037017 pipe(0x558ff3802000 sd=72 :32894 s=4 pgs=24 cs=1 l=1 c=0x558ffc199200).reader couldn't read tag, (0) Success
-12> 2017-01-22 14:13:46.833831 7fd2092e6700 2 -- 192.168.36.11:6801/2188363 >> 192.168.36.12:6813/4037017 pipe(0x558ff3802000 sd=72 :32894 s=4 pgs=24 cs=1 l=1 c=0x558ffc199200).fault (0) Success
-11> 2017-01-22 14:13:46.833884 7fd213861700 5 asok(0x558ff373a000) unregister_command objecter_requests
-10> 2017-01-22 14:13:46.833896 7fd213861700 10 monclient: shutdown
-9> 2017-01-22 14:13:46.833901 7fd213861700 1 -- 192.168.36.11:6801/2188363 mark_down 0x558ffc198600 -- 0x558ffa52e000
-8> 2017-01-22 14:13:46.833922 7fd2080d4700 2 -- 192.168.36.11:6801/2188363 >> 192.168.36.12:6819/4037213 pipe(0x558ff500e800 sd=82 :32834 s=4 pgs=25 cs=1 l=1 c=0x558ffc19bc00).reader couldn't read tag, (0) Success
-7> 2017-01-22 14:13:46.833943 7fd2080d4700 2 -- 192.168.36.11:6801/2188363 >> 192.168.36.12:6819/4037213 pipe(0x558ff500e800 sd=82 :32834 s=4 pgs=25 cs=1 l=1 c=0x558ffc19bc00).fault (0) Success
-6> 2017-01-22 14:13:46.833937 7fd214964700 2 -- 192.168.36.11:6801/2188363 >> 192.168.36.11:6789/0 pipe(0x558ffa52e000 sd=8 :52298 s=4 pgs=31815 cs=1 l=1 c=0x558ffc198600).reader couldn't read tag, (0) Success
-5> 2017-01-22 14:13:46.833954 7fd214964700 2 -- 192.168.36.11:6801/2188363 >> 192.168.36.11:6789/0 pipe(0x558ffa52e000 sd=8 :52298 s=4 pgs=31815 cs=1 l=1 c=0x558ffc198600).fault (0) Success
-4> 2017-01-22 14:13:46.833959 7fd210b57700 2 -- 192.168.36.11:6801/2188363 >> 192.168.36.11:6800/678824 pipe(0x558ff3804800 sd=18 :45286 s=4 pgs=198 cs=1 l=1 c=0x558ff3758c00).reader couldn't read tag, (0) Success
-3> 2017-01-22 14:13:46.833972 7fd210b57700 2 -- 192.168.36.11:6801/2188363 >> 192.168.36.11:6800/678824 pipe(0x558ff3804800 sd=18 :45286 s=4 pgs=198 cs=1 l=1 c=0x558ff3758c00).fault (0) Success
-2> 2017-01-22 14:13:46.834029 7fd20e437700 2 -- 192.168.36.11:6801/2188363 >> 192.168.36.48:6804/42771 pipe(0x558ff5034000 sd=33 :35778 s=4 pgs=300 cs=1 l=1 c=0x558ff375ba80).reader couldn't read tag, (0) Success
-1> 2017-01-22 14:13:46.834062 7fd20e437700 2 -- 192.168.36.11:6801/2188363 >> 192.168.36.48:6804/42771 pipe(0x558ff5034000 sd=33 :35778 s=4 pgs=300 cs=1 l=1 c=0x558ff375ba80).fault (0) Success
0> 2017-01-22 14:13:46.836775 7fd21285f700 -1 osdc/Objecter.cc: In function 'void Objecter::_op_submit_with_budget(Objecter::Op*, Objecter::shunique_lock&, ceph_tid_t*, int*)' thread 7fd21285f700 time 2017-01-22 14:13:46.834106
osdc/Objecter.cc: 2145: FAILED assert(initialized.read())
ceph version 10.2.5 (53ded15a3fab78780028baa5681f578254e2b9df)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x88) [0x558fe7a1ca18]
2: (Objecter::_op_submit_with_budget(Objecter::Op*, ceph::shunique_lock<boost::shared_mutex>&, unsigned long*, int*)+0x3ad) [0x558fe78b068d]
3: (Objecter::op_submit(Objecter::Op*, unsigned long*, int*)+0x6e) [0x558fe78b07ae]
4: (Filer::_probe(Filer::Probe*, std::unique_lock<std::mutex>&)+0xbea) [0x558fe788524a]
5: (Filer::_probed(Filer::Probe*, object_t const&, unsigned long, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::unique_lock<std::mutex>&)+0x9bb) [0x558fe788671b]
6: (Filer::C_Probe::finish(int)+0x6c) [0x558fe7888dac]
7: (Context::complete(int)+0x9) [0x558fe7606be9]
8: (Finisher::finisher_thread_entry()+0x4c5) [0x558fe793e305]
9: (()+0x8182) [0x7fd21d371182]
10: (clone()+0x6d) [0x7fd21b8ba47d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
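This seems to be caused by an improper shutdown order of the MDS subsystems. The backtrace shows the Finisher thread completing a Filer probe callback and submitting a new op through the Objecter after the Objecter had already been shut down, which trips assert(initialized.read()) in Objecter::_op_submit_with_budget.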
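
To make the suspected ordering problem concrete, here is a minimal sketch. It is not Ceph code: ToyObjecter, ToyFinisher and run_shutdown are hypothetical stand-ins, and the gate only simulates callbacks that are still pending when shutdown begins. Where the real Objecter asserts, the sketch counts rejected submissions so both orders can be compared.

```cpp
#include <atomic>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the Objecter: refuses ops once shut down.
class ToyObjecter {
  std::atomic<bool> initialized{true};
public:
  // The real code asserts here; the sketch reports failure instead.
  bool op_submit() { return initialized.load(); }
  void shutdown() { initialized.store(false); }
};

// Hypothetical stand-in for the Finisher: one thread running queued callbacks.
class ToyFinisher {
  std::mutex lock;
  std::condition_variable cond;
  std::deque<std::function<void()>> pending;
  bool stopping = false;
  std::thread worker;  // declared last so the other members exist first

  void run() {
    std::unique_lock<std::mutex> l(lock);
    for (;;) {
      cond.wait(l, [this] { return stopping || !pending.empty(); });
      if (pending.empty()) return;  // stopping and fully drained
      auto fn = std::move(pending.front());
      pending.pop_front();
      l.unlock();
      fn();  // run the callback without holding the lock
      l.lock();
    }
  }
public:
  ToyFinisher() : worker([this] { run(); }) {}
  ~ToyFinisher() { stop(); }
  void enqueue(std::function<void()> fn) {
    { std::lock_guard<std::mutex> l(lock); pending.push_back(std::move(fn)); }
    cond.notify_one();
  }
  // Runs every remaining callback, then joins the worker thread.
  void stop() {
    { std::lock_guard<std::mutex> l(lock); stopping = true; }
    cond.notify_one();
    if (worker.joinable()) worker.join();
  }
};

// Returns how many callbacks reached the objecter after it was shut down.
int run_shutdown(bool finisher_first) {
  ToyObjecter objecter;
  ToyFinisher finisher;
  std::atomic<int> rejected{0};
  std::promise<void> gate;  // holds callbacks back until shutdown has begun
  std::shared_future<void> opened = gate.get_future().share();

  for (int i = 0; i < 3; ++i) {
    finisher.enqueue([&objecter, &rejected, opened] {
      opened.wait();
      if (!objecter.op_submit()) ++rejected;
    });
  }

  if (finisher_first) {
    gate.set_value();
    finisher.stop();      // drain callbacks while the objecter is still up
    objecter.shutdown();
  } else {
    objecter.shutdown();  // order suspected in this report
    gate.set_value();
    finisher.stop();      // callbacks now hit a dead objecter
  }
  return rejected.load();
}
```

In this sketch, run_shutdown(true) rejects nothing because the finisher is drained before the objecter goes away, while run_shutdown(false) rejects all three callbacks, which is the situation where the real code would abort on the assertion.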