Bug #52820
closed
Ceph monitor crash after upgrade from ceph 15.2.14 to 16.2.6
Description
I tried to upgrade my Ceph cluster from 15.2.14 to 16.2.6 on my Proxmox 7.0 servers.
After updating the packages, I restarted the first monitor, which worked, but the second and third monitors crash on startup. The cluster is somehow still online and working, although only one of the three monitors is online.
Oct 05 18:42:43 virthost2 systemd[1]: Started Ceph cluster monitor daemon.
Oct 05 18:42:57 virthost2 ceph-mon[115602]: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::malformed_input'
Oct 05 18:42:57 virthost2 ceph-mon[115602]: what(): void FSMap::decode(ceph::buffer::v15_2_0::list::const_iterator&) no longer understand old encoding version v < 7: Malformed input
Oct 05 18:42:57 virthost2 ceph-mon[115602]: * Caught signal (Aborted) *
Oct 05 18:42:57 virthost2 ceph-mon[115602]: in thread 7f14a74f3700 thread_name:ms_dispatch
Oct 05 18:42:57 virthost2 ceph-mon[115602]: ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable)
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7f14b0190140]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 2: gsignal()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 3: abort()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 4: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9a7ec) [0x7f14b00437ec]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 5: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5966) [0x7f14b004e966]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 6: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa59d1) [0x7f14b004e9d1]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 7: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5c65) [0x7f14b004ec65]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 8: /usr/lib/ceph/libceph-common.so.2(+0x28982a) [0x7f14b06e682a]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 9: (MDSMonitor::tick()+0x475) [0x55a3d8709015]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 10: (MDSMonitor::on_active()+0x28) [0x55a3d86ef068]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 11: (Context::complete(int)+0x9) [0x55a3d850fc29]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 12: (void finish_contexts<std::__cxx11::list<Context*, std::allocator<Context*> > >(ceph::common::CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0xa8) [0x55a3d853b458]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 13: (Paxos::finish_round()+0x70) [0x55a3d8623100]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 14: (Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x3d3) [0x55a3d8624ef3]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 15: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x116b) [0x55a3d850d7eb]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 16: (Monitor::_ms_dispatch(Message*)+0x41e) [0x55a3d850de2e]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 17: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x59) [0x55a3d853c9d9]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 18: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x468) [0x7f14b08d5eb8]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 19: (DispatchQueue::entry()+0x5ef) [0x7f14b08d35bf]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 20: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f14b0990cbd]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 21: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f14b0184ea7]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 22: clone()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 2021-10-05T18:42:57.045+0200 7f14a74f3700 -1 Caught signal (Aborted)
Oct 05 18:42:57 virthost2 ceph-mon[115602]: in thread 7f14a74f3700 thread_name:ms_dispatch
Oct 05 18:42:57 virthost2 ceph-mon[115602]: ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable)
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7f14b0190140]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 2: gsignal()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 3: abort()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 4: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9a7ec) [0x7f14b00437ec]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 5: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5966) [0x7f14b004e966]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 6: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa59d1) [0x7f14b004e9d1]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 7: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5c65) [0x7f14b004ec65]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 8: /usr/lib/ceph/libceph-common.so.2(+0x28982a) [0x7f14b06e682a]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 9: (MDSMonitor::tick()+0x475) [0x55a3d8709015]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 10: (MDSMonitor::on_active()+0x28) [0x55a3d86ef068]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 11: (Context::complete(int)+0x9) [0x55a3d850fc29]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 12: (void finish_contexts<std::__cxx11::list<Context*, std::allocator<Context*> > >(ceph::common::CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0xa8) [0x55a3d853b458]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 13: (Paxos::finish_round()+0x70) [0x55a3d8623100]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 14: (Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x3d3) [0x55a3d8624ef3]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 15: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x116b) [0x55a3d850d7eb]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 16: (Monitor::_ms_dispatch(Message*)+0x41e) [0x55a3d850de2e]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 17: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x59) [0x55a3d853c9d9]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 18: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x468) [0x7f14b08d5eb8]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 19: (DispatchQueue::entry()+0x5ef) [0x7f14b08d35bf]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 20: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f14b0990cbd]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 21: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f14b0184ea7]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 22: clone()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 0> 2021-10-05T18:42:57.045+0200 7f14a74f3700 -1 Caught signal (Aborted)
Oct 05 18:42:57 virthost2 ceph-mon[115602]: in thread 7f14a74f3700 thread_name:ms_dispatch
Oct 05 18:42:57 virthost2 ceph-mon[115602]: ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable)
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7f14b0190140]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 2: gsignal()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 3: abort()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 4: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9a7ec) [0x7f14b00437ec]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 5: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5966) [0x7f14b004e966]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 6: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa59d1) [0x7f14b004e9d1]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 7: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5c65) [0x7f14b004ec65]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 8: /usr/lib/ceph/libceph-common.so.2(+0x28982a) [0x7f14b06e682a]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 9: (MDSMonitor::tick()+0x475) [0x55a3d8709015]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 10: (MDSMonitor::on_active()+0x28) [0x55a3d86ef068]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 11: (Context::complete(int)+0x9) [0x55a3d850fc29]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 12: (void finish_contexts<std::__cxx11::list<Context*, std::allocator<Context*> > >(ceph::common::CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0xa8) [0x55a3d853b458]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 13: (Paxos::finish_round()+0x70) [0x55a3d8623100]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 14: (Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x3d3) [0x55a3d8624ef3]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 15: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x116b) [0x55a3d850d7eb]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 16: (Monitor::_ms_dispatch(Message*)+0x41e) [0x55a3d850de2e]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 17: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x59) [0x55a3d853c9d9]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 18: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x468) [0x7f14b08d5eb8]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 19: (DispatchQueue::entry()+0x5ef) [0x7f14b08d35bf]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 20: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f14b0990cbd]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 21: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f14b0184ea7]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 22: clone()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 0> 2021-10-05T18:42:57.045+0200 7f14a74f3700 -1 Caught signal (Aborted) *
Oct 05 18:42:57 virthost2 ceph-mon[115602]: in thread 7f14a74f3700 thread_name:ms_dispatch
Oct 05 18:42:57 virthost2 ceph-mon[115602]: ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable)
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7f14b0190140]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 2: gsignal()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 3: abort()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 4: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9a7ec) [0x7f14b00437ec]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 5: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5966) [0x7f14b004e966]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 6: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa59d1) [0x7f14b004e9d1]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 7: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5c65) [0x7f14b004ec65]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 8: /usr/lib/ceph/libceph-common.so.2(+0x28982a) [0x7f14b06e682a]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 9: (MDSMonitor::tick()+0x475) [0x55a3d8709015]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 10: (MDSMonitor::on_active()+0x28) [0x55a3d86ef068]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 11: (Context::complete(int)+0x9) [0x55a3d850fc29]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 12: (void finish_contexts<std::__cxx11::list<Context*, std::allocator<Context*> > >(ceph::common::CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0xa8) [0x55a3d853b458]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 13: (Paxos::finish_round()+0x70) [0x55a3d8623100]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 14: (Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x3d3) [0x55a3d8624ef3]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 15: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x116b) [0x55a3d850d7eb]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 16: (Monitor::_ms_dispatch(Message*)+0x41e) [0x55a3d850de2e]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 17: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x59) [0x55a3d853c9d9]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 18: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x468) [0x7f14b08d5eb8]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 19: (DispatchQueue::entry()+0x5ef) [0x7f14b08d35bf]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 20: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f14b0990cbd]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 21: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f14b0184ea7]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 22: clone()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
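The journal output above repeats the same backtrace several times, which makes the relevant frames easy to miss. As a rough aid for reading such logs, the following sketch pulls out the exception text and the symbolized frames of the first backtrace only. The function name `summarize_crash` and both regular expressions are assumptions written for this log layout; they are not part of any Ceph tooling.

```python
import re

# Symbolized frames look like "9: (MDSMonitor::tick()+0x475) [0x55a3d8709015]".
# Frames that only name a shared library (no leading parenthesis) are skipped.
FRAME_RE = re.compile(
    r"ceph-mon\[\d+\]:\s+(\d+): \((.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$"
)
# The C++ runtime prints the exception text after "what():".
WHAT_RE = re.compile(r"what\(\):\s+(.*)$")


def summarize_crash(lines):
    """Return (exception message, [(frame number, symbol), ...])."""
    message = None
    frames = []
    for line in lines:
        line = line.rstrip()
        match = WHAT_RE.search(line)
        if match and message is None:
            message = match.group(1)
            continue
        match = FRAME_RE.search(line)
        if match:
            index, symbol = int(match.group(1)), match.group(2)
            # The backtrace is logged repeatedly; stop once numbering restarts.
            if frames and index <= frames[-1][0]:
                break
            frames.append((index, symbol))
    return message, frames
```

Run against this report, the first symbolized frame is `MDSMonitor::tick()`, and the message names `FSMap::decode` as the function that rejected the input.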
Updated by Daniel Keller over 2 years ago
I was able to complete the upgrade by switching to version 16.2.5.
After that I tried to upgrade from version 16.2.5 to version 16.2.6, but as soon as I restart the second monitor, the crash recurs with the same error message.
After I switched back to version 16.2.5, it works again.
Updated by Patrick Donnelly over 2 years ago
- Status changed from New to In Progress
- Target version set to v17.0.0
- Source set to Community (user)
- Backport set to pacific
- Component(FS) MDSMonitor added
- Labels (FS) crash added
Updated by Patrick Donnelly over 2 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 43506
- ceph-qa-suite deleted (upgrade/nautilus-x)
Updated by dongdong tao over 2 years ago
Do we know why it succeeds on 16.2.5 but fails on 16.2.6?
Updated by Patrick Donnelly over 2 years ago
dongdong tao wrote:
Do we know why it succeeds on 16.2.5 but fails on 16.2.6?
The code in MDSMonitor::tick is trying to flush out old versions of the MDSMap so it tries loading old epochs of the FSMap. The v16.2.5 code did not do this so it wouldn't crash or assert.
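To make the explanation above concrete, here is a toy model of the failure mode, not Ceph's actual code. It assumes a decoder that refuses encodings older than version 7 (the threshold is taken from the "v < 7" text in the crash message) and a trimming pass that decodes every stored epoch it walks over. The names `decode_fsmap`, `tick_trim`, and `MalformedInput` are hypothetical stand-ins.

```python
class MalformedInput(Exception):
    """Stand-in for the ceph::buffer malformed_input exception in the log."""


# Assumption: threshold copied from the "old encoding version v < 7" message.
MIN_SUPPORTED_ENCODING = 7


def decode_fsmap(blob):
    """Decode a toy record {"v": encoding version, "epoch": number}."""
    if blob["v"] < MIN_SUPPORTED_ENCODING:
        raise MalformedInput("no longer understand old encoding version v < 7")
    return blob["epoch"]


def tick_trim(store, first, last):
    """Decode every stored epoch in [first, last], as a trimming pass might."""
    decoded = []
    for epoch in range(first, last + 1):
        # One epoch written with an old encoding is enough to abort the pass.
        decoded.append(decode_fsmap(store[epoch]))
    return decoded
```

In this model, a pass that touches only recently written epochs succeeds, while one that reaches back to an epoch stored with an older encoding raises. In the monitor the exception is not caught, which matches the "terminate called after throwing an instance of" line in the log.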
Updated by dongdong tao over 2 years ago
Patrick Donnelly wrote:
dongdong tao wrote:
Do we know why it succeeds on 16.2.5 but fails on 16.2.6?
The code in MDSMonitor::tick is trying to flush out old versions of the MDSMap so it tries loading old epochs of the FSMap. The v16.2.5 code did not do this so it wouldn't crash or assert.
Got it, thank you, Patrick
Updated by Patrick Donnelly over 2 years ago
- Status changed from Fix Under Review to Pending Backport
- Priority changed from Normal to Urgent
Updated by Backport Bot over 2 years ago
- Copied to Backport #52999: pacific: Ceph monitor crash after upgrade from ceph 15.2.14 to 16.2.6 added
Updated by Loïc Dachary over 2 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".