Bug #52820

closed

Ceph monitor crash after upgrade from ceph 15.2.14 to 16.2.6

Added by Daniel Keller over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDSMonitor
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I tried to upgrade my Ceph cluster from 15.2.14 to 16.2.6 on my Proxmox 7.0 servers.

After updating the packages I restarted the first monitor, which worked, but the second and third monitors crash on startup. The cluster is somehow still online and working, although only one of the three monitors is online.

Oct 05 18:42:43 virthost2 systemd[1]: Started Ceph cluster monitor daemon.
Oct 05 18:42:57 virthost2 ceph-mon[115602]: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::malformed_input'
Oct 05 18:42:57 virthost2 ceph-mon[115602]: what(): void FSMap::decode(ceph::buffer::v15_2_0::list::const_iterator&) no longer understand old encoding version v < 7: Malformed input
Oct 05 18:42:57 virthost2 ceph-mon[115602]: *** Caught signal (Aborted) **
Oct 05 18:42:57 virthost2 ceph-mon[115602]: in thread 7f14a74f3700 thread_name:ms_dispatch
Oct 05 18:42:57 virthost2 ceph-mon[115602]: ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable)
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7f14b0190140]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 2: gsignal()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 3: abort()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 4: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9a7ec) [0x7f14b00437ec]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 5: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5966) [0x7f14b004e966]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 6: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa59d1) [0x7f14b004e9d1]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 7: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5c65) [0x7f14b004ec65]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 8: /usr/lib/ceph/libceph-common.so.2(+0x28982a) [0x7f14b06e682a]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 9: (MDSMonitor::tick()+0x475) [0x55a3d8709015]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 10: (MDSMonitor::on_active()+0x28) [0x55a3d86ef068]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 11: (Context::complete(int)+0x9) [0x55a3d850fc29]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 12: (void finish_contexts<std::__cxx11::list<Context*, std::allocator<Context*> > >(ceph::common::CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0xa8) [0x55a3d853b458]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 13: (Paxos::finish_round()+0x70) [0x55a3d8623100]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 14: (Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x3d3) [0x55a3d8624ef3]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 15: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x116b) [0x55a3d850d7eb]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 16: (Monitor::_ms_dispatch(Message*)+0x41e) [0x55a3d850de2e]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 17: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x59) [0x55a3d853c9d9]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 18: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x468) [0x7f14b08d5eb8]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 19: (DispatchQueue::entry()+0x5ef) [0x7f14b08d35bf]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 20: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f14b0990cbd]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 21: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f14b0184ea7]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 22: clone()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 2021-10-05T18:42:57.045+0200 7f14a74f3700 -1 *** Caught signal (Aborted) **
Oct 05 18:42:57 virthost2 ceph-mon[115602]: in thread 7f14a74f3700 thread_name:ms_dispatch
Oct 05 18:42:57 virthost2 ceph-mon[115602]: ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable)
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7f14b0190140]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 2: gsignal()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 3: abort()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 4: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9a7ec) [0x7f14b00437ec]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 5: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5966) [0x7f14b004e966]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 6: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa59d1) [0x7f14b004e9d1]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 7: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5c65) [0x7f14b004ec65]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 8: /usr/lib/ceph/libceph-common.so.2(+0x28982a) [0x7f14b06e682a]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 9: (MDSMonitor::tick()+0x475) [0x55a3d8709015]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 10: (MDSMonitor::on_active()+0x28) [0x55a3d86ef068]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 11: (Context::complete(int)+0x9) [0x55a3d850fc29]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 12: (void finish_contexts<std::__cxx11::list<Context*, std::allocator<Context*> > >(ceph::common::CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0xa8) [0x55a3d853b458]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 13: (Paxos::finish_round()+0x70) [0x55a3d8623100]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 14: (Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x3d3) [0x55a3d8624ef3]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 15: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x116b) [0x55a3d850d7eb]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 16: (Monitor::_ms_dispatch(Message*)+0x41e) [0x55a3d850de2e]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 17: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x59) [0x55a3d853c9d9]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 18: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x468) [0x7f14b08d5eb8]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 19: (DispatchQueue::entry()+0x5ef) [0x7f14b08d35bf]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 20: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f14b0990cbd]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 21: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f14b0184ea7]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 22: clone()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 0> 2021-10-05T18:42:57.045+0200 7f14a74f3700 -1 *** Caught signal (Aborted) **
Oct 05 18:42:57 virthost2 ceph-mon[115602]: in thread 7f14a74f3700 thread_name:ms_dispatch
Oct 05 18:42:57 virthost2 ceph-mon[115602]: ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable)
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7f14b0190140]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 2: gsignal()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 3: abort()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 4: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9a7ec) [0x7f14b00437ec]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 5: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5966) [0x7f14b004e966]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 6: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa59d1) [0x7f14b004e9d1]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 7: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5c65) [0x7f14b004ec65]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 8: /usr/lib/ceph/libceph-common.so.2(+0x28982a) [0x7f14b06e682a]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 9: (MDSMonitor::tick()+0x475) [0x55a3d8709015]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 10: (MDSMonitor::on_active()+0x28) [0x55a3d86ef068]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 11: (Context::complete(int)+0x9) [0x55a3d850fc29]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 12: (void finish_contexts<std::__cxx11::list<Context*, std::allocator<Context*> > >(ceph::common::CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0xa8) [0x55a3d853b458]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 13: (Paxos::finish_round()+0x70) [0x55a3d8623100]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 14: (Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x3d3) [0x55a3d8624ef3]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 15: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x116b) [0x55a3d850d7eb]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 16: (Monitor::_ms_dispatch(Message*)+0x41e) [0x55a3d850de2e]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 17: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x59) [0x55a3d853c9d9]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 18: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x468) [0x7f14b08d5eb8]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 19: (DispatchQueue::entry()+0x5ef) [0x7f14b08d35bf]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 20: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f14b0990cbd]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 21: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f14b0184ea7]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 22: clone()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 0> 2021-10-05T18:42:57.045+0200 7f14a74f3700 -1 *** Caught signal (Aborted) **
Oct 05 18:42:57 virthost2 ceph-mon[115602]: in thread 7f14a74f3700 thread_name:ms_dispatch
Oct 05 18:42:57 virthost2 ceph-mon[115602]: ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable)
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7f14b0190140]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 2: gsignal()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 3: abort()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 4: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9a7ec) [0x7f14b00437ec]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 5: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5966) [0x7f14b004e966]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 6: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa59d1) [0x7f14b004e9d1]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 7: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5c65) [0x7f14b004ec65]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 8: /usr/lib/ceph/libceph-common.so.2(+0x28982a) [0x7f14b06e682a]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 9: (MDSMonitor::tick()+0x475) [0x55a3d8709015]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 10: (MDSMonitor::on_active()+0x28) [0x55a3d86ef068]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 11: (Context::complete(int)+0x9) [0x55a3d850fc29]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 12: (void finish_contexts<std::__cxx11::list<Context*, std::allocator<Context*> > >(ceph::common::CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0xa8) [0x55a3d853b458]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 13: (Paxos::finish_round()+0x70) [0x55a3d8623100]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 14: (Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x3d3) [0x55a3d8624ef3]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 15: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x116b) [0x55a3d850d7eb]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 16: (Monitor::_ms_dispatch(Message*)+0x41e) [0x55a3d850de2e]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 17: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x59) [0x55a3d853c9d9]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 18: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x468) [0x7f14b08d5eb8]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 19: (DispatchQueue::entry()+0x5ef) [0x7f14b08d35bf]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 20: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f14b0990cbd]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 21: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f14b0184ea7]
Oct 05 18:42:57 virthost2 ceph-mon[115602]: 22: clone()
Oct 05 18:42:57 virthost2 ceph-mon[115602]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
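
For readers skimming the trace: the first frame printed in parenthesized symbol-plus-offset form is MDSMonitor::tick(), which is where the monitor code enters the failing decode. The snippet below is only a rough sketch of how such frames could be pulled out of journal lines. The regular expression and the helper name `symbolized_frames` are illustrative assumptions, not part of any Ceph tooling.

```python
import re

# Assumed line shape, taken from the log above, e.g.
# "... ceph-mon[115602]: 9: (MDSMonitor::tick()+0x475) [0x55a3d8709015]"
FRAME_RE = re.compile(
    r"ceph-mon\[\d+\]: (\d+): \((.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$"
)


def symbolized_frames(lines):
    """Return (frame_number, symbol) for frames that carry a function name."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


# Three lines copied from the trace; frame 8 is a bare library offset
# and is therefore skipped by the pattern.
sample = [
    "Oct 05 18:42:57 virthost2 ceph-mon[115602]: 8: /usr/lib/ceph/libceph-common.so.2(+0x28982a) [0x7f14b06e682a]",
    "Oct 05 18:42:57 virthost2 ceph-mon[115602]: 9: (MDSMonitor::tick()+0x475) [0x55a3d8709015]",
    "Oct 05 18:42:57 virthost2 ceph-mon[115602]: 10: (MDSMonitor::on_active()+0x28) [0x55a3d86ef068]",
]
```

Calling `symbolized_frames(sample)` keeps frames 9 and 10 and drops frame 8, so the first entry names the function to look at.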

Related issues 1 (0 open, 1 closed)

Copied to CephFS - Backport #52999: pacific: Ceph monitor crash after upgrade from ceph 15.2.14 to 16.2.6 (Status: Resolved, Assignee: Patrick Donnelly)
Actions #1

Updated by Daniel Keller over 2 years ago

I was able to complete the upgrade by switching to version 16.2.5.

After that I tried to upgrade from version 16.2.5 to version 16.2.6, but as soon as I restart the second monitor, they crash again with the same error message.

After I switched back to version 16.2.5, it worked again.

Actions #2

Updated by Neha Ojha over 2 years ago

  • Project changed from RADOS to CephFS
Actions #3

Updated by Venky Shankar over 2 years ago

  • Assignee set to Patrick Donnelly
Actions #4

Updated by Patrick Donnelly over 2 years ago

  • Status changed from New to In Progress
  • Target version set to v17.0.0
  • Source set to Community (user)
  • Backport set to pacific
  • Component(FS) MDSMonitor added
  • Labels (FS) crash added
Actions #5

Updated by Patrick Donnelly over 2 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 43506
  • ceph-qa-suite deleted (upgrade/nautilus-x)
Actions #6

Updated by dongdong tao over 2 years ago

Do we know why it succeeds on 16.2.5 but fails on 16.2.6?

Actions #7

Updated by Patrick Donnelly over 2 years ago

dongdong tao wrote:

Do we know why it succeeds on 16.2.5 but fails on 16.2.6?

The code in MDSMonitor::tick is trying to flush out old versions of the MDSMap so it tries loading old epochs of the FSMap. The v16.2.5 code did not do this so it wouldn't crash or assert.
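
To make the difference concrete, here is a toy model of the behavior described above. It is not Ceph code: `StoredFSMap`, `decode_fsmap`, `tick`, and the `load_old_epochs` switch are hypothetical names, and the idea that the store still holds an epoch written with an encoding older than 7 is an assumption inferred from the error text ("old encoding version v < 7").

```python
from __future__ import annotations

from dataclasses import dataclass

MIN_SUPPORTED_ENCODING = 7  # taken from the error text: "v < 7"


class MalformedInput(Exception):
    """Stand-in for ceph::buffer::malformed_input (illustrative only)."""


@dataclass
class StoredFSMap:
    epoch: int
    encoding_version: int  # hypothetical: encoding the blob was written with


def decode_fsmap(blob: StoredFSMap) -> int:
    """Toy decoder: refuse blobs older than the minimum supported encoding."""
    if blob.encoding_version < MIN_SUPPORTED_ENCODING:
        raise MalformedInput(
            f"no longer understand old encoding version v < {MIN_SUPPORTED_ENCODING}"
        )
    return blob.epoch


def tick(store: list[StoredFSMap], load_old_epochs: bool) -> list[int]:
    """Decode the newest map; optionally walk the older epochs as well."""
    newest = max(store, key=lambda b: b.epoch)
    decoded = [decode_fsmap(newest)]
    if load_old_epochs:
        for blob in sorted(store, key=lambda b: b.epoch):
            if blob.epoch != newest.epoch:
                decoded.append(decode_fsmap(blob))  # raises on a pre-v7 blob
    return decoded


# Assumed store contents: one old-encoding epoch left over from an earlier release.
store = [
    StoredFSMap(epoch=1, encoding_version=6),
    StoredFSMap(epoch=2, encoding_version=7),
]
```

With `load_old_epochs=False` (standing in for 16.2.5) `tick(store, False)` only touches the newest epoch and returns normally. With `load_old_epochs=True` (standing in for 16.2.6) it reaches the old blob and raises, which in the real daemon surfaces as the uncaught exception and abort shown in the log.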

Actions #8

Updated by dongdong tao over 2 years ago

Patrick Donnelly wrote:

dongdong tao wrote:

Do we know why it succeeds on 16.2.5 but fails on 16.2.6?

The code in MDSMonitor::tick is trying to flush out old versions of the MDSMap so it tries loading old epochs of the FSMap. The v16.2.5 code did not do this so it wouldn't crash or assert.

Got it, thank you, Patrick

Actions #9

Updated by Patrick Donnelly over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Priority changed from Normal to Urgent
Actions #10

Updated by Backport Bot over 2 years ago

  • Copied to Backport #52999: pacific: Ceph monitor crash after upgrade from ceph 15.2.14 to 16.2.6 added
Actions #11

Updated by Loïc Dachary over 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
