Bug #56565: Not upgraded nautilus mons crash if upgraded pacific mon updates fsmap - RADOS - Ceph

Bug #56565

closed

Added by Mykola Golub almost 2 years ago. Updated almost 2 years ago.

Status: Won't Fix
Priority: Normal
Severity: 3 - minor

Description

I am not sure whether this needs to be fixed, but the case seems worth reporting.

We faced the issue when upgrading the cluster from nautilus 14.2.22 to pacific 16.2.8.

After the leader mon had been upgraded, an MDS server was accidentally stopped, and this caused the non-upgraded mons to crash while handling the fsmap update request:

2022-04-25 14:55:03.924 7f200400f700 -1 /home/abuild/rpmbuild/BUILD/ceph-14.2.22-445-ga68959d39a6/src/mds/FSMap.cc: In function 'void FSMap::sanity() const' thread 7f200400f700 time 2022-04-25 14:55:03.923549
/home/abuild/rpmbuild/BUILD/ceph-14.2.22-445-ga68959d39a6/src/mds/FSMap.cc: 755: FAILED ceph_assert(fs->mds_map.compat.compare(compat) == 0)

 ceph version 14.2.22-445-ga68959d39a6 (a68959d39a67faec1a7ace55e8c4327accc4a38c) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f20126efdb6]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f20126eff91]
 3: (FSMap::sanity() const+0xe0) [0x7f2012c0de20]
 4: (MDSMonitor::update_from_paxos(bool*)+0x488) [0x5633f9880a98]
 5: (PaxosService::refresh(bool*)+0x25a) [0x5633f97bd83a]
 6: (Monitor::refresh_from_paxos(bool*)+0x10c) [0x5633f969ceac]
 7: (Paxos::do_refresh()+0x4f) [0x5633f97acb8f]
 8: (Paxos::handle_commit(boost::intrusive_ptr<MonOpRequest>)+0x132) [0x5633f97b21b2]
 9: (Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x2db) [0x5633f97b7ecb]
 10: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x1668) [0x5633f96d00b8]
 11: (Monitor::_ms_dispatch(Message*)+0xa3a) [0x5633f96d0b5a]
 12: (Monitor::ms_dispatch(Message*)+0x26) [0x5633f9701646]
 13: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x26) [0x5633f96fe0b6]
 14: (DispatchQueue::entry()+0x1279) [0x7f201291d379]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f20129cda5d]
 16: (()+0x8539) [0x7f201154c539]
 17: (clone()+0x3f) [0x7f201071ccff]

It was rather unpleasant in our case because the upgraded mon could not form a quorum, and the cluster was inaccessible until the nautilus mons were upgraded manually.

Updated by Mykola Golub almost 2 years ago

Status changed from New to Won't Fix

I was just told there is a step in the upgrade documentation to set the mon_mds_skip_sanity parameter before the upgrade [1], which appears to work around this issue. So I am closing this ticket.

[1] https://docs.ceph.com/en/pacific/releases/pacific/#upgrading-non-cephadm-clusters
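
For readers trying to see how the assert and the workaround relate, here is a toy Python model. It is a sketch and not Ceph code. The class and function names are borrowed from the backtrace, the feature names are hypothetical, and the way the option bypasses the check is an assumption based on the option's name and the upgrade note above.

```python
# Toy model only: names mirror the backtrace and the option mentioned above,
# but nothing here is taken from the Ceph source tree.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CompatSet:
    """Hypothetical stand-in for a compat feature set."""
    features: frozenset = frozenset()

    def compare(self, other: "CompatSet") -> int:
        # 0 means "identical", mirroring the `== 0` test in the failed assert.
        return 0 if self.features == other.features else 1


@dataclass
class Filesystem:
    name: str
    compat: CompatSet


@dataclass
class FSMap:
    compat: CompatSet
    filesystems: list = field(default_factory=list)

    def sanity(self) -> None:
        for fs in self.filesystems:
            # Analogue of ceph_assert(fs->mds_map.compat.compare(compat) == 0)
            assert fs.compat.compare(self.compat) == 0, (
                f"compat mismatch in filesystem {fs.name}"
            )


def update_from_paxos(fsmap: FSMap, skip_sanity: bool) -> str:
    """Assumed behaviour: with the skip option set, sanity() is never called."""
    if not skip_sanity:
        fsmap.sanity()
    return "accepted"


if __name__ == "__main__":
    # Hypothetical feature names; the real compat sets are not in this report.
    mon_view = CompatSet(frozenset({"feature_a"}))
    fs_view = CompatSet(frozenset({"feature_a", "feature_b"}))
    fsmap = FSMap(compat=mon_view, filesystems=[Filesystem("cephfs", fs_view)])

    print(update_from_paxos(fsmap, skip_sanity=True))
    try:
        update_from_paxos(fsmap, skip_sanity=False)
    except AssertionError as err:
        print("mon would abort:", err)
```

In this model a mon that receives an fsmap whose per-filesystem compat differs from the set it expects aborts unless the check is skipped, which matches the crash and the workaround described here.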
