Bug #58156

closed

Monitors do not permit OSD to join after upgrading to Quincy

Added by Igor Fedotov over 1 year ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Development
Tags: backport_processed
Backport: quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A Nautilus cluster was eventually upgraded to Quincy, and at the end of the upgrade the OSDs stopped joining the cluster.

The investigation revealed that the monitors do not permit the OSDs to join, logging the following message:
mon.ceph1-1 (mon.2) 147 : cluster [INF] disallowing boot of quincy+ OSD osd.16 v2:xx.xx.xx.xx:xxx/xxxx because require_osd_release < octopus

In addition, the current require-osd-release was still at nautilus: the user had missed the upgrade step that raises it.
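The boot check behind that log message can be modelled roughly as follows. This is a minimal Python sketch, not Ceph's C++ code: the `Release` enum mirrors the major-version numbering of Ceph releases, and `may_boot_osd` is a hypothetical helper that encodes only the rule stated in the log line (a Quincy or later OSD is refused while require_osd_release is below Octopus).

```python
from enum import IntEnum


class Release(IntEnum):
    """Ceph releases ordered by major version number (model only)."""
    LUMINOUS = 12
    MIMIC = 13
    NAUTILUS = 14
    OCTOPUS = 15
    PACIFIC = 16
    QUINCY = 17


def may_boot_osd(osd_release: Release, require_osd_release: Release) -> bool:
    """Hypothetical model of the monitor check quoted in the log message:
    a Quincy+ OSD may boot only if require_osd_release >= Octopus."""
    if osd_release >= Release.QUINCY and require_osd_release < Release.OCTOPUS:
        return False
    return True


# The reported state: Quincy OSDs, require_osd_release still at Nautilus.
print(may_boot_osd(Release.QUINCY, Release.NAUTILUS))  # False
print(may_boot_osd(Release.QUINCY, Release.OCTOPUS))   # True
```

Because the flag is compared by release order, raising it to Octopus or later would be enough to satisfy this particular check.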

An attempt to raise require-osd-release to resolve the issue triggers the following assertion failure on the monitors:
2022-12-01T08:00:38.849-0500 7f6517e7c700 -1 /build/ceph-17.2.5/src/mon/OSDMonitor.cc: In function 'bool OSDMonitor::prepare_command_impl(MonOpRequestRef, const cmdmap_t&)' thread 7f6517e7c700 time 2022-12-01T08:00:38.838800-0500
/build/ceph-17.2.5/src/mon/OSDMonitor.cc: 11618: FAILED ceph_assert(osdmap.require_osd_release >= ceph_release_t::octopus)

ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x7f65200387c6]
2: /usr/lib/ceph/libceph-common.so.2(+0x27c9d8) [0x7f65200389d8]
3: (OSDMonitor::prepare_command_impl(boost::intrusive_ptr<MonOpRequest>, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&)+0xcb03) [0x5556243df333]
4: (OSDMonitor::prepare_command(boost::intrusive_ptr&lt;MonOpRequest&gt;)+0x45f) [0x5556243f03af]
5: (OSDMonitor::prepare_update(boost::intrusive_ptr&lt;MonOpRequest&gt;)+0x162) [0x5556243ff552]
...

Downgrading neither the monitors nor the OSDs helped: the downgraded daemons were unable to start up due to the data layout changes made by Octopus.

The result was a sort of deadlock, which was finally resolved by using a custom Quincy build that omits the above assertion in OSDMonitor.cc.
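The deadlock can be traced with a small state model. This is again a hypothetical Python sketch: `ClusterModel`, `boot_osd` and `set_require_osd_release` are invented names, and the only rules encoded are the two visible in this report, namely the boot check from the log message and the `ceph_assert(osdmap.require_osd_release >= ceph_release_t::octopus)` hit while the monitor handles the command. It assumes that check runs before any new value is applied; the failed assertion is modelled as an `AssertionError`, whereas a real failed `ceph_assert` aborts the monitor daemon.

```python
from enum import IntEnum


class Release(IntEnum):
    """Ceph releases ordered by major version number (model only)."""
    NAUTILUS = 14
    OCTOPUS = 15
    PACIFIC = 16
    QUINCY = 17


class ClusterModel:
    """Hypothetical model of the reported cluster state; not Ceph code."""

    def __init__(self, require_osd_release: Release) -> None:
        self.require_osd_release = require_osd_release

    def boot_osd(self, osd_release: Release) -> bool:
        # Rule from the log: a Quincy+ OSD needs require_osd_release >= Octopus.
        return not (osd_release >= Release.QUINCY
                    and self.require_osd_release < Release.OCTOPUS)

    def set_require_osd_release(self, new: Release) -> None:
        # Assumption: the check runs before the new value is applied, mirroring
        # the failed ceph_assert in OSDMonitor::prepare_command_impl.
        if self.require_osd_release < Release.OCTOPUS:
            raise AssertionError("require_osd_release >= octopus")
        self.require_osd_release = new


cluster = ClusterModel(Release.NAUTILUS)
print(cluster.boot_osd(Release.QUINCY))  # False: the OSDs cannot join

try:
    cluster.set_require_osd_release(Release.QUINCY)
except AssertionError:
    print("monitor aborted")  # and the flag cannot be raised either

print(cluster.require_osd_release.name)  # NAUTILUS
```

Downgrading is not modelled here; according to the report it was ruled out by the on-disk layout changes, which is what leaves no way out without a patched build.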

Evidently this was primarily the user's fault for not raising require-osd-release, but Ceph definitely needs friendlier means to avoid or resolve the issue.


Related issues: 2 (0 open, 2 closed)

Copied to RADOS - Backport #59455: pacific: Monitors do not permit OSD to join after upgrading to Quincy (Resolved, Igor Fedotov)
Copied to RADOS - Backport #59456: quincy: Monitors do not permit OSD to join after upgrading to Quincy (Resolved, Igor Fedotov)
#1

Updated by Igor Fedotov over 1 year ago

  • Status changed from New to In Progress
  • Assignee set to Igor Fedotov
  • Backport set to quincy, pacific
#2

Updated by Igor Fedotov over 1 year ago

  • Pull request ID set to 49199
#3

Updated by Radoslaw Zarzynski over 1 year ago

Hi Igor! What was the intermediate version during the upgrade? We merged https://github.com/ceph/ceph/pull/44090, but I'm not sure it is present in very old versions.

#4

Updated by Igor Fedotov over 1 year ago

Radoslaw Zarzynski wrote:

Hi Igor! What was the intermediate version during the upgrade? We merged https://github.com/ceph/ceph/pull/44090, but I'm not sure it is present in very old versions.

Hi Radek,
Unfortunately, it's hard to say at the moment. Most likely they either missed the patch or missed the warning itself (I'm not sure whether it's possible to proceed with an upgrade while the warning is raised...).

Anyway, my patch attempts to solve this from a different angle: it enables raising require_osd_release instead of hitting the assertion and the resulting deadlock.
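The direction described here, validating the request and raising the flag rather than asserting, can be pictured with the hypothetical sketch below. It is not the code from pull request 49199: `set_require_osd_release` is an invented function, and its single rule (refuse to lower the flag, otherwise accept) is an assumption chosen only for illustration.

```python
from enum import IntEnum


class Release(IntEnum):
    """Ceph releases ordered by major version number (model only)."""
    NAUTILUS = 14
    OCTOPUS = 15
    PACIFIC = 16
    QUINCY = 17


def set_require_osd_release(current: Release, requested: Release) -> Release:
    """Hypothetical friendlier handler: return the new flag value, or raise
    ValueError for an invalid request, instead of aborting the monitor."""
    if requested < current:
        # Assumption for illustration: lowering the flag is refused.
        raise ValueError(f"cannot lower require_osd_release from "
                         f"{current.name} to {requested.name}")
    # No assertion on the current value: a cluster still at Nautilus can be
    # raised, after which Quincy OSDs would pass the boot check.
    return requested


print(set_require_osd_release(Release.NAUTILUS, Release.QUINCY).name)  # QUINCY
```

Reporting an invalid request as an error keeps the monitor running, whereas a failed assertion takes the daemon down while the offending state is still in place.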

#5

Updated by Konstantin Shalygin about 1 year ago

  • Status changed from In Progress to Pending Backport
  • Target version set to v18.0.0
  • Source set to Development
#6

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59455: pacific: Monitors do not permit OSD to join after upgrading to Quincy added
#7

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59456: quincy: Monitors do not permit OSD to join after upgrading to Quincy added
#8

Updated by Backport Bot about 1 year ago

  • Tags set to backport_processed
#9

Updated by Igor Fedotov 9 months ago

  • Status changed from Pending Backport to Resolved