Bug #58156


Monitors do not permit OSD to join after upgrading to Quincy

Added by Igor Fedotov over 1 year ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Development
Tags: backport_processed
Backport: quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A Nautilus cluster was upgraded to Quincy, and by the end of the upgrade the OSDs had stopped joining the cluster.

Investigation revealed that the monitor does not permit OSDs to join and logs the following error:
mon.ceph1-1 (mon.2) 147 : cluster [INF] disallowing boot of quincy+ OSD osd.16 v2:xx.xx.xx.xx:xxx/xxxx because require_osd_release < octopus

The current require-osd-release is still at nautilus: the user missed the step that raises it during the upgrade.
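
A minimal sketch of how the missed step could be detected before upgrading further. It assumes the JSON form of the OSD map dump (`ceph osd dump --format json`) carries a `require_osd_release` field, as the plain-text dump does; the sample document below is hypothetical and heavily trimmed, and the helper names are illustrative only.

```python
import json

# Release names in upgrade order; only the ones mentioned in this report.
RELEASE_ORDER = ["nautilus", "octopus", "pacific", "quincy"]


def require_osd_release(osd_dump_json: str) -> str:
    """Extract require_osd_release from an OSD map dump in JSON form."""
    return json.loads(osd_dump_json)["require_osd_release"]


def is_behind(current: str, minimum: str) -> bool:
    """True if `current` is an older release than `minimum`."""
    return RELEASE_ORDER.index(current) < RELEASE_ORDER.index(minimum)


# Hypothetical sample, NOT real cluster output.
sample = '{"epoch": 1234, "require_osd_release": "nautilus"}'

current = require_osd_release(sample)
if is_behind(current, "octopus"):
    print(f"require_osd_release is {current}: raise it before starting quincy OSDs")
```

On a healthy upgrade path the value would normally be raised with `ceph osd require-osd-release <release>` once all OSDs run the new release, before moving on to the next one.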

An attempt to raise require-osd-release to resolve the issue triggers the following assertion failure on the monitors:
2022-12-01T08:00:38.849-0500 7f6517e7c700 -1 /build/ceph-17.2.5/src/mon/OSDMonitor.cc: In function 'bool OSDMonitor::prepare_command_impl(MonOpRequestRef, const cmdmap_t&)' thread 7f6517e7c700 time 2022-12-01T08:00:38.838800-0500
/build/ceph-17.2.5/src/mon/OSDMonitor.cc: 11618: FAILED ceph_assert(osdmap.require_osd_release >= ceph_release_t::octopus)

ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x7f65200387c6]
2: /usr/lib/ceph/libceph-common.so.2(+0x27c9d8) [0x7f65200389d8]
3: (OSDMonitor::prepare_command_impl(boost::intrusive_ptr<MonOpRequest>, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&)+0xcb03) [0x5556243df333]
4: (OSDMonitor::prepare_command(boost::intrusive_ptr<MonOpRequest>)+0x45f) [0x5556243f03af]
5: (OSDMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0x162) [0x5556243ff552]
...

Downgrading neither the monitors nor the OSDs helped: the downgraded daemons were unable to start because of data layout changes made by Octopus.

The result was a kind of deadlock, which was finally resolved by using a custom Quincy build that omits the above assertion in OSDMonitor.cc.
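
The deadlock can be illustrated with a toy model. This is a simplified sketch built only from the log message and the failed assertion quoted above, not the actual OSDMonitor code; `ToyMonitor` and its methods are hypothetical names, and the real monitor aborts on the assertion rather than raising a catchable error.

```python
from enum import IntEnum


class Release(IntEnum):
    # Ceph major version numbers for the releases named in this report.
    NAUTILUS = 14
    OCTOPUS = 15
    PACIFIC = 16
    QUINCY = 17


class ToyMonitor:
    """Hypothetical stand-in for the two monitor checks described above."""

    def __init__(self, require_osd_release: Release):
        self.require_osd_release = require_osd_release

    def allow_boot(self, osd_release: Release) -> bool:
        # Models: "disallowing boot of quincy+ OSD ... because
        # require_osd_release < octopus".
        if osd_release >= Release.QUINCY and self.require_osd_release < Release.OCTOPUS:
            return False
        return True

    def set_require_osd_release(self, target: Release) -> None:
        # Models: FAILED ceph_assert(osdmap.require_osd_release >= octopus).
        if self.require_osd_release < Release.OCTOPUS:
            raise AssertionError("require_osd_release >= octopus")
        self.require_osd_release = target


mon = ToyMonitor(Release.NAUTILUS)

# Step 1: the quincy OSD is refused because the flag is still at nautilus.
print("OSD may boot:", mon.allow_boot(Release.QUINCY))

# Step 2: raising the flag hits the assertion, so the flag cannot change.
try:
    mon.set_require_osd_release(Release.QUINCY)
except AssertionError as err:
    print("monitor would abort on:", err)
```

In this model neither step can succeed first, which matches the situation in the report: the OSDs need a higher flag to boot, and the command that raises the flag requires it to already be at octopus or later.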

Evidently this was primarily a user error (require-osd-release was not raised during the upgrade), but Ceph definitely needs a friendlier way to resolve or avoid the issue.


Related issues: 2 (0 open, 2 closed)

Copied to RADOS - Backport #59455: pacific: Monitors do not permit OSD to join after upgrading to Quincy - Resolved - Igor Fedotov
Copied to RADOS - Backport #59456: quincy: Monitors do not permit OSD to join after upgrading to Quincy - Resolved - Igor Fedotov