Bug #22041

'ceph osd df tree' crashes new mons

Added by Paul Emmerich over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Monitor
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This probably has the same root cause as #21770, since it appears under the same circumstances.

I'm seeing the following crash when running "ceph osd df tree":

Nov 04 16:46:43 new-croit-host-BEEF03 ceph-mon[11133]: 2017-11-04 17:46:43.948275 7fc18c807700 -1 *** Caught signal (Aborted) **
                                                        in thread 7fc18c807700 thread_name:ms_dispatch

                                                        ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
                                                        1: (()+0x930b84) [0x5618c8d9eb84]
                                                        2: (()+0x110c0) [0x7fc195f480c0]
                                                        3: (gsignal()+0xcf) [0x7fc19336efcf]
                                                        4: (abort()+0x16a) [0x7fc1933703fa]
                                                        5: (()+0x40a2a9) [0x5618c88782a9]
                                                        6: (print_osd_utilization(OSDMap const&, PGStatService const*, std::ostream&, ceph::Formatter*, bool)+0x1a9) [0x5618c8b38c49]
                                                        7: (OSDMonitor::preprocess_command(boost::intrusive_ptr<MonOpRequest>)+0xb57) [0x5618c8965337]
                                                        8: (OSDMonitor::preprocess_query(boost::intrusive_ptr<MonOpRequest>)+0x2c0) [0x5618c896f320]
                                                        9: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x7f8) [0x5618c8916468]
                                                        10: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x233b) [0x5618c87d9d1b]
                                                        11: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xa49) [0x5618c87e1139]
                                                        12: (Monitor::_ms_dispatch(Message*)+0x6d3) [0x5618c87e21c3]
                                                        13: (Monitor::ms_dispatch(Message*)+0x23) [0x5618c880f963]
                                                        14: (DispatchQueue::entry()+0xeda) [0x5618c8d459aa]
                                                        15: (DispatchQueue::DispatchThread::entry()+0xd) [0x5618c8af059d]
                                                        16: (()+0x7494) [0x7fc195f3e494]
                                                        17: (clone()+0x3f) [0x7fc193424aff]
                                                        NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
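
As the NOTE says, resolving these frames requires the exact binary. A rough example of decoding them offline with standard binutils tools; the path and the sample offset are illustrative, and the matching debug symbols must be installed:

    # Disassemble ceph-mon with source interleaved, as the NOTE suggests
    # (assumes /usr/bin/ceph-mon is the binary that produced the trace).
    objdump -rdS /usr/bin/ceph-mon > ceph-mon.dump

    # Map a single frame offset such as "1: (()+0x930b84)" to function/file:line
    # (-f prints the function name, -C demangles, -i follows inlined frames).
    addr2line -e /usr/bin/ceph-mon -fCi 0x930b84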

It happens under exactly the same circumstances as #21770, i.e., only on new mons in new clusters; restarting all existing mons and mgrs at the same time fixes it.
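
On a systemd-managed deployment, that workaround amounts to something like the following on every mon/mgr host (a sketch, assuming the stock ceph-mon.target and ceph-mgr.target units):

    # Restart all mon and mgr daemons on this host at the same time.
    sudo systemctl restart ceph-mon.target ceph-mgr.target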

Steps to reproduce (see the sketch after this list):

  1. Create a new cluster with one mon and one mgr; do not restart them!
  2. Create a few OSDs and at least one pool (some IO operations on the pool might be necessary, not sure).
  3. Create a new mon/mgr.
  4. Run "ceph osd df tree" a few times.
  5. The new mon crashes, but never the initial mon.
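
For reference, a rough shell sketch of these steps; the host names, device path, and ceph-deploy commands are assumptions about the deployment method, not the exact setup used here:

    # Steps 1-2: bootstrap one mon + one mgr, a few OSDs, and a pool.
    # Do not restart any of these daemons afterwards.
    ceph-deploy new mon-a
    ceph-deploy mon create-initial
    ceph-deploy mgr create mon-a
    ceph-deploy osd create --data /dev/sdb osd-1
    ceph osd pool create testpool 32
    rados -p testpool bench 10 write --no-cleanup   # some IO, possibly needed

    # Steps 3-5: add a second mon, then run the command repeatedly;
    # the newly added mon aborts, the initial one never does.
    ceph-deploy mon add mon-b
    for i in $(seq 1 10); do ceph osd df tree; done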

History

#1 Updated by Paul Emmerich over 6 years ago

This is fixed in 12.2.2, thanks!
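
A quick sanity check that the fixed release is actually running everywhere (the "ceph versions" command is available on Luminous):

    ceph versions   # per-daemon version report; all daemons should show >= 12.2.2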

#2 Updated by Nathan Cutler over 6 years ago

  • Status changed from New to Resolved
