Bug #21300

closed

"ceph osd df" crashes ceph-mon if mgr is offline

Added by Artemy Kapitula over 6 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
High
Assignee:
Joao Eduardo Luis
Category:
Monitor
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
luminous
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph-mon crashes when calling "ceph osd df" if no ceph-mgr is running:

Sep 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 2017-09-07 14:32:28.543543 7fe69f24b700 0 mon.dpr-2a1713-063-crd@0(leader) e1 handle_command mon_command({"prefix": "df", "format": "json"} v
Sep 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 2017-09-07 14:32:28.543583 7fe69f24b700 0 log_channel(audit) log [DBG] : from='client.? 10.118.63.11:0/33089080' entity='client.admin' cmd=[{
Sep 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 2017-09-07 14:32:28.547544 7fe69f24b700 0 mon.dpr-2a1713-063-crd@0(leader) e1 handle_command mon_command({"prefix": "osd df", "format": "json
Sep 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 2017-09-07 14:32:28.547574 7fe69f24b700 0 log_channel(audit) log [DBG] : from='client.? 10.118.63.11:0/4177904687' entity='client.admin' cmd=
Sep 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: *** Caught signal (Aborted) **
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: in thread 7fe69f24b700 thread_name:ms_dispatch
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 1: (()+0x8c6e21) [0x56080d4f7e21]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 2: (()+0xf130) [0x7fe6a6ad1130]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 3: (gsignal()+0x37) [0x7fe6a50715d7]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 4: (abort()+0x148) [0x7fe6a5072cc8]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 5: (()+0x4427b9) [0x56080d0737b9]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 6: (print_osd_utilization(OSDMap const&, PGStatService const*, std::ostream&, ceph::Formatter*, bool)+0x1a7) [0x56080d2eb287]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 7: (OSDMonitor::preprocess_command(boost::intrusive_ptr<MonOpRequest>)+0x105d) [0x56080d148ddd]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 8: (OSDMonitor::preprocess_query(boost::intrusive_ptr<MonOpRequest>)+0x3d6) [0x56080d150e86]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 9: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x811) [0x56080d1014a1]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 10: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x1cb1) [0x56080cfdfff1]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 11: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x8b9) [0x56080cfe5789]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 12: (Monitor::_ms_dispatch(Message*)+0x7eb) [0x56080cfe6a3b]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 13: (Monitor::ms_dispatch(Message*)+0x23) [0x56080d011263]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 14: (DispatchQueue::entry()+0x792) [0x56080d4b3102]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x56080d2b64cd]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 16: (()+0x7df5) [0x7fe6a6ac9df5]
сен 07 14:32:28 dpr-2a1713-063-crd ceph-custom-daemon1839: 17: (clone()+0x6d) [0x7fe6a51321ad]
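The backtrace points at an abort inside print_osd_utilization(OSDMap const&, PGStatService const*, ...), which is consistent with the OSD stats source being unusable while no mgr is running. A minimal sketch of the kind of guard such a path needs, using simplified stand-in types (only the names PGStatService and print_osd_utilization come from the backtrace; have_stats and the guarded wrapper are hypothetical):

```cpp
#include <cassert>
#include <cerrno>
#include <sstream>
#include <string>

// Simplified stand-in for the real Ceph type; only the name comes
// from the backtrace above.
struct PGStatService {
    bool have_stats = false;   // hypothetical: no mgr -> no usable stats
};

// Guarded variant (hypothetical): refuse to format utilization when the
// stat service is missing or unusable instead of asserting, and return
// -EAGAIN so the caller can report "mgr not available" to the client.
int print_osd_utilization_guarded(const PGStatService* pgservice,
                                  std::ostream& out) {
    if (pgservice == nullptr || !pgservice->have_stats) {
        out << "osd stats unavailable (is ceph-mgr running?)";
        return -EAGAIN;
    }
    out << "ID SIZE USE ...";  // real utilization formatting elided
    return 0;
}
```

The point is only the shape of the check: the crash suggests the luminous code path reached the formatter without any such guard when the mgr was down.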


Related issues 1 (0 open, 1 closed)

Copied to Ceph - Backport #22079: luminous: "ceph osd df" crashes ceph-mon if mgr is offline (Resolved, Joao Eduardo Luis)
Actions #1

Updated by huanwen ren over 6 years ago

@Artemy Kapitula
This PR: https://github.com/ceph/ceph/pull/17322 can resolve it,
but it has not been merged into the Luminous version

Actions #2

Updated by Kefu Chai over 6 years ago

  • Subject changed from ceph-mon crash to "ceph osd df" crashes ceph-mon if mgr is offline
  • Status changed from New to Need More Info
  • Target version deleted (v12.2.0)

I am not able to reproduce this issue with a vstart cluster at ddf84249fa8a8ec3655c39bac5331ab81c0307b1.

  1. start vstart cluster with 1 mgr
  2. stop this mgr
  3. "ceph osd df" hangs

or

  1. start vstart cluster without mgr
  2. "ceph osd df" hangs
Actions #3

Updated by Joao Eduardo Luis over 6 years ago

kefu, were you able to reproduce the issue on 12.2.0?

Actions #4

Updated by huanwen ren over 6 years ago

Follow the steps below to reproduce, with only one Mon and one Mgr
(multiple Mons and Mgrs will have the same problem; using one simplifies reproduction):
1. start Mon and Mgr
2. stop Mgr
3. stop Mon
4. start Mon
5. run "ceph --verbose osd df"; it prints:

        bestcmds_sorted: 
        [{u'cmd019': {u'avail': u'cli,rest',
                      u'flags': 0,
                      u'help': u'show OSD utilization',
                      u'module': u'osd',
                      u'perm': u'r',
                      u'sig': [argdesc(<class 'ceph_argparse.CephPrefix'>, name=prefix, req=True, n=1, prefix=osd),
                               argdesc(<class 'ceph_argparse.CephPrefix'>, name=prefix, req=True, n=1, prefix=df),
                               argdesc(<class 'ceph_argparse.CephChoices'>, name=output_method, req=False, n=1, strings=plain|tree)]}}]
        Submitting command:  {'prefix': u'osd df'}

6. the Mon coredumps

Actions #5

Updated by Joao Eduardo Luis over 6 years ago

  • Assignee set to Joao Eduardo Luis

I'll look into this.

Actions #6

Updated by Kefu Chai over 6 years ago

  • Status changed from Need More Info to New

Joao, I have not tested it on 12.2.0.

Actions #7

Updated by Joao Eduardo Luis over 6 years ago

  • Status changed from New to 12
  • Priority changed from Normal to High
  • Severity changed from 3 - minor to 2 - major

Okay, managed to reproduce this on v12.2.0.

The only obvious difference between the run where it reproduced and all the runs where it didn't is that the ceph tool actually sends the command to the monitor when the crash happens; when the issue is not triggered, it's because the ceph tool blocks trying to reach the mgr directly.

I suspect the one time it triggered, the client contacted the monitor directly because the mgr had already been marked down at that point. Not sure, but looking into it now.

Actions #8

Updated by Joao Eduardo Luis over 6 years ago

This is reproducible in last night's luminous.

Actions #9

Updated by Joao Eduardo Luis over 6 years ago

I did figure out how to fix the crash on luminous, but the underlying issue is nevertheless present in master as well.

In master, even though the command does not crash the monitor, it does have the side-effect of returning EINVAL (because the monitor has no idea what this command is):

better match: 2 > 1: cmd019:osd df {plain|tree} 
bestcmds_sorted: 
[{u'cmd019': {u'avail': u'cli,rest',
              u'flags': 0,
              u'help': u'show OSD utilization',
              u'module': u'osd',
              u'perm': u'r',
              u'sig': [argdesc(<class 'ceph_argparse.CephPrefix'>, name=prefix, req=True, n=1, prefix=osd),
                       argdesc(<class 'ceph_argparse.CephPrefix'>, name=prefix, req=True, n=1, prefix=df),
                       argdesc(<class 'ceph_argparse.CephChoices'>, name=output_method, req=False, n=1, strings=plain|tree)]}}]
Submitting command:  {'prefix': u'osd df'}
2017-09-20 17:22:05.569963 7f54f913f700 10 monclient: _send_command 2 [{"prefix": "osd df"}]
2017-09-20 17:22:05.569973 7f54f913f700 10 monclient: _send_mon_message to mon.a at 127.0.0.1:40000/0
2017-09-20 17:22:05.570807 7f54f086e700 10 monclient: handle_mon_command_ack 2 [{"prefix": "osd df"}]
2017-09-20 17:22:05.570812 7f54f086e700 10 monclient: _finish_command 2 = -22 (22) Invalid argument
Error EINVAL: (22) Invalid argument
2017-09-20 17:22:05.572510 7f54f913f700 10 monclient: shutdown
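The "-22 (22) Invalid argument" in the monclient trace is just the negated POSIX errno for EINVAL, the usual return-code convention in the monitor's command path. A quick sanity check (the decode_mon_ack helper is hypothetical, illustrating the negative-errno convention; Linux errno values assumed):

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: the monitor acks failed commands with a negative
// errno (-22 here); the ceph tool reports it as the positive errno,
// i.e. "Error EINVAL: (22) Invalid argument".
int decode_mon_ack(int rc) {
    return rc < 0 ? -rc : 0;
}
```

This matches the trace above: _finish_command sees -22, and the CLI prints the corresponding EINVAL message because the monitor does not recognize "osd df" as one of its own commands.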
Actions #10

Updated by Joao Eduardo Luis over 6 years ago

  • Status changed from 12 to Fix Under Review
  • Backport set to luminous
  • Release set to master
Actions #11

Updated by Joao Eduardo Luis over 6 years ago

Merged part of the fix on master; the backport of that fix, plus a second patch required in luminous, can be found in https://github.com/ceph/ceph/pull/18038

Actions #12

Updated by Nathan Cutler over 6 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #13

Updated by Joao Eduardo Luis over 6 years ago

Nathan, backport has been merged as part of https://github.com/ceph/ceph/pull/18038 - is there anything else to be done?

Actions #14

Updated by Nathan Cutler over 6 years ago

  • Status changed from Pending Backport to Resolved
  • Backport deleted (luminous)
Actions #15

Updated by Nathan Cutler over 6 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to luminous
Actions #16

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #22079: luminous: "ceph osd df" crashes ceph-mon if mgr is offline added
Actions #17

Updated by Nathan Cutler over 6 years ago

  • Status changed from Pending Backport to Resolved