Bug #21568

closed

MDSMonitor commands crashing on cluster upgraded from Hammer (nonexistent pool?)

Added by John Spray over 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Normal
Category: Correctness/Safety
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Opened from mailing list thread "[ceph-users] "ceph fs" commands hang forever and kill monitors"

> Relevant excerpt from logs on an affected monitor, just trying to run 'ceph fs ls':
>
> 2017-09-26 13:20:50.716087 7fc85fdd9700  0 mon.vm-ds-01@0(leader) e19 handle_command mon_command({"prefix": "fs ls"} v 0) v1
> 2017-09-26 13:20:50.727612 7fc85fdd9700  0 log_channel(audit) log [DBG] : from='client.? 10.10.10.1:0/2771553898' entity='client.admin' cmd=[{"prefix": "fs ls"}]: dispatch
> 2017-09-26 13:20:50.950373 7fc85fdd9700 -1 /build/ceph-12.2.0/src/osd/OSDMap.h: In function 'const string& OSDMap::get_pool_name(int64_t) const' thread 7fc85fdd9700 time 2017-09-26 13:20:50.727676
> /build/ceph-12.2.0/src/osd/OSDMap.h: 1176: FAILED assert(i != pool_name.end())
>
>  ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55a8ca0bb642]
>  2: (()+0x48165f) [0x55a8c9f4165f]
>  3: (MDSMonitor::preprocess_command(boost::intrusive_ptr<MonOpRequest>)+0x1d18) [0x55a8ca047688]
>  4: (MDSMonitor::preprocess_query(boost::intrusive_ptr<MonOpRequest>)+0x2a8) [0x55a8ca048008]
>  5: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x700) [0x55a8c9f9d1b0]
>  6: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x1f93) [0x55a8c9e63193]
>  7: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xa0e) [0x55a8c9e6a52e]
>  8: (Monitor::_ms_dispatch(Message*)+0x6db) [0x55a8c9e6b57b]
>  9: (Monitor::ms_dispatch(Message*)+0x23) [0x55a8c9e9a053]
>  10: (DispatchQueue::entry()+0xf4a) [0x55a8ca3b5f7a]
>  11: (DispatchQueue::DispatchThread::entry()+0xd) [0x55a8ca16bc1d]
>  12: (()+0x76ba) [0x7fc86b3ac6ba]
>  13: (clone()+0x6d) [0x7fc869bd63dd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

If maps like this can be inherited from earlier versions of Ceph, the MDSMonitor may need a clean-up step at the point where it loads the map, to remove references to pools that no longer exist.
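
For illustration only (stand-in names, not the actual Ceph classes), a minimal sketch contrasting the asserting lookup that produces the crash above with a defensive lookup that tolerates a stale pool id:

#include <cassert>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Hypothetical stand-in for the OSDMap pool-name table; not Ceph's real class.
struct FakeOSDMap {
  std::map<int64_t, std::string> pool_name;  // pool id -> name

  // Mirrors the asserting behaviour seen in the crash: an unknown id aborts.
  const std::string& get_pool_name(int64_t id) const {
    auto i = pool_name.find(id);
    assert(i != pool_name.end());  // FAILED assert(i != pool_name.end())
    return i->second;
  }

  // Defensive variant: the caller checks existence before resolving the name.
  bool have_pool(int64_t id) const { return pool_name.count(id) > 0; }
};

int main() {
  FakeOSDMap osdmap;
  osdmap.pool_name[1] = "cephfs_data";

  int64_t stale_pool = 42;  // a data pool id left behind in an old MDSMap

  // Safe pattern: skip (or report) pools the OSDMap no longer knows about.
  if (osdmap.have_pool(stale_pool))
    std::cout << osdmap.get_pool_name(stale_pool) << "\n";
  else
    std::cout << "pool " << stale_pool << " no longer exists, skipping\n";
  return 0;
}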


Related issues: 1 (0 open, 1 closed)

Copied to CephFS - Backport #21953: luminous: MDSMonitor commands crashing on cluster upgraded from Hammer (nonexistent pool?) (Resolved, Patrick Donnelly)
#1

Updated by John Spray over 6 years ago

  • Description updated (diff)
#2

Updated by Patrick Donnelly over 6 years ago

The user confirmed that the MDSMap refers to data pools that no longer exist. The fix should check for nonexistent pools and remove them from the map on load.
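
As a rough sketch of that approach, using stand-in types rather than the real MDSMap/MDSMonitor code: on load, drop any data pool ids the current OSDMap no longer contains.

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical stand-in for the real MDSMap; not the actual Ceph type.
struct FakeMDSMap {
  std::vector<int64_t> data_pools;  // pool ids the filesystem thinks it uses
};

// Remove references to pools that no longer exist in the OSDMap.
void sanitize_data_pools(FakeMDSMap& mdsmap, const std::set<int64_t>& existing_pools) {
  auto stale = std::remove_if(mdsmap.data_pools.begin(), mdsmap.data_pools.end(),
                              [&](int64_t id) { return existing_pools.count(id) == 0; });
  if (stale != mdsmap.data_pools.end()) {
    std::cout << "dropping " << std::distance(stale, mdsmap.data_pools.end())
              << " nonexistent data pool(s) from MDSMap\n";
    mdsmap.data_pools.erase(stale, mdsmap.data_pools.end());
  }
}

int main() {
  FakeMDSMap mdsmap{{1, 42}};           // pool 42 was deleted after the Hammer-era upgrade
  std::set<int64_t> existing{1, 2};     // pools the current OSDMap actually knows
  sanitize_data_pools(mdsmap, existing);
  std::cout << "remaining data pools: " << mdsmap.data_pools.size() << "\n";
  return 0;
}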

#3

Updated by Patrick Donnelly over 6 years ago

  • Status changed from New to 12
  • Assignee set to Patrick Donnelly
  • Source set to Community (user)
  • Backport set to luminous
#4

Updated by Patrick Donnelly over 6 years ago

  • Status changed from 12 to Fix Under Review
#5

Updated by Patrick Donnelly over 6 years ago

  • Status changed from Fix Under Review to Pending Backport
#6

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #21953: luminous: MDSMonitor commands crashing on cluster upgraded from Hammer (nonexistent pool?) added
#7

Updated by Patrick Donnelly about 6 years ago

  • Status changed from Pending Backport to Resolved
