Bug #720

marking an MDS that is operational as failed causes an assert

Added by Colin McCabe over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Monitor
Target version:
Start date: 01/18/2011
Due date:
% Done: 0%
Spent time:
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

How to reproduce:

1. Start all cluster nodes.
2. Run ./ceph mds fail a while MDS a is up and operational.

Backtrace:

 mds/MDSMap.h: In function 'const MDSMap::mds_info_t& MDSMap::get_mds_info(int)', In thread 7fe40e413710
 mds/MDSMap.h:229: FAILED assert(up.count(m) && mds_info.count(up[m]))
 ceph version 0.25~rc (commit:a95fa5b94b619cd547bb17f87ef9bb0172f14bb9)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39) [0xa4e83e]
 2: (MDSMap::get_mds_info(int)+0x7c) [0x91adb4]
 3: (MDBalancer::check_targets()+0x42) [0x9172a8]
 4: (MDBalancer::try_rebalance()+0x28) [0x915d82]
 5: (MDS::handle_mds_map(MMDSMap*)+0x2338) [0x772ff8]
 6: (MDS::_dispatch(Message*)+0xa34) [0x778614]
 7: (MDS::ms_dispatch(Message*)+0x38) [0x777a32]
 8: (Messenger::ms_deliver_dispatch(Message*)+0x63) [0x7507b7]
 9: (SimpleMessenger::dispatch_entry()+0x6d8) [0x73cd0e]
 10: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x732ba2]
 11: (Thread::_entry_func(void*)+0x23) [0x74f70b]
 12: (()+0x68ba) [0x7fe410a688ba]
 13: (clone()+0x6d) [0x7fe40f6fd02d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
 *** Caught signal (Aborted) ***
 in thread 7fe40e413710
 ceph version 0.25~rc (commit:a95fa5b94b619cd547bb17f87ef9bb0172f14bb9)
 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0xa4eb87]
 2: (handle_fatal_signal(int)+0xb2) [0xa672a7]
 3: (()+0xef60) [0x7fe410a70f60]
 4: (gsignal()+0x35) [0x7fe40f660165]
 5: (abort()+0x180) [0x7fe40f662f70]
 6: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fe40fef3dc5]
 7: (()+0xcb166) [0x7fe40fef2166]
 8: (()+0xcb193) [0x7fe40fef2193]
 9: (()+0xcb28e) [0x7fe40fef228e]
 a: (ceph::__ceph_assert_warn(char const*, char const*, int, char const*)+0) [0xa4e9d2]
 b: (MDSMap::get_mds_info(int)+0x7c) [0x91adb4]
 c: (MDBalancer::check_targets()+0x42) [0x9172a8]
 d: (MDBalancer::try_rebalance()+0x28) [0x915d82]
 e: (MDS::handle_mds_map(MMDSMap*)+0x2338) [0x772ff8]
 f: (MDS::_dispatch(Message*)+0xa34) [0x778614]
 10: (MDS::ms_dispatch(Message*)+0x38) [0x777a32]
 11: (Messenger::ms_deliver_dispatch(Message*)+0x63) [0x7507b7]
 12: (SimpleMessenger::dispatch_entry()+0x6d8) [0x73cd0e]
 13: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x732ba2]
 14: (Thread::_entry_func(void*)+0x23) [0x74f70b]
 15: (()+0x68ba) [0x7fe410a688ba]
 16: (clone()+0x6d) [0x7fe40f6fd02d]
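
The failed condition at mds/MDSMap.h:229 is up.count(m) && mds_info.count(up[m]), so the lookup aborts whenever the requested rank is missing from `up` or its `up` entry has no matching record in `mds_info`. The backtrace does not say which half failed. Below is a minimal sketch of that precondition on a toy map. `ToyMdsMap`, `ToyMdsInfo`, `has_info`, `get_info`, and `find_info` are hypothetical names for illustration, not Ceph source, and the real container types may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy stand-in for one daemon's record. Hypothetical, not the real mds_info_t.
struct ToyMdsInfo {
    std::string name;
};

// Toy stand-in for the MDS map. Hypothetical, not the real MDSMap.
struct ToyMdsMap {
    std::map<int, std::uint64_t> up;               // rank -> daemon instance id
    std::map<std::uint64_t, ToyMdsInfo> mds_info;  // instance id -> record

    // The same condition the assert checks, expressed as a query.
    bool has_info(int rank) const {
        auto it = up.find(rank);
        return it != up.end() && mds_info.count(it->second) > 0;
    }

    // Asserting lookup: aborts if the rank is not up or has no record.
    const ToyMdsInfo& get_info(int rank) const {
        assert(has_info(rank));
        return mds_info.at(up.at(rank));
    }

    // Non-asserting lookup: returns nullptr instead of aborting.
    const ToyMdsInfo* find_info(int rank) const {
        auto it = up.find(rank);
        if (it == up.end()) return nullptr;
        auto info = mds_info.find(it->second);
        return info == mds_info.end() ? nullptr : &info->second;
    }
};
```

A caller that iterates over ranks it remembers from an earlier map, as a balancer might, could use `has_info` or `find_info` first so that a rank that has since gone away is skipped rather than tripping the assert.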

Associated revisions

Revision e276683d (diff)
Added by Sage Weil over 8 years ago

mon: fix 'ceph mds fail <N>' command

We need to remove the mds_info from the map for cmds to take notice.

Fixes: #720
Signed-off-by: Sage Weil <>
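
One reading of the commit message is that failing a rank should drop its record from the map entirely, so that no stale entry survives in the published map. The sketch below shows that idea on a toy structure. `SketchMdsMap` and `fail_rank` are hypothetical names, and the actual monitor change in e276683d is not reproduced in this ticket.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy map for illustration only. Hypothetical, not the real MDSMap.
struct SketchMdsMap {
    std::map<int, std::uint64_t> up;               // rank -> daemon instance id
    std::map<std::uint64_t, std::string> mds_info; // instance id -> daemon name
};

// Mark a rank as failed by removing both its `up` entry and its info record.
// Returns false (and changes nothing) if the rank was not up.
bool fail_rank(SketchMdsMap& m, int rank) {
    auto it = m.up.find(rank);
    if (it == m.up.end()) return false;
    m.mds_info.erase(it->second);  // drop the record so no stale info remains
    m.up.erase(it);                // the rank is no longer up
    assert(m.up.count(rank) == 0); // postcondition of this sketch
    return true;
}
```

Removing both entries together keeps the two containers consistent, which is the invariant the assert in the backtrace relies on.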

History

#1 Updated by Sage Weil over 8 years ago

  • Category set to Monitor
  • Assignee set to Sage Weil
  • Target version set to v0.24.2

#2 Updated by Sage Weil over 8 years ago

  • Status changed from New to Resolved
