Bug #20659

closed

MDSMonitor: assertion failure if two mds report same health warning

Added by Patrick Donnelly almost 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

(gdb) bt
#0  0x00007f782719323b in raise () from /lib64/libpthread.so.0
#1  0x0000003226a47d36 in reraise_fatal (signum=6) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/global/signal_handler.cc:74
#2  handle_fatal_signal (signum=6) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/global/signal_handler.cc:138
#3  <signal handler called>
#4  0x00007f78244631d7 in raise () from /lib64/libc.so.6
#5  0x00007f78244648c8 in abort () from /lib64/libc.so.6
#6  0x00000032267ba274 in ceph::__ceph_assert_fail (assertion=assertion@entry=0x3226c2060a "checks.count(code) == 0", 
    file=file@entry=0x3226c20dd0 "/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.0-990-gb36c57d/rpm/el7/BUILD/ceph-12.1.0-990"..., line=line@entry=97, 
    func=func@entry=0x3226c44f80 <_ZZN18health_check_map_t3addERKSs15health_status_tS1_E19__PRETTY_FUNCTION__> "health_check_t& health_check_map_t::add(const string&, health_status_t, const string&)")
    at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/common/assert.cc:66
#7  0x000000322674e929 in add (summary="%num% MDSs report slow requests", severity=HEALTH_WARN, code="MDS_SLOW_REQUEST", this=0x7f7820bcb170) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/mon/health_check.h:97
#8  MDSMonitor::encode_pending (this=0x32308ec800, t=warning: RTTI symbol not found for class 'std::_Sp_counted_ptr<MonitorDBStore::Transaction*, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr<MonitorDBStore::Transaction*, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 3, weak 0) 0x3230b70400) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/mon/MDSMonitor.cc:209
#9  0x00000032266b52dd in PaxosService::propose_pending (this=0x32308ec800) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/mon/PaxosService.cc:213
#10 0x000000322656b797 in operator() (a0=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/build/boost/include/boost/function/function_template.hpp:771
#11 finish (r=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/include/Context.h:493
#12 C_MonContext::finish (this=<optimized out>, r=<optimized out>) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/mon/Monitor.cc:132
#13 0x00000032265a7129 in Context::complete (this=0x32313a44f0, r=<optimized out>) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/include/Context.h:70
#14 0x00000032267b69a4 in SafeTimer::timer_thread (this=0x32308e6490) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/common/Timer.cc:97
#15 0x00000032267b83cd in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/common/Timer.cc:30
#16 0x00007f782718bdc5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f782452573d in clone () from /lib64/libc.so.6
(gdb) frame 7            
#7  0x000000322674e929 in add (summary="%num% MDSs report slow requests", severity=HEALTH_WARN, code="MDS_SLOW_REQUEST", this=0x7f7820bcb170) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/mon/health_check.h:97
97          assert(checks.count(code) == 0);
(gdb) print checks
$1 = std::map with 1 elements = {["MDS_SLOW_REQUEST"] = {severity = HEALTH_WARN, summary = "%num% MDSs report slow requests", detail = std::list = {[0] = "mdsh(mds.8): 2 slow requests are blocked > 30 sec"}}}
(gdb) frame 8
#8  MDSMonitor::encode_pending (this=0x32308ec800, t=warning: RTTI symbol not found for class 'std::_Sp_counted_ptr<MonitorDBStore::Transaction*, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr<MonitorDBStore::Transaction*, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (count 3, weak 0) 0x3230b70400) at /usr/src/debug/ceph-12.1.0-990-gb36c57d/src/mon/MDSMonitor.cc:209
209             mds_metric_summary(metric.type));
(gdb) print rank
$2 = 5

From: /ceph/teuthology-archive/pdonnell-2017-07-14_04:41:06-multimds-wip-pdonnell-20170713-testing-basic-smithi/1399292/remote/smithi008/coredump/1500017698.78430.core
and: /ceph/teuthology-archive/pdonnell-2017-07-14_04:41:06-multimds-wip-pdonnell-20170713-testing-basic-smithi/1399292/remote/smithi008/coredump/1500017698.78430.core
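The gdb session above shows the map already holding an "MDS_SLOW_REQUEST" entry (from mds.8) when add() is called again for rank 5, which trips the assertion at health_check.h:97. The following is a minimal, simplified sketch of that failure mode and of one possible way to avoid it. The types are stand-ins, not the Ceph definitions; the get_or_add() helper, report_slow_requests() and the detail strings are assumptions made for illustration and are not taken from the linked pull request.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>

// Simplified stand-ins for the types named in the backtrace (not Ceph's code).
enum health_status_t { HEALTH_OK, HEALTH_WARN, HEALTH_ERR };

struct health_check_t {
  health_status_t severity = HEALTH_OK;
  std::string summary;
  std::list<std::string> detail;
};

struct health_check_map_t {
  std::map<std::string, health_check_t> checks;

  // Same precondition as the assertion in frame #7: the code must not
  // already be present, so a second report of the same warning aborts.
  health_check_t& add(const std::string& code, health_status_t severity,
                      const std::string& summary) {
    assert(checks.count(code) == 0);
    health_check_t& r = checks[code];
    r.severity = severity;
    r.summary = summary;
    return r;
  }

  // Hypothetical helper: return the existing entry if the code is already
  // present, otherwise create it. Callers can then append per-daemon detail.
  health_check_t& get_or_add(const std::string& code, health_status_t severity,
                             const std::string& summary) {
    auto it = checks.find(code);
    if (it != checks.end()) {
      return it->second;
    }
    return add(code, severity, summary);
  }
};

// Simulates two MDS daemons reporting the same health warning.
// The detail strings are placeholders, not real monitor output.
health_check_map_t report_slow_requests() {
  health_check_map_t m;
  const char* details[] = {"first MDS: slow requests", "second MDS: slow requests"};
  for (const char* d : details) {
    health_check_t& c = m.get_or_add("MDS_SLOW_REQUEST", HEALTH_WARN,
                                     "%num% MDSs report slow requests");
    c.detail.push_back(d);
  }
  return m;
}
```

Replacing get_or_add() with add() inside the loop would reproduce the abort on the second iteration, which matches the state printed by "print checks" in frame 7.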

Actions #1

Updated by John Spray almost 7 years ago

  • Status changed from New to Resolved

Unless this test run was more recent than the fix, I think this is the bug fixed by https://github.com/ceph/ceph/pull/16302

Actions #2

Updated by Patrick Donnelly almost 7 years ago

No, it isn't more recent. Thanks!
