Bug #10382: mds/MDS.cc: In function 'void MDS::heartbeat_reset()' - CephFS - Ceph

Bug #10382

closed

mds/MDS.cc: In function 'void MDS::heartbeat_reset()'

Added by Wido den Hollander over 9 years ago. Updated over 9 years ago.

Status: Resolved

Priority: Normal

Assignee: John Spray

Source: other

Severity: 3 - minor

Description

While running an Active/Standby pair of MDSes, I see this happen quite often when stopping the Active MDS:

     0> 2014-12-18 17:49:16.663686 7f62b03ce700 -1 mds/MDS.cc: In function 'void MDS::heartbeat_reset()' thread 7f62b03ce700 time 2014-12-18 17:49:16.660035
mds/MDS.cc: 2694: FAILED assert(hb != __null)

 ceph version 0.89 (68fdc0f68e6a04e283d2c5140832a3175b4f9840)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x91d80b]
 2: /usr/bin/ceph-mds() [0x58f602]
 3: (MDS::ms_dispatch(Message*)+0x2d) [0x5a72dd]
 4: (DispatchQueue::entry()+0x649) [0x9fb589]
 5: (DispatchQueue::DispatchThread::entry()+0xd) [0x90751d]
 6: (()+0x8182) [0x7f62b594d182]
 7: (clone()+0x6d) [0x7f62b40bcefd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Looking at the code I see this:

void MDS::heartbeat_reset()
{
  assert(hb != NULL);
  // NB not enabling suicide grace, because the mon takes care of killing us
  // (by blacklisting us) when we fail to send beacons, and it's simpler to
  // only have one way of dying.
  cct->get_heartbeat_map()->reset_timeout(hb, g_conf->mds_beacon_grace, 0);
}

The comment says the monitor should blacklist the MDS.

In this case the rest of the cluster is running v0.87 and only the MDS is running v0.89. Could that be the issue?
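
To make the suspected failure mode concrete, here is a minimal sketch of how a message dispatched after shutdown has released the heartbeat handle would trip an assert of the same shape as the one in heartbeat_reset(), and of one possible guard. This is not Ceph code and not necessarily how the bug was fixed: MiniMDS, HeartbeatHandle, the stopping flag and dispatch_guarded() are all hypothetical names invented for illustration, and the sketch is single-threaded, so it ignores the locking a real daemon would need.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration only; these are not Ceph's types.
struct HeartbeatHandle {
  int resets = 0;  // counts how many times the timeout was reset
};

struct MiniMDS {
  HeartbeatHandle* hb = nullptr;  // nullptr before start() and after shutdown()
  bool stopping = false;

  void start() { hb = new HeartbeatHandle(); }

  // Shutdown releases the handle. Any dispatch that arrives afterwards and
  // calls heartbeat_reset() directly would hit the assert below.
  void shutdown() {
    stopping = true;
    delete hb;
    hb = nullptr;
  }

  // Same shape as the function quoted in the report: requires a live handle.
  void heartbeat_reset() {
    assert(hb != nullptr);
    ++hb->resets;
  }

  // Assumed guard: drop the message instead of asserting once we are
  // stopping. Returns true if the message was handled, false if dropped.
  bool dispatch_guarded() {
    if (stopping || hb == nullptr) {
      return false;
    }
    heartbeat_reset();
    return true;
  }
};
```

With this sketch, a dispatch before shutdown() resets the heartbeat as usual, while a dispatch after shutdown() is dropped rather than aborting the process. Whether dropping late messages is acceptable in the real MDS is a design question this example does not answer.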

Updated by John Spray over 9 years ago

Updated by Wido den Hollander over 9 years ago

Updated by Samuel Just over 9 years ago

Updated by John Spray over 9 years ago

Updated by John Spray over 9 years ago

Updated by Greg Farnum over 9 years ago

Updated by John Spray over 9 years ago

Updated by Greg Farnum over 9 years ago