Backport #14668

closed

hammer: Wrong ceph get mdsmap assertion

Added by Dan van der Ster about 8 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version:
Release: hammer
Pull request ID:
Crash signature (v1):
Crash signature (v2):


Related issues 1 (0 open, 1 closed)

Copied from CephFS - Bug #14681: Wrong ceph get mdsmap assertion (Resolved, 02/05/2016)

Actions #1

Updated by Dan van der Ster about 8 years ago

After a few more retries, the mons eventually started.

Actions #2

Updated by John Spray about 8 years ago

  • Subject changed from ceph mds getmap 1234 crashes all ceph-mons to hammer: ceph mds getmap 1234 crashes all ceph-mons

This was fixed in infernalis and later, but apparently never backported to hammer.

commit f4398d2e6c245e3f81a6038425e1b8372b265b8c
Author: Vicente Cheng <freeze.bilsted@gmail.com>
Date:   Fri Mar 27 18:49:28 2015 +0800

    Fixed the ceph get mdsmap assertion.

        When we want to get mdsmap, we try to get_version()
        and the return value err = 0 means success.

        The assert verified r == 0. r would not change in this flow.
        It always meet assert and lead mon failure.

        I think this verify should be:
            assert(err == 0)
        It will help to check return value of get_version().

    If you have any questions, feel free to let me know.
    Thanks!

    Signed-off-by: Vicente Cheng <freeze.bilsted@gmail.com>

diff --git a/src/mon/MDSMonitor.cc b/src/mon/MDSMonitor.cc
index 6c4d13c..c76f8b6 100644
--- a/src/mon/MDSMonitor.cc
+++ b/src/mon/MDSMonitor.cc
@@ -777,7 +777,7 @@ bool MDSMonitor::preprocess_command(MMonCommand *m)
       if (err == -ENOENT) {
        r = -ENOENT;
       } else {
-       assert(r == 0);
+       assert(err == 0);
        assert(b.length());
        MDSMap mm;
        mm.decode(b);
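
To illustrate why the old assertion always fired, here is a minimal standalone sketch of the control flow touched by the diff above. It is not the Ceph source: `get_version_stub`, `getmap_fixed`, the stored epoch, the payload string, and the initial value `r = -1` are all assumptions made for illustration. The only part taken from the commit is the shape of the check, where the lookup result lands in `err` while `r` is left unchanged on the success path.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical stand-in for the monitor's versioned lookup:
// returns 0 and fills `out` on success, -ENOENT for an unknown epoch.
int get_version_stub(unsigned epoch, std::string &out) {
  const unsigned stored_epoch = 2493;  // assumed: the only epoch we "have"
  if (epoch != stored_epoch)
    return -ENOENT;
  out = "encoded-mdsmap";              // placeholder payload
  return 0;
}

// Sketch of the corrected flow: validate the lookup result (`err`),
// not the command result (`r`), which the lookup never assigns.
int getmap_fixed(unsigned epoch, std::string &out) {
  int r = -1;                          // assumed initial value; never set by the lookup
  int err = get_version_stub(epoch, out);
  if (err == -ENOENT) {
    r = -ENOENT;
  } else {
    // The old code asserted `r == 0` here. With r still at its initial
    // nonzero value, that check fails even though the lookup succeeded.
    assert(err == 0);
    assert(!out.empty());
    r = 0;
  }
  return r;
}
```

With these stubs, `getmap_fixed(2493, out)` returns 0 and fills `out`, while any other epoch returns `-ENOENT` without tripping an assertion.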
Actions #3

Updated by Nathan Cutler about 8 years ago

https://github.com/ceph/ceph/pull/4203

I will stage hammer backport.

Actions #4

Updated by Nathan Cutler about 8 years ago

  • Tracker changed from Bug to Backport
  • Description updated (diff)
  • Assignee set to Nathan Cutler

Original description

One of our hammer clusters won't start now after running ceph mds getmap.
I did:

# ceph status
    cluster dd535a7e-4647-4bee-853d-f34112615f81
     health HEALTH_WARN
            mds cluster is degraded
     monmap e30: 3 mons at {p01001532077488=128.142.36.227:6790/0,p01001532149022=128.142.39.77:6790/0,p01001532184554=128.142.39.144:6790/0}
            election epoch 2170, quorum 0,1,2 p01001532077488,p01001532149022,p01001532184554
     mdsmap e2493: 1/2/2 up {1=cephmdsd2=up:resolve}, 2 up:standby
     osdmap e96770: 217 osds: 216 up, 216 in
      pgmap v19240490: 18208 pgs, 37 pools, 5001 GB data, 11600 kobjects
            15383 GB used, 435 TB / 450 TB avail
               18206 active+clean
                   2 active+clean+scrubbing

# ceph mds getmap 2493

and now the ceph-mon processes all crash at startup like this:

2016-02-05 09:42:30.855475 7f2c1a0e1700  0 log_channel(audit) log [DBG] : from='client.174542645 128.142.142.252:0/1003572' entity='client.admin' cmd=[{"prefix": "mds getmap", "epoch": 2493}]: dispatch
2016-02-05 09:42:30.858517 7f2c1a0e1700 -1 mon/MDSMonitor.cc: In function 'bool MDSMonitor::preprocess_command(MMonCommand*)' thread 7f2c1a0e1700 time 2016-02-05 09:42:30.855551
mon/MDSMonitor.cc: 757: FAILED assert(r == 0)

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f3b65]
 2: (MDSMonitor::preprocess_command(MMonCommand*)+0xfae) [0x67251e]
 3: (MDSMonitor::preprocess_query(PaxosServiceMessage*)+0x28b) [0x675d2b]
 4: (PaxosService::dispatch(PaxosServiceMessage*)+0x833) [0x607433]
 5: (Monitor::handle_command(MMonCommand*)+0x11f1) [0x5cdf01]
 6: (Monitor::dispatch(MonSession*, Message*, bool)+0xf9) [0x5d13c9]
 7: (Monitor::_ms_dispatch(Message*)+0x1a6) [0x5d2076]
 8: (Monitor::ms_dispatch(Message*)+0x23) [0x5f1443]
 9: (DispatchQueue::entry()+0x62a) [0x947d5a]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x7dbbad]
 11: (()+0x7dc5) [0x7f2c2542cdc5]
 12: (clone()+0x6d) [0x7f2c23f0e21d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Eeekk! help?

Actions #5

Updated by Nathan Cutler about 8 years ago

  • Copied from Bug #14681: Wrong ceph get mdsmap assertion added
Actions #6

Updated by Nathan Cutler about 8 years ago

  • Subject changed from hammer: ceph mds getmap 1234 crashes all ceph-mons to hammer: Wrong ceph get mdsmap assertion
Actions #7

Updated by Nathan Cutler about 8 years ago

  • Description updated (diff)
  • Status changed from New to In Progress
Actions #8

Updated by Nathan Cutler about 8 years ago

Hammer backport is staged. Maybe it's not too late to squeeze it into 0.94.6 - we'll see.

Actions #9

Updated by Loïc Dachary about 8 years ago

Unfortunately it is too late: v0.94.6 is already frozen and being tested.

Actions #10

Updated by Loïc Dachary over 7 years ago

  • Status changed from In Progress to Resolved
  • Target version set to v0.94.8