Backport #14668
hammer: Wrong ceph get mdsmap assertion
Release: hammer
Updated by Dan van der Ster about 8 years ago
After a few more retries the mons eventually started.
Updated by John Spray about 8 years ago
- Subject changed from ceph mds getmap 1234 crashes all ceph-mons to hammer: ceph mds getmap 1234 crashes all ceph-mons
This was fixed in >=infernalis, but apparently never backported.
commit f4398d2e6c245e3f81a6038425e1b8372b265b8c
Author: Vicente Cheng <freeze.bilsted@gmail.com>
Date:   Fri Mar 27 18:49:28 2015 +0800

    Fixed the ceph get mdsmap assertion.

    When we want to get mdsmap, we try to get_version()
    and the return value err = 0 means success.

    The assert verified r == 0. r would not change in this flow.
    It always meet assert and lead mon failure.

    I think this verify should be:
        assert(err == 0)
    It will help to check return value of get_version().

    If you have any questions, feel free to let me know.
    Thanks!

    Signed-off-by: Vicente Cheng <freeze.bilsted@gmail.com>

diff --git a/src/mon/MDSMonitor.cc b/src/mon/MDSMonitor.cc
index 6c4d13c..c76f8b6 100644
--- a/src/mon/MDSMonitor.cc
+++ b/src/mon/MDSMonitor.cc
@@ -777,7 +777,7 @@ bool MDSMonitor::preprocess_command(MMonCommand *m)
       if (err == -ENOENT) {
         r = -ENOENT;
       } else {
-        assert(r == 0);
+        assert(err == 0);
         assert(b.length());
         MDSMap mm;
         mm.decode(b);
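For readers who want to see the failure mode in isolation, here is a minimal standalone sketch of the corrected flow. It is not the Ceph code: VersionStore, get_version() and getmap() are hypothetical stand-ins, and the initial value r = -1 is an assumption made for illustration. It only shows why asserting on r (which the lookup never touches) fires even on a successful lookup, while asserting on err checks the actual return value.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

// Hypothetical stand-in for the monitor's versioned store (not Ceph's API).
using VersionStore = std::map<unsigned, std::string>;

// Returns 0 and fills `out` when the epoch exists, -ENOENT otherwise.
int get_version(const VersionStore& store, unsigned epoch, std::string& out) {
  auto it = store.find(epoch);
  if (it == store.end()) return -ENOENT;
  out = it->second;
  return 0;
}

// Sketch of the "get map at epoch" branch after the fix.
int getmap(const VersionStore& store, unsigned epoch, std::string& out) {
  int r = -1;  // assumed initial value; the lookup below never modifies r
  int err = get_version(store, epoch, out);
  if (err == -ENOENT) {
    r = -ENOENT;
  } else {
    // Before the fix this was assert(r == 0), which would fail here
    // because r still holds its initial value.
    assert(err == 0);
    assert(!out.empty());
    r = 0;
  }
  return r;
}
```

With a store containing only epoch 2493, getmap() returns 0 for 2493 and -ENOENT for an epoch that does not exist, without aborting in either case.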
Updated by Nathan Cutler about 8 years ago
https://github.com/ceph/ceph/pull/4203
I will stage hammer backport.
Updated by Nathan Cutler about 8 years ago
- Tracker changed from Bug to Backport
- Description updated (diff)
- Assignee set to Nathan Cutler
Original description
One of our hammer clusters won't start now after running ceph mds getmap.
I did:
# ceph status
    cluster dd535a7e-4647-4bee-853d-f34112615f81
     health HEALTH_WARN
            mds cluster is degraded
     monmap e30: 3 mons at {p01001532077488=128.142.36.227:6790/0,p01001532149022=128.142.39.77:6790/0,p01001532184554=128.142.39.144:6790/0}
            election epoch 2170, quorum 0,1,2 p01001532077488,p01001532149022,p01001532184554
     mdsmap e2493: 1/2/2 up {1=cephmdsd2=up:resolve}, 2 up:standby
     osdmap e96770: 217 osds: 216 up, 216 in
      pgmap v19240490: 18208 pgs, 37 pools, 5001 GB data, 11600 kobjects
            15383 GB used, 435 TB / 450 TB avail
               18206 active+clean
                   2 active+clean+scrubbing
# ceph mds getmap 2493
and now the ceph-mon processes all crash at startup like this:
2016-02-05 09:42:30.855475 7f2c1a0e1700  0 log_channel(audit) log [DBG] : from='client.174542645 128.142.142.252:0/1003572' entity='client.admin' cmd=[{"prefix": "mds getmap", "epoch": 2493}]: dispatch
2016-02-05 09:42:30.858517 7f2c1a0e1700 -1 mon/MDSMonitor.cc: In function 'bool MDSMonitor::preprocess_command(MMonCommand*)' thread 7f2c1a0e1700 time 2016-02-05 09:42:30.855551
mon/MDSMonitor.cc: 757: FAILED assert(r == 0)

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f3b65]
 2: (MDSMonitor::preprocess_command(MMonCommand*)+0xfae) [0x67251e]
 3: (MDSMonitor::preprocess_query(PaxosServiceMessage*)+0x28b) [0x675d2b]
 4: (PaxosService::dispatch(PaxosServiceMessage*)+0x833) [0x607433]
 5: (Monitor::handle_command(MMonCommand*)+0x11f1) [0x5cdf01]
 6: (Monitor::dispatch(MonSession*, Message*, bool)+0xf9) [0x5d13c9]
 7: (Monitor::_ms_dispatch(Message*)+0x1a6) [0x5d2076]
 8: (Monitor::ms_dispatch(Message*)+0x23) [0x5f1443]
 9: (DispatchQueue::entry()+0x62a) [0x947d5a]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x7dbbad]
 11: (()+0x7dc5) [0x7f2c2542cdc5]
 12: (clone()+0x6d) [0x7f2c23f0e21d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Eeekk! help?
Updated by Nathan Cutler about 8 years ago
- Copied from Bug #14681: Wrong ceph get mdsmap assertion added
Updated by Nathan Cutler about 8 years ago
- Subject changed from hammer: ceph mds getmap 1234 crashes all ceph-mons to hammer: Wrong ceph get mdsmap assertion
Updated by Nathan Cutler about 8 years ago
- Description updated (diff)
- Status changed from New to In Progress
Updated by Nathan Cutler about 8 years ago
Hammer backport is staged. Maybe it's not too late to squeeze it into 0.94.6 - we'll see.
Updated by Loïc Dachary about 8 years ago
It is unfortunately too late: v0.94.6 is already frozen and being tested.
Updated by Loïc Dachary over 7 years ago
- Status changed from In Progress to Resolved
- Target version set to v0.94.8