Bug #17531
mds fails to respawn if executable has changed (closed)
% Done:
0%
Source:
Development
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
If the MDS is failed via `ceph mds fail` after its executable file has changed on disk, the respawn via execv fails and the daemon aborts on an assert:
-14> 2016-10-06 15:12:04.933154 7fd94f072700 1 mds.a handle_mds_map i (127.0.0.1:6839/2977038019) dne in the mdsmap, respawning myself
-13> 2016-10-06 15:12:04.933157 7fd94f072700 1 mds.a respawn
-12> 2016-10-06 15:12:04.933160 7fd94f072700 1 mds.a e: '/home/pdonnell/ceph/build/bin/ceph-mds'
-11> 2016-10-06 15:12:04.933162 7fd94f072700 1 mds.a 0: '/home/pdonnell/ceph/build/bin/ceph-mds'
-10> 2016-10-06 15:12:04.933164 7fd94f072700 1 mds.a 1: '-i'
-9> 2016-10-06 15:12:04.933165 7fd94f072700 1 mds.a 2: 'a'
-8> 2016-10-06 15:12:04.933166 7fd94f072700 1 mds.a 3: '-c'
-7> 2016-10-06 15:12:04.933167 7fd94f072700 1 mds.a 4: '/home/pdonnell/ceph/build/ceph.conf'
-6> 2016-10-06 15:12:04.933212 7fd94f072700 1 mds.a exe_path /home/pdonnell/ceph/build/bin/ceph-mds (deleted)
-5> 2016-10-06 15:12:04.933269 7fd94f072700 0 mds.a respawn execv /home/pdonnell/ceph/build/bin/ceph-mds failed with (2) No such file or directory
-4> 2016-10-06 15:12:04.933394 7fd951ffe700 1 -- 127.0.0.1:6839/2977038019 >> 127.0.0.1:0/2098321338 conn(0x7fd962fa0000 :6839 s=STATE_OPEN pgs=2 cs=1 l=0).read_bulk peer close file descriptor 30
-3> 2016-10-06 15:12:04.933418 7fd951ffe700 1 -- 127.0.0.1:6839/2977038019 >> 127.0.0.1:0/2098321338 conn(0x7fd962fa0000 :6839 s=STATE_OPEN pgs=2 cs=1 l=0).read_until read failed
-2> 2016-10-06 15:12:04.933428 7fd951ffe700 1 -- 127.0.0.1:6839/2977038019 >> 127.0.0.1:0/2098321338 conn(0x7fd962fa0000 :6839 s=STATE_OPEN pgs=2 cs=1 l=0).process read tag failed
-1> 2016-10-06 15:12:04.933522 7fd951ffe700 0 -- 127.0.0.1:6839/2977038019 >> 127.0.0.1:0/2098321338 conn(0x7fd962fa0000 :6839 s=STATE_OPEN pgs=2 cs=1 l=0).fault with nothing to send, going to standby
0> 2016-10-06 15:12:04.935283 7fd94f072700 -1 /home/pdonnell/ceph/src/mds/MDSDaemon.cc: In function 'void MDSDaemon::respawn()' thread 7fd94f072700 time 2016-10-06 15:12:04.933310
/home/pdonnell/ceph/src/mds/MDSDaemon.cc: 1144: FAILED assert(0)
ceph version v11.0.0-3180-g2e8c92d (2e8c92d8865345288df060c647b7d84be716dd8f)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7fd958dcec75]
2: (MDSDaemon::respawn()+0x6f0) [0x7fd958a6b290]
3: (MDSDaemon::handle_mds_map(MMDSMap*)+0x1654) [0x7fd958a778b4]
4: (MDSDaemon::handle_core_message(Message*)+0x7b3) [0x7fd958a785b3]
5: (MDSDaemon::ms_dispatch(Message*)+0xdb) [0x7fd958a7884b]
6: (DispatchQueue::entry()+0x7ba) [0x7fd958f8bc7a]
7: (DispatchQueue::DispatchThread::entry()+0xd) [0x7fd958e438bd]
8: (()+0x3c52607ee5) [0x7fd955da0ee5]
9: (clone()+0x6d) [0x7fd954e88d1d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
On Linux (at least), the correct thing to do is to call execv("/proc/self/exe", ...), which re-executes the running binary even when its file on disk has been deleted.
I have a fix in the works.
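To illustrate the /proc/self/exe approach described above, here is a minimal Python sketch. It is not the Ceph fix itself (that lives in C++ in MDSDaemon.cc); the helper names `on_disk_binary_is_gone` and `respawn` are hypothetical and chosen only for this example. It assumes Linux, where /proc/self/exe is a symlink to the running image and readlink on it returns the original path with " (deleted)" appended once the file has been unlinked, as the exe_path line in the log shows.

```python
import os
from typing import Sequence

# Linux-specific: symlink to the image of the calling process.
PROC_SELF_EXE = "/proc/self/exe"
# Suffix readlink reports once the original file has been unlinked.
DELETED_SUFFIX = " (deleted)"


def on_disk_binary_is_gone(link_target: str) -> bool:
    """Return True if a /proc/self/exe link target indicates an unlinked file.

    Caveat: a file whose real name ends in " (deleted)" would also match,
    so treat this as a hint for logging, not as proof.
    """
    return link_target.endswith(DELETED_SUFFIX)


def respawn(argv: Sequence[str]) -> None:
    """Re-execute the current program with the given argument vector.

    Hypothetical helper: exec /proc/self/exe instead of argv[0], so the
    exec still succeeds when the original path no longer exists.
    On success this call does not return.
    """
    link_target = os.readlink(PROC_SELF_EXE)
    if on_disk_binary_is_gone(link_target):
        print("original executable was removed:", link_target)
    os.execv(PROC_SELF_EXE, list(argv))
```

A caller would invoke something like `respawn(sys.argv)` at the point where it wants to restart. Note that this re-runs the image the process was started from, not whatever file now sits at the original path.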
Updated by Patrick Donnelly over 7 years ago
- Status changed from In Progress to Fix Under Review
Updated by Patrick Donnelly over 7 years ago
- Status changed from Fix Under Review to Resolved
Updated by Patrick Donnelly over 7 years ago
- Status changed from Resolved to Pending Backport
- Backport set to jewel
Updated by Loïc Dachary over 7 years ago
- Copied to Backport #17841: jewel: mds fails to respawn if executable has changed added
Updated by Patrick Donnelly over 7 years ago
- Status changed from Pending Backport to Resolved