Bug #17531

closed

mds fails to respawn if executable has changed

Added by Patrick Donnelly over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If the MDS is failed via `ceph mds fail` after its executable file has changed on disk, the MDS fails to respawn via execv:

   -14> 2016-10-06 15:12:04.933154 7fd94f072700  1 mds.a handle_mds_map i (127.0.0.1:6839/2977038019) dne in the mdsmap, respawning myself
   -13> 2016-10-06 15:12:04.933157 7fd94f072700  1 mds.a respawn
   -12> 2016-10-06 15:12:04.933160 7fd94f072700  1 mds.a  e: '/home/pdonnell/ceph/build/bin/ceph-mds'
   -11> 2016-10-06 15:12:04.933162 7fd94f072700  1 mds.a  0: '/home/pdonnell/ceph/build/bin/ceph-mds'
   -10> 2016-10-06 15:12:04.933164 7fd94f072700  1 mds.a  1: '-i'
    -9> 2016-10-06 15:12:04.933165 7fd94f072700  1 mds.a  2: 'a'
    -8> 2016-10-06 15:12:04.933166 7fd94f072700  1 mds.a  3: '-c'
    -7> 2016-10-06 15:12:04.933167 7fd94f072700  1 mds.a  4: '/home/pdonnell/ceph/build/ceph.conf'
    -6> 2016-10-06 15:12:04.933212 7fd94f072700  1 mds.a  exe_path /home/pdonnell/ceph/build/bin/ceph-mds (deleted)
    -5> 2016-10-06 15:12:04.933269 7fd94f072700  0 mds.a respawn execv /home/pdonnell/ceph/build/bin/ceph-mds failed with (2) No such file or directory
    -4> 2016-10-06 15:12:04.933394 7fd951ffe700  1 -- 127.0.0.1:6839/2977038019 >> 127.0.0.1:0/2098321338 conn(0x7fd962fa0000 :6839 s=STATE_OPEN pgs=2 cs=1 l=0).read_bulk peer close file descriptor 30
    -3> 2016-10-06 15:12:04.933418 7fd951ffe700  1 -- 127.0.0.1:6839/2977038019 >> 127.0.0.1:0/2098321338 conn(0x7fd962fa0000 :6839 s=STATE_OPEN pgs=2 cs=1 l=0).read_until read failed
    -2> 2016-10-06 15:12:04.933428 7fd951ffe700  1 -- 127.0.0.1:6839/2977038019 >> 127.0.0.1:0/2098321338 conn(0x7fd962fa0000 :6839 s=STATE_OPEN pgs=2 cs=1 l=0).process read tag failed
    -1> 2016-10-06 15:12:04.933522 7fd951ffe700  0 -- 127.0.0.1:6839/2977038019 >> 127.0.0.1:0/2098321338 conn(0x7fd962fa0000 :6839 s=STATE_OPEN pgs=2 cs=1 l=0).fault with nothing to send, going to standby
     0> 2016-10-06 15:12:04.935283 7fd94f072700 -1 /home/pdonnell/ceph/src/mds/MDSDaemon.cc: In function 'void MDSDaemon::respawn()' thread 7fd94f072700 time 2016-10-06 15:12:04.933310
/home/pdonnell/ceph/src/mds/MDSDaemon.cc: 1144: FAILED assert(0)

 ceph version v11.0.0-3180-g2e8c92d (2e8c92d8865345288df060c647b7d84be716dd8f)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7fd958dcec75]
 2: (MDSDaemon::respawn()+0x6f0) [0x7fd958a6b290]
 3: (MDSDaemon::handle_mds_map(MMDSMap*)+0x1654) [0x7fd958a778b4]
 4: (MDSDaemon::handle_core_message(Message*)+0x7b3) [0x7fd958a785b3]
 5: (MDSDaemon::ms_dispatch(Message*)+0xdb) [0x7fd958a7884b]
 6: (DispatchQueue::entry()+0x7ba) [0x7fd958f8bc7a]
 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x7fd958e438bd]
 8: (()+0x3c52607ee5) [0x7fd955da0ee5]
 9: (clone()+0x6d) [0x7fd954e88d1d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
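
The ` (deleted)` suffix on `exe_path` and the `(2) No such file or directory` error in the log above can be reproduced in miniature. The following is a minimal Python sketch, not Ceph code, and `deleted_link_demo` is a hypothetical name. It assumes a Linux `/proc` and uses an unlinked temporary file as a stand-in for the replaced binary: the kernel keeps the open file alive, reports its proc link with a ` (deleted)` suffix, and rejects path-based lookups (the same lookup execv performs) with ENOENT.

```python
import errno
import os
import tempfile


def deleted_link_demo():
    """Unlink an open file and report how Linux describes it afterwards.

    Returns (original_path, proc_link_target, lookup_errno), or None when
    /proc/self/fd is unavailable (non-Linux systems).
    """
    if not os.path.isdir("/proc/self/fd"):
        return None
    fd, path = tempfile.mkstemp()
    try:
        os.unlink(path)  # the file stays alive while fd is open
        link = os.readlink(f"/proc/self/fd/{fd}")
        lookup_errno = None
        try:
            os.stat(path)  # path lookup, as execv(path, ...) would do
        except OSError as e:
            lookup_errno = e.errno
        return path, link, lookup_errno
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(deleted_link_demo())
```

On Linux the second element of the result ends in ` (deleted)` and the third is `errno.ENOENT`, which mirrors the `exe_path` and execv lines in the log.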

On Linux (at least), the correct approach is to call execv("/proc/self/exe", ...), which executes the running binary even after its file has been deleted.
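
The `/proc/self/exe` approach can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the actual change to `MDSDaemon::respawn()`: the names `respawn_target`, `respawn`, and the `proc_link` parameter are hypothetical, and the fallback to the original path on systems without the proc link is an assumption of this sketch.

```python
import os

PROC_SELF_EXE = "/proc/self/exe"  # Linux-specific link to the running binary


def respawn_target(fallback, proc_link=PROC_SELF_EXE):
    """Pick the path to pass to execv when re-executing the current process.

    Prefer the proc link, which still resolves after the on-disk file has
    been deleted or replaced; otherwise fall back to the given path.
    """
    # lexists checks the link itself, so a " (deleted)" target does not matter
    if os.path.lexists(proc_link):
        return proc_link
    return fallback


def respawn(argv, fallback):
    """Replace the current process image, keeping the original arguments."""
    # execv only returns by raising OSError; on success the process is replaced
    os.execv(respawn_target(fallback), argv)
```

A call such as `respawn(sys.argv, sys.argv[0])` would re-execute the image that is already running, not whatever file now sits at the original path, so a respawn done this way does not pick up a newly installed binary.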

I have a fix in the works.


Related issues 1 (0 open, 1 closed)

Copied to CephFS - Backport #17841: jewel: mds fails to respawn if executable has changed (Resolved, Loïc Dachary)
#1

Updated by Patrick Donnelly over 7 years ago

  • Status changed from In Progress to Fix Under Review
#2

Updated by Patrick Donnelly over 7 years ago

  • Status changed from Fix Under Review to Resolved
#3

Updated by Patrick Donnelly over 7 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to jewel
#4

Updated by Loïc Dachary over 7 years ago

  • Copied to Backport #17841: jewel: mds fails to respawn if executable has changed added
#5

Updated by Patrick Donnelly over 7 years ago

  • Status changed from Pending Backport to Resolved