Bug #596

closed

crash during mds reconnect

Added by Greg Farnum over 13 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 100%


Description

While testing my Journaler changes, I got a cfuse segfault. My steps:
1. vstart with 1 of each daemon
2. mount cfuse
3. copy in the pjd workunit, start running
4. kill -9 the mds while it was untarring
5. restart the mds
6. kill -9 the mds while running tests
7. restart the mds
Then cfuse crashed.
The other daemons seemed to be fine, and restarting the entire system worked and let me mount cfuse again.
Unfortunately I had no logging.

I reproduced this on unstable by just killing and restarting the MDS once the pjd test had started running tests. It seems to be 100% reproducible.

#0  0x000000000046a1fa in Client::encode_dentry_release (this=0x2924700, dn=0x29cfe00, req=0x2dfa000, mds=0, drop=256, unless=512) at client/Client.cc:1105
1105                                          mds, drop, unless, 1);
(gdb) bt
#0  0x000000000046a1fa in Client::encode_dentry_release (this=0x2924700, dn=0x29cfe00, req=0x2dfa000, mds=0, drop=256, unless=512) at client/Client.cc:1105
#1  0x000000000046a4bb in Client::encode_cap_releases (this=0x2924700, req=0x2f05280, m=0x2dfa000, mds=0) at client/Client.cc:1140
#2  0x0000000000472312 in Client::send_request (this=0x2924700, request=0x2f05280, mds=0) at client/Client.cc:1218
#3  0x0000000000472976 in Client::resend_unsafe_requests (this=0x2924700, mds_num=0) at client/Client.cc:1596
#4  0x00000000004804e1 in Client::send_reconnect (this=<value optimized out>, mds=0) at client/Client.cc:1566
#5  0x0000000000493bdc in Client::handle_mds_map (this=0x2924700, m=<value optimized out>) at client/Client.cc:1494
#6  0x000000000049b65b in Client::ms_dispatch (this=0x2924700, m=0x2d22600) at client/Client.cc:1410
#7  0x000000000044c479 in Messenger::ms_deliver_dispatch (this=0x2934000) at msg/Messenger.h:97
#8  SimpleMessenger::dispatch_entry (this=0x2934000) at msg/SimpleMessenger.cc:332
#9  0x0000000000444c2c in SimpleMessenger::DispatchThread::entry (this=0x2934488) at msg/SimpleMessenger.h:529
#10 0x000000000045853a in Thread::_entry_func (arg=0x2924700) at ./common/Thread.h:39
#11 0x00007faf195e173a in start_thread (arg=<value optimized out>) at pthread_create.c:300
#12 0x00007faf1834e69d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#13 0x0000000000000000 in ?? ()
(gdb) p *dn
$1 = {
  <LRUObject> = {
    lru_next = 0x27, 
    lru_prev = 0x4c, 
    lru_pinned = false, 
    lru = 0x657473662d646a70, 
    lru_list = 0x30383030322d7473
  }, 
  members of Dentry: 
  name = {
    static npos = 18446744073709551615, 
    _M_dataplus = {
      <std::allocator<char>> = {
        <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
      members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider: 
      _M_p = 0x747365742f363138 <Address 0x747365742f363138 out of bounds>
    }
  }, 
  dir = 0x61636e7572742f73, 
  inode = 0x742e36302f6574, 
  ref = 0, 
  offset = 2, 
  lease_mds = -1, 
  lease_ttl = {
    tv = {
      tv_sec = 0, 
      tv_nsec = 0
    }
  }, 
  lease_gen = 0, 
  lease_seq = 0, 
  cap_shared_gen = 1
}
(gdb) p *(dn->dir)
Cannot access memory at address 0x61636e7572742f73
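
A note on reading the dump above: the "pointer" fields in *dn look like printable ASCII rather than addresses. The sketch below is not part of the original report. It decodes the five values as little-endian 64-bit words, assuming x86_64 (as the clone.S path in the backtrace suggests) and assuming the fields sit back to back in memory. The helper name as_ascii is made up for illustration.

```python
import struct

# Values copied from the `p *dn` output, in struct field order.
fields = [
    ("lru", 0x657473662d646a70),
    ("lru_list", 0x30383030322d7473),
    ("name._M_p", 0x747365742f363138),
    ("dir", 0x61636e7572742f73),
    ("inode", 0x742e36302f6574),
]

def as_ascii(value):
    """Render a 64-bit word as the 8 bytes it occupies in little-endian memory."""
    raw = struct.pack("<Q", value)
    # Show non-printable bytes (such as a trailing NUL) as '.'.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

decoded = {name: as_ascii(value) for name, value in fields}
for name, text in decoded.items():
    print(f"{name:10} {text}")

# Concatenate in field order and drop the trailing NUL placeholder.
joined = "".join(decoded.values()).rstrip(".")
print(joined)
print(len(joined), hex(len(joined)))
```

Read in order, the words spell pjd-fstest-20080816/tests/truncate/06.t, a path from the pjd test suite that was running at the time. Its length, 39, equals the lru_next value 0x27. That is consistent with dn pointing at memory that was freed and then reused for string data, though the dump alone does not prove that reading.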