Bug #3666 (closed)

Segfault running test_libcephfs

Added by Noah Watkins over 11 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Component(FS): libcephfs

Description

(local-read-test)kyoto:src $ cat ~/segf-test_libcephfs.txt 
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" 
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/nwatkins/projects/ceph/ceph/src/lt-test_libcephfs...done.

warning: core file may not match specified executable file.
[New LWP 6230]
[New LWP 6236]
[New LWP 6233]
[New LWP 6237]
[New LWP 6248]
[New LWP 5382]
[New LWP 6244]
[New LWP 6227]
[New LWP 6245]
[New LWP 6226]
[New LWP 6225]
[New LWP 6224]
[New LWP 6240]
[New LWP 6241]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/nwatkins/projects/ceph/ceph/src/.libs/lt-test_libcephfs'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f647978ada8 in main_arena () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) thread apply all backtrace

Thread 14 (Thread 0x7f647aef7700 (LWP 6241)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f6479f5b28e in Wait (mutex=..., this=0x1954d88) at ./common/Cond.h:55
#2  Pipe::writer (this=0x1954bb0) at msg/Pipe.cc:1463
#3  0x00007f6479f64dbd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#4  0x00007f647aae7e9a in start_thread (arg=0x7f647aef7700) at pthread_create.c:308
#5  0x00007f64794c5cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 13 (Thread 0x7f645fefe700 (LWP 6240)):
#0  0x00007f64794ba303 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f6479f4ec02 in Pipe::tcp_read_wait (this=<optimized out>) at msg/Pipe.cc:2013
#2  0x00007f6479f4ef20 in Pipe::tcp_read (this=0x1954770, buf=0x7f645fefddbf "\377", len=1) at msg/Pipe.cc:1986
#3  0x00007f6479f61544 in Pipe::reader (this=0x1954770) at msg/Pipe.cc:1241
#4  0x00007f6479f64ddd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#5  0x00007f647aae7e9a in start_thread (arg=0x7f645fefe700) at pthread_create.c:308
#6  0x00007f64794c5cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 12 (Thread 0x7f64783c4700 (LWP 6224)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f6479ecadab in ceph::log::Log::entry (this=0x1949b10) at log/Log.cc:318
#2  0x00007f647aae7e9a in start_thread (arg=0x7f64783c4700) at pthread_create.c:308
#3  0x00007f64794c5cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 11 (Thread 0x7f6477bc3700 (LWP 6225)):
#0  sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:102
#1  0x00007f6479f927c8 in CephContextServiceThread::entry (this=0x194b9d0) at common/ceph_context.cc:57
#2  0x00007f647aae7e9a in start_thread (arg=0x7f6477bc3700) at pthread_create.c:308
#3  0x00007f64794c5cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 10 (Thread 0x7f64773c2700 (LWP 6226)):
#0  0x00007f64794ba303 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f6479f7df3b in AdminSocket::entry (this=0x194da00) at common/admin_socket.cc:228
#2  0x00007f647aae7e9a in start_thread (arg=0x7f64773c2700) at pthread_create.c:308
#3  0x00007f64794c5cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f645ffff700 (LWP 6245)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
#1  0x00007f647aaf12b8 in _L_cond_lock_874 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f647aaf1088 in __pthread_mutex_cond_lock (mutex=0x7f6444006290) at ../nptl/pthread_mutex_lock.c:61
#3  0x00007f647aaebe18 in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:236
#4  0x00007f6479f5b28e in Wait (mutex=..., this=0x7f6444006348) at ./common/Cond.h:55
#5  Pipe::writer (this=0x7f6444006170) at msg/Pipe.cc:1463
#6  0x00007f6479f64dbd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#7  0x00007f647aae7e9a in start_thread (arg=0x7f645ffff700) at pthread_create.c:308
#8  0x00007f64794c5cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#9  0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7f6476bc1700 (LWP 6227)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f6479f6b741 in Wait (mutex=..., this=0x195db20) at ./common/Cond.h:55
#2  SimpleMessenger::reaper_entry (this=0x195d6c0) at msg/SimpleMessenger.cc:206
#3  0x00007f6479f7055d in SimpleMessenger::ReaperThread::entry (this=<optimized out>) at msg/SimpleMessenger.h:411
#4  0x00007f647aae7e9a in start_thread (arg=0x7f6476bc1700) at pthread_create.c:308
#5  0x00007f64794c5cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f64743bc700 (LWP 6244)):
#0  0x00007f64794ba303 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f6479f4ec02 in Pipe::tcp_read_wait (this=<optimized out>) at msg/Pipe.cc:2013
#2  0x00007f6479f4ef20 in Pipe::tcp_read (this=0x1954bb0, buf=0x7f64743bbdbf "\377\005", len=1) at msg/Pipe.cc:1986
#3  0x00007f6479f61544 in Pipe::reader (this=0x1954bb0) at msg/Pipe.cc:1241
#4  0x00007f6479f64ddd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#5  0x00007f647aae7e9a in start_thread (arg=0x7f64743bc700) at pthread_create.c:308
#6  0x00007f64794c5cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f647aeff780 (LWP 5382)):
#0  _IO_setb (f=0x7fffa4d61a90, b=0x7fffa4d61ce0 "127.0.0.1", eb=<optimized out>, a=<optimized out>) at genops.c:410
#1  0x00007f647944f713 in _IO_str_init_static_internal (sf=0x7fffa4d61a90, ptr=0x7fffa4d61ce0 "127.0.0.1", size=<optimized out>, 
    pstart=0x7fffa4d61ce0 "127.0.0.1") at strops.c:50
#2  0x00007f6479443806 in __IO_vsprintf (string=0x7fffa4d61ce0 "127.0.0.1", format=0x7f647954dbb5 "%u.%u.%u.%u", args=0x7fffa4d61ba8)
    at iovsprintf.c:42
#3  0x00007f6479425a07 in __sprintf (s=<optimized out>, format=<optimized out>) at sprintf.c:34
#4  0x00007f64794d4454 in inet_ntop4 (size=1025, dst=0x7fffa4d62410 "", src=0x195d6f4 "\177") at inet_ntop.c:99
#5  __GI_inet_ntop (af=<optimized out>, src=0x195d6f4, dst=0x7fffa4d62410 "", size=1025) at inet_ntop.c:67
#6  0x00007f64794e4e20 in __GI_getnameinfo (sa=<optimized out>, addrlen=<optimized out>, host=<optimized out>, hostlen=1025, 
    serv=<optimized out>, servlen=20, flags=3) at getnameinfo.c:357
#7  0x00007f6479ffe72f in operator<< (out=..., ss=...) at msg/msg_types.cc:149
#8  0x00007f6479f66bfa in operator<< (addr=..., out=...) at ./msg/msg_types.h:352
#9  _prefix (_dout=0x7fffa4d628d0, msgr=0x195d6c0) at msg/SimpleMessenger.cc:31
#10 0x00007f6479f6bd23 in SimpleMessenger::mark_down_all (this=0x195d6c0) at msg/SimpleMessenger.cc:548
#11 0x00007f6479f66cca in SimpleMessenger::shutdown (this=0x195d6c0) at msg/SimpleMessenger.cc:88
#12 0x00007f6479df7335 in shutdown (this=0x194ad80) at libcephfs.cc:129
#13 unmount (this=0x194ad80) at libcephfs.cc:113
#14 ceph_unmount (cmount=0x194ad80) at libcephfs.cc:248
#15 0x0000000000426279 in Remount (deep=true, this=0x194dfc0) at test/libcephfs/test.cc:91
#16 MountedTest_Mount_Test::TestBody (this=0x194dfc0) at test/libcephfs/test.cc:216
#17 0x0000000000451c7a in operator<< <const char> (this=0x7fffa4d62c20, pointer=<optimized out>) at ./include/gtest/gtest-message.h:142
#18 operator<< <const char> (this=0x7fffa4d62c20, pointer=<optimized out>) at ./src/gtest.cc:1988
#19 testing::Test::HasSameFixtureClass () at ./src/gtest.cc:2037
#20 0x0000000000000001 in ?? ()
#21 0x000000000000002e in ?? ()
#22 0x00000000004520b7 in ForEach<std::vector<testing::TestCase*>, void (*)(testing::TestCase*)> (functor=<optimized out>, c=...)
    at ./src/gtest-typed-test.cc:110
#23 ClearResult (this=0x451e4d) at ./src/gtest-internal-inl.h:759
#24 testing::internal::UnitTestImpl::RunAllTests (this=0x451e4d) at ./src/gtest.cc:3996
#25 0x0000000000423a95 in main (argc=1, argv=0x7fffa4d62dd8) at src/gtest_main.cc:38

Thread 5 (Thread 0x7f64741ba700 (LWP 6248)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
#1  0x00007f647aaea065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f647aae9eba in __pthread_mutex_lock (mutex=0x7f6444006290) at pthread_mutex_lock.c:61
#3  0x00007f6479f30f83 in Mutex::Lock (this=0x7f6444006280, no_lockdep=<optimized out>) at common/Mutex.cc:90
#4  0x00007f6479f61a4a in Pipe::reader (this=0x7f6444006170) at msg/Pipe.cc:1242
#5  0x00007f6479f64ddd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#6  0x00007f647aae7e9a in start_thread (arg=0x7f64741ba700) at pthread_create.c:308
#7  0x00007f64794c5cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f645fdfd700 (LWP 6237)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f6479f5b28e in Wait (mutex=..., this=0x1954948) at ./common/Cond.h:55
#2  Pipe::writer (this=0x1954770) at msg/Pipe.cc:1463
#3  0x00007f6479f64dbd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#4  0x00007f647aae7e9a in start_thread (arg=0x7f645fdfd700) at pthread_create.c:308
#5  0x00007f64794c5cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f645fcfc700 (LWP 6233)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f6479f5b28e in Wait (mutex=..., this=0x19611e8) at ./common/Cond.h:55
#2  Pipe::writer (this=0x1961010) at msg/Pipe.cc:1463
#3  0x00007f6479f64dbd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#4  0x00007f647aae7e9a in start_thread (arg=0x7f645fcfc700) at pthread_create.c:308
#5  0x00007f64794c5cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f64742bb700 (LWP 6236)):
#0  0x00007f64794ba303 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f6479f4ec02 in Pipe::tcp_read_wait (this=<optimized out>) at msg/Pipe.cc:2013
#2  0x00007f6479f4ef20 in Pipe::tcp_read (this=0x1961010, buf=0x7f64742badbf "\377", len=1) at msg/Pipe.cc:1986
#3  0x00007f6479f61544 in Pipe::reader (this=0x1961010) at msg/Pipe.cc:1241
#4  0x00007f6479f64ddd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#5  0x00007f647aae7e9a in start_thread (arg=0x7f64742bb700) at pthread_create.c:308
#6  0x00007f64794c5cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f64753be700 (LWP 6230)):
#0  0x00007f647978ada8 in main_arena () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f6479eb61e9 in ms_deliver_dispatch (m=0x7f6468000950, this=0x195d6c0) at msg/Messenger.h:549
#2  DispatchQueue::entry (this=0x195d7a8) at msg/DispatchQueue.cc:107
#3  0x00007f6479f6ff5d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/DispatchQueue.h:85
#4  0x00007f647aae7e9a in start_thread (arg=0x7f64753be700) at pthread_create.c:308
#5  0x00007f64794c5cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()
(gdb) quit
#1

Updated by Noah Watkins over 11 years ago

During unmount, the client is shut down and freed before the messenger. If any messages are delivered after the client has been shut down, dereferencing the Dispatcher pointer (*p)->ms_dispatch (Messenger.h:549) will segfault. Solution is to shut down the messenger first so it drains?

#2

Updated by Sam Lang over 11 years ago

A similar issue was just handled in the ceph_fuse.cc code. There we simply delay deleting the client until the end. You should be able to do the same thing in ceph_mount_info::shutdown().

-sam

#3

Updated by Noah Watkins over 11 years ago

Rather than moving messenger shutdown into client shutdown?

#4

Updated by Noah Watkins over 11 years ago

This is what I'm running to reproduce the error. It has been running on wip-client-shutdown for an hour now without any problems.

#include "include/cephfs/libcephfs.h" 
#include <stdio.h>
#include <unistd.h>
#include <assert.h>

int main(int argc, char **argv)
{
  struct ceph_mount_info *cmount;
  int count = 0;
  char dirname[1024];

  while (1) {
    /* Mount the root, create a unique directory, and unmount. */
    assert(ceph_create(&cmount, NULL) == 0);
    assert(ceph_conf_read_file(cmount, NULL) == 0);
    assert(ceph_mount(cmount, "/") == 0);

    snprintf(dirname, sizeof(dirname), "/blah%d%d", count++, getpid());
    assert(ceph_mkdir(cmount, dirname, 0777) == 0);

    assert(ceph_unmount(cmount) == 0);
    ceph_release(cmount);

    /* Remount at the new directory, then immediately unmount again. */
    assert(ceph_create(&cmount, NULL) == 0);
    assert(ceph_conf_read_file(cmount, NULL) == 0);
    assert(ceph_mount(cmount, dirname) == 0);

    assert(ceph_unmount(cmount) == 0);
    ceph_release(cmount);
  }

  return 0;
}
#5

Updated by Sam Lang over 11 years ago

Right, I think your fix will work, but it breaks the interface abstraction (the messenger is created above the client but destroyed inside it). I think putting the delete client; after the messenger shutdown also avoids the bug you're hitting and still maintains the abstraction.

#6

Updated by Noah Watkins over 11 years ago

I pushed a new wip-client-shutdown. It switches the clean-up order of the client and messenger in libcephfs, rather than moving the messenger clean-up into client shutdown.

#7

Updated by Sage Weil over 11 years ago

  • Status changed from New to Resolved

commit:3a9408742a8a6cbc870cba543a208285f1a6cec1

#8

Updated by Greg Farnum almost 8 years ago

  • Component(FS) libcephfs added