Bug #18505

open

Killing nfs-ganesha with rgwfile causes a segfault

Added by Min Chen over 7 years ago. Updated about 7 years ago.

Status:
In Progress
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

NFS-Ganesha Release = V2.5-dev-7
nfs-ganesha compiled on Jan 3 2017 at 06:00:10
Release comment = GANESHA file server is 64 bits compliant and supports NFS v3,4.0,4.1 (pNFS) and 9P
Git HEAD = 4ee0fb128720fb5b3c5cbf2d8c9eee25a7a12f4b
Git Describe = V2.5-dev-7-0-g4ee0fb1

Steps to reproduce:
1. Start nfs-ganesha with FSAL rgwfile.
2. Wait until the grace period has ended.
3. Run pkill ganesha.nfsd.

gdb stack:

Program received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7ffef97fa700 (LWP 159306)]
0x00007ffff5c4e48d in nanosleep () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install boost-iostreams-1.53.0-23.el7.x86_64 boost-random-1.53.0-23.el7.x86_64 boost-system-1.53.0-23.el7.x86_64 boost-thread-1.53.0-23.el7.x86_64 bzip2-libs-1.0.6-12.el7.x86_64 cryptopp-5.6.2-9.el7.x86_64 cyrus-sasl-lib-2.1.26-17.el7.x86_64 dbus-libs-1.6.12-11.el7.x86_64 elfutils-libelf-0.160-1.el7.x86_64 elfutils-libs-0.160-1.el7.x86_64 expat-2.1.0-8.el7.x86_64 fcgi-2.4.0-25.el7.x86_64 glibc-2.17-78.el7.x86_64 gssproxy-0.3.0-10.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.12.2-14.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libblkid-2.23.2-21.el7.x86_64 libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64 libcurl-7.29.0-19.el7.x86_64 libgcc-4.8.3-9.el7.x86_64 libidn-1.28-3.el7.x86_64 libnfsidmap-0.25-12.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 libssh2-1.4.3-8.el7.x86_64 libstdc++-4.8.3-9.el7.x86_64 libuuid-2.23.2-21.el7.x86_64 nspr-4.11.0-1.el7_2.x86_64 nss-3.21.0-9.el7_2.x86_64 nss-softokn-freebl-3.16.2.3-14.2.el7_2.x86_64 nss-util-3.21.0-2.2.el7_2.x86_64 openldap-2.4.39-6.el7.x86_64 openssl-libs-1.0.1e-42.el7.x86_64 pcre-8.32-14.el7.x86_64 systemd-libs-219-19.el7_2.4.x86_64 xz-libs-5.1.2-9alpha.el7.x86_64 zlib-1.2.7-13.el7.x86_64
(gdb) bt
#0 0x00007ffff5c4e48d in nanosleep () from /lib64/libc.so.6
#1 0x00007ffff5c4e324 in sleep () from /lib64/libc.so.6
#2 0x00007fffdea8c029 in std::this_thread::__sleep_for(std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) ()
from /lib64/libstdc++.so.6
#3 0x00007fffe9f338d6 in sleep_for<long, std::ratio<1l> > (__rtime=...) at /usr/include/c++/4.8.2/thread:281
#4 rgw::RGWLibProcess::run (this=0x87a670) at rgw/librgw.cc:108
#5 0x00007fffe9d52c4e in RGWProcessControlThread::entry (this=<optimized out>) at rgw/rgw_process.h:151
#6 0x00007ffff65c7df5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffff5c871ad in clone () from /lib64/libc.so.6

Actions #1

Updated by Matt Benjamin about 7 years ago

Hi Min,

This may already be fixed; I will review further. Could you also retest on master?

Regards,

Matt

Actions #2

Updated by Matt Benjamin about 7 years ago

This doesn't look like a SIGSEGV; the signal gdb reports is SIGUSR1.

I'll revisit the shutdown process. I know you also posted a patch to address the hang at join with worker threads.
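
To illustrate the distinction between the two signals, here is a minimal standalone sketch. It is not librgw code. It assumes only what the backtrace shows: a worker thread polling in a loop around sleep_for. The names on_sigusr1, run_demo and DemoResult are hypothetical. A SIGUSR1 delivered to a thread sleeping this way is not a crash in itself: with a handler installed, the thread keeps running and can still be shut down and joined. By default gdb stops on such signals and prints "Program received signal", which can look like a fault; the gdb command "handle SIGUSR1 nostop noprint pass" should suppress that stop.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

#include <pthread.h>
#include <signal.h>

// Hypothetical demo, not librgw code: every name here is illustrative.
static std::atomic<int> g_sigusr1_seen{0};

// The handler only bumps a counter, so it stays async-signal-safe
// on platforms where std::atomic<int> is lock-free.
extern "C" void on_sigusr1(int) { g_sigusr1_seen.fetch_add(1); }

struct DemoResult {
  int signals_seen;      // SIGUSR1 deliveries counted by the handler
  bool survived_signal;  // worker completed more loop passes afterwards
};

DemoResult run_demo() {
  const auto tick = std::chrono::milliseconds(1);
  std::atomic<bool> shutdown{false};
  std::atomic<int> passes{0};

  // Install a handler; without one, the default action for SIGUSR1
  // terminates the whole process.
  struct sigaction sa{};
  sa.sa_handler = on_sigusr1;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  sigaction(SIGUSR1, &sa, nullptr);

  // Worker polls a shutdown flag between sleeps, the same shape as the
  // sleep_for frame in the backtrace (assumed, not copied from librgw).
  std::thread worker([&] {
    while (!shutdown.load()) {
      std::this_thread::sleep_for(std::chrono::milliseconds(20));
      passes.fetch_add(1);
    }
  });

  // Make sure the worker is inside its loop before signalling it.
  while (passes.load() == 0) std::this_thread::sleep_for(tick);

  // Direct SIGUSR1 at the worker thread, which is most likely asleep.
  int rc = pthread_kill(worker.native_handle(), SIGUSR1);
  assert(rc == 0);
  (void)rc;
  while (g_sigusr1_seen.load() == 0) std::this_thread::sleep_for(tick);

  // The worker keeps looping after the signal was handled.
  const int before = passes.load();
  while (passes.load() == before) std::this_thread::sleep_for(tick);
  const bool survived = passes.load() > before;

  // Clean shutdown: set the flag, then join. The join returns because
  // the worker re-checks the flag after each sleep.
  shutdown.store(true);
  worker.join();
  return DemoResult{g_sigusr1_seen.load(), survived};
}
```

Calling run_demo() from a small main() and building with -pthread should report at least one handled signal and a worker that survived it. A worker that never re-checks its shutdown flag would instead block the final join, which is the kind of hang mentioned above.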

Actions #3

Updated by Min Chen about 7 years ago

@Matt Li, yes, nfs-ganesha blocks when the librgw threads are joined.
Can you reproduce this SIGUSR1 issue?

Actions #4

Updated by Matt Benjamin about 7 years ago

  • Status changed from New to In Progress
  • Assignee set to Matt Benjamin
  • Priority changed from High to Normal