Bug #12297

ceph-fuse 0.94.2-1trusty segfaults / aborts

Added by Burkhard Linke over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Running CephFS for data volumes on a compute cluster.

ceph-fuse aborts and leaves the mount point unusable (transport endpoint not connected).

Excerpt of the client log, incl. the 20 latest ops:

   -20> 2015-07-13 12:08:13.905733 7f60be7fc700 20 client.412413 trim_caps counting as trimmed: 100015d4d13.head(ref=2 ll_ref=1 cap_refs={} open={} mode=120777 size=74/0 mtime=2015-07-13 12:08:05.875809 caps=pAsLsXsFscr(0=pAsLsXsFscr) 0x7f60ac8c6320)
   -19> 2015-07-13 12:08:13.905744 7f60be7fc700 10 client.412413 put_inode on 100015d4d13.head(ref=2 ll_ref=1 cap_refs={} open={} mode=120777 size=74/0 mtime=2015-07-13 12:08:05.875809 caps=pAsLsXsFscr(0=pAsLsXsFscr) 0x7f60ac8c6320)
   -18> 2015-07-13 12:08:13.905755 7f60be7fc700 20 client.412413  trying to trim dentries for 100015d4d14.head(ref=2 ll_ref=1 cap_refs={} open={} mode=120777 size=74/0 mtime=2015-07-13 12:08:05.893082 caps=pAsLsXsFscr(0=pAsLsXsFscr) parents=0x7f5fb43408e0 0x7f60acc44830)
   -17> 2015-07-13 12:08:13.905767 7f60be7fc700 15 client.412413 trim_dentry unlinking dn raw_reads.136.raw_reads.296.N2.las in dir 100012fa7e7
   -16> 2015-07-13 12:08:13.905770 7f60be7fc700 15 client.412413 unlink dir 0x7f60ac59d5a0 'raw_reads.136.raw_reads.296.N2.las' dn 0x7f5fb43408e0 inode 0x7f60acc44830
   -15> 2015-07-13 12:08:13.905773 7f60be7fc700 20 client.412413 unlink  inode 0x7f60acc44830 parents now 
   -14> 2015-07-13 12:08:13.905774 7f60be7fc700 10 client.412413 put_inode on 100015d4d14.head(ref=3 ll_ref=1 cap_refs={} open={} mode=120777 size=74/0 mtime=2015-07-13 12:08:05.893082 caps=pAsLsXsFscr(0=pAsLsXsFscr) 0x7f60acc44830)
   -13> 2015-07-13 12:08:13.905785 7f60be7fc700 15 client.412413 unlink  removing 'raw_reads.136.raw_reads.296.N2.las' dn 0x7f5fb43408e0
   -12> 2015-07-13 12:08:13.905789 7f60be7fc700 20 client.412413 trim_caps counting as trimmed: 100015d4d14.head(ref=2 ll_ref=1 cap_refs={} open={} mode=120777 size=74/0 mtime=2015-07-13 12:08:05.893082 caps=pAsLsXsFscr(0=pAsLsXsFscr) 0x7f60acc44830)
   -11> 2015-07-13 12:08:13.905801 7f60be7fc700 10 client.412413 put_inode on 100015d4d14.head(ref=2 ll_ref=1 cap_refs={} open={} mode=120777 size=74/0 mtime=2015-07-13 12:08:05.893082 caps=pAsLsXsFscr(0=pAsLsXsFscr) 0x7f60acc44830)
   -10> 2015-07-13 12:08:13.905840 7f60a3fff700 10 client.412413 _async_dentry_invalidate 'ch' ino 0 in dir 100009de0f2.head
    -9> 2015-07-13 12:08:13.905849 7f606bfff700 15 client.412413  de raw_reads.124.raw_reads.296.C2.las off 6170202972637346 = 0
    -8> 2015-07-13 12:08:13.906701 7f60a06f7700  2 -- 192.168.2.22:0/20333 >> 192.168.6.5:6810/4247 pipe(0x7f607c030cf0 sd=2 :37085 s=2 pgs=28199 cs=1 l=1 c=0x7f607c034f90).reader couldn't read tag, (11) Resource temporarily unavailable
    -7> 2015-07-13 12:08:13.906738 7f60a06f7700  2 -- 192.168.2.22:0/20333 >> 192.168.6.5:6810/4247 pipe(0x7f607c030cf0 sd=2 :37085 s=2 pgs=28199 cs=1 l=1 c=0x7f607c034f90).fault (11) Resource temporarily unavailable
    -6> 2015-07-13 12:08:13.906808 7f60be7fc700  1 client.412413.objecter ms_handle_reset on osd.61
    -5> 2015-07-13 12:08:13.906818 7f60be7fc700  1 -- 192.168.2.22:0/20333 mark_down 0x7f607c034f90 -- pipe dne
    -4> 2015-07-13 12:08:13.907127 7f60be7fc700 10 monclient: renew_subs
    -3> 2015-07-13 12:08:13.907135 7f60be7fc700 10 monclient: _send_mon_message to mon.mon-i1 at 192.168.6.50:6789/0
    -2> 2015-07-13 12:08:13.907141 7f60be7fc700  1 -- 192.168.2.22:0/20333 --> 192.168.6.50:6789/0 -- mon_subscribe({mdsmap=4312+,monmap=14+,osdmap=175910}) v2 -- ?+0 0x7f60ac68f8b0 con 0x3b50160
    -1> 2015-07-13 12:08:13.907279 7f606bfff700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f606bfff700

 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 1: ceph-fuse() [0x6235ca]
 2: (()+0x10340) [0x7f60c9dd8340]
 3: (std::string::assign(std::string const&)+0x1c) [0x7f60c8ff748c]
 4: (Client::_readdir_cache_cb(dir_result_t*, int (*)(void*, dirent*, stat*, int, long), void*)+0x39a) [0x55263a]
 5: (Client::readdir_r_cb(dir_result_t*, int (*)(void*, dirent*, stat*, int, long), void*)+0xfc5) [0x596045]
 6: ceph-fuse() [0x546d2d]
 7: (()+0x13e76) [0x7f60ca20fe76]
 8: (()+0x1522b) [0x7f60ca21122b]
 9: (()+0x11e49) [0x7f60ca20de49]
 10: (()+0x8182) [0x7f60c9dd0182]
 11: (clone()+0x6d) [0x7f60c875547d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

     0> 2015-07-13 12:08:13.908290 7f60be7fc700  1 -- 192.168.2.22:0/20333 <== mon.3 192.168.6.50:6789/0 266 ==== osd_map(175910..175910 src has 172194..175910) v3 ==== 222+0+0 (8575416 0 0) 0x7f60b00008c0 con 0x3b50160

The OSD mentioned in op -8 is up and running, but might take some time to respond due to ongoing backfilling operations. The compute jobs are set up to write output to the same file, so several CephFS clients are trying to open a single file for writing.


Related issues

Copied to CephFS - Backport #12499: ceph-fuse 0.94.2-1trusty segfaults / aborts (Status: Resolved, 07/13/2015)

Associated revisions

Revision 62dd6376 (diff)
Added by Greg Farnum over 8 years ago

Client: check dir is still complete after dropping locks in _readdir_cache_cb

We drop the lock when invoking the callback, which means the directory
we're looking at might get dentries trimmed out of memory. Make sure that
hasn't happened after we get the lock back. If it has happened, fall back
to requesting the directory contents from the MDS. Update the dirp location
pointers after each entry to facilitate this.
Because this requires we update the dirp->at_cache_name value on every loop,
we rework the updating scheme a bit: to dereference the dn->name before
unlocking, so we know it's filled in; and since we update it on every loop
we don't need to refer to the previous dentry explicitly like we did before.

This should also handle racing file deletes: we get back a trace on
the removed dentry and that will clear the COMPLETE|ORDERED flags.

Fixes #12297

Signed-off-by: Greg Farnum <>
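
For readers not following along in Client.cc, here is a self-contained sketch of the pattern the commit describes: remember the current name while the lock is held, drop the lock for the callback, and re-validate the cached directory after re-acquiring the lock, falling back to the MDS if it is no longer complete. All identifiers below (CachedDir, DirCursor, readdir_from_cache, and the -1 "fall back" return) are simplified stand-ins for illustration, not Ceph's actual Client internals.

// Simplified, hypothetical sketch of the re-validation pattern described
// in the commit message above; the types are stand-ins, not Ceph code.
#include <functional>
#include <mutex>
#include <string>
#include <vector>

struct CachedDir {
  std::vector<std::string> names;  // cached dentry names for one directory
  bool complete = true;            // cleared when dentries are trimmed/unlinked
};

struct DirCursor {
  size_t index = 0;                // position in the cached listing
  std::string at_cache_name;       // last name returned, copied while locked
};

// Returns 0 when the whole cached listing was delivered, or -1 ("fall back
// to requesting the directory contents from the MDS") if the cache stopped
// being complete while the lock was dropped for a callback.
int readdir_from_cache(std::mutex &client_lock, CachedDir &dir, DirCursor &cur,
                       const std::function<void(const std::string &)> &cb) {
  std::unique_lock<std::mutex> lk(client_lock);
  while (cur.index < dir.names.size()) {
    // Copy the name *before* unlocking so it stays valid even if the
    // underlying entry is trimmed while the callback runs.
    cur.at_cache_name = dir.names[cur.index];
    ++cur.index;

    lk.unlock();                   // the callback may block or re-enter
    cb(cur.at_cache_name);
    lk.lock();

    // Re-validate: if entries were trimmed while we were unlocked, the
    // cached listing can no longer be trusted, so let the caller re-request
    // the directory contents instead of walking possibly freed memory.
    if (!dir.complete)
      return -1;
  }
  return 0;
}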

Revision 256620e3 (diff)
Added by Greg Farnum over 8 years ago

Client: check dir is still complete after dropping locks in _readdir_cache_cb

We drop the lock when invoking the callback, which means the directory
we're looking at might get dentries trimmed out of memory. Make sure that
hasn't happened after we get the lock back. If it has happened, fall back
to requesting the directory contents from the MDS. Update the dirp location
pointers after each entry to facilitate this.
Because this requires we update the dirp->at_cache_name value on every loop,
we rework the updating scheme a bit: to dereference the dn->name before
unlocking, so we know it's filled in; and since we update it on every loop
we don't need to refer to the previous dentry explicitly like we did before.

This should also handle racing file deletes: we get back a trace on
the removed dentry and that will clear the COMPLETE|ORDERED flags.

Fixes #12297

Signed-off-by: Greg Farnum <>
(cherry picked from commit 62dd63761701a7e0f7ce39f4071dcabc19bb1cf4)

History

#1 Updated by Greg Farnum over 8 years ago

  • Project changed from Ceph to CephFS
  • Category set to 45

Do you have a core file from this that you can look at with gdb, and the ceph debug packages installed? (You can pull them down if they aren't already present.)

If you can look at it and just paste the output of "backtrace" here that would be very helpful; as it is I'm not sure which of the string users are broken.

#2 Updated by Burkhard Linke over 8 years ago

Sorry, no core file available. I have a number of hosts with CephFS running without any problems; some other hosts show the described behaviour.

Since some of the compute jobs are still pending, I may reactivate one of the nodes in the cluster and hope that it fails again (the problem is not 100% reproducible).

How do I manipulate the limits for fuse mounts? Is setting the limit prior to mounting enough?

#3 Updated by Greg Farnum over 8 years ago

You mean the ulimit for dumping the core file? That will depend on how you're invoking it; if you're just running "ceph-fuse ..." then you can set "ulimit -c unlimited" in the shell before doing so and that will dump the core, yes.

#4 Updated by Burkhard Linke over 8 years ago

(gdb) bt
#0 0x00007f70d7bf020b in raise (sig=7)
at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1 0x000000000062368d in reraise_fatal (signum=7)
at global/signal_handler.cc:59
#2 handle_fatal_signal (signum=7) at global/signal_handler.cc:109
#3 <signal handler called>
#4 Client::_readdir_cache_cb (this=this@entry=0x38a8ab0,
dirp=dirp@entry=0x7f6f7c018400, cb=cb@entry=
0x547820 <fuse_ll_add_dirent(void*, dirent*, stat*, int, off_t)>,
p=p@entry=0x7f707d7cac20) at client/Client.cc:6048
#5 0x0000000000596045 in Client::readdir_r_cb (this=0x38a8ab0,
d=d@entry=0x7f6f7c018400,
cb=cb@entry=0x547820 <fuse_ll_add_dirent(void*, dirent*, stat*, int, off_t)>, p=p@entry=0x7f707d7cac20) at client/Client.cc:6171
#6 0x0000000000546d2d in fuse_ll_readdir (req=0x7f6f9c004ce0,
ino=1099533194016, size=4096, off=129, fi=<optimized out>)
at client/fuse_ll.cc:582
#7 0x00007f70d8027e76 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#8 0x00007f70d802922b in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#9 0x00007f70d8025e49 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#10 0x00007f70d7be8182 in start_thread (arg=0x7f707d7cb700)
at pthread_create.c:312
#11 0x00007f70d656d47d in clone ()

#5 Updated by Greg Farnum over 8 years ago

That's pretty bizarre. Signal 7 is SIGBUS "Bus error (bad memory access)", which is separate from SIGSEGV. And given the surrounding code that's pretty well guarded against. In GDB again can you print the value of "dn" and (if it's valid) dn->inode?
[run gdb]
f 4
p dn
p dn->inode

Or you can join us on irc, oftc.net #ceph-devel and ping gregsfortytwo. Thanks!

#6 Updated by Burkhard Linke over 8 years ago

(gdb) f 4
#4 Client::_readdir_cache_cb (this=this@entry=0x38a8ab0, dirp=dirp@entry=0x7f6f7c018400, cb=cb@entry=
0x547820 <fuse_ll_add_dirent(void*, dirent*, stat*, int, off_t)>, p=p@entry=0x7f707d7cac20) at client/Client.cc:6048
6048 client/Client.cc: No such file or directory.
(gdb) p dn
$1 = (Dentry *) 0x342e746e65696c63
(gdb) p dn->inode
Cannot access memory at address 0x342e746e65696c9b

#7 Updated by Greg Farnum over 8 years ago

Okay, I think I see what happened, one more please.

[run gdb]
f 4
p dir
p dir->dentry_list

And if you could take the log for the ceph-fuse instance you're looking at now and upload it with ceph-post-file that is also likely to be helpful.

#8 Updated by Burkhard Linke over 8 years ago

(gdb) p dir
$2 = (Dir *) 0x7f709000f9a0
(gdb) p dir->dentry_list
$3 = {_front = 0x7f6f8002a260, _back = 0x7f70b83aaa60, _size = 9}

The log file has a size of 13 GB and the host has no direct connection to the internet. I can transfer it to another host and upload it, but given its size I would prefer to only upload a part.

#9 Updated by Greg Farnum over 8 years ago

  • Status changed from New to 12

Zheng, can you take a look at this and see if you need the log or can identify the issue? I presume something's gone wrong in advancing to the at_cache_name dentry, but I'm not quite sure how.

#10 Updated by Greg Farnum over 8 years ago

  • Priority changed from Normal to Urgent

#11 Updated by Zheng Yan over 8 years ago

I can't figure out what's wrong either.

#12 Updated by Greg Farnum over 8 years ago

We've got logs and a core file via ceph-post-file:

c00fb309-e2a1-482c-b093-1733affb2901 gzipped core dump
60fbd6a7-ffde-4cf0-988b-a80737a1d1ec ceph-fuse log file
Distribution is Ubuntu 14.04.
ceph-fuse --version: ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)

#13 Updated by Greg Farnum over 8 years ago

Okay, I dug into this today and I think our readdir is a bit broken (obviously). In particular, we're dropping the client lock while invoking the callback, and I can see in the log that one of the times we dropped the lock a whole bunch of dentries got unlinked. So then we get back the lock and try to look at the next dentry in our xlist iterator (which was advanced to a valid dentry while we still had the lock), but the dentry we're trying to look at has (it appears) been freed!

I'm not sure how this is supposed to work; investigating now.
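
To make the failure mode concrete, the hazard described above looks roughly like the following. This is a simplified, hypothetical illustration: Dentry, dentry_list, and the callback are stand-ins (std::list in place of Ceph's intrusive xlist), not the real Client code.

// Illustration of the use-after-free hazard described above; stand-in types.
#include <list>
#include <mutex>
#include <string>

struct Dentry { std::string name; };

std::mutex client_lock;
std::list<Dentry *> dentry_list;   // stands in for the intrusive xlist

void readdir_loop_buggy(void (*cb)(const std::string &)) {
  std::unique_lock<std::mutex> lk(client_lock);
  auto it = dentry_list.begin();
  while (it != dentry_list.end()) {
    Dentry *dn = *it;
    ++it;                          // advanced to a *currently* valid dentry...

    lk.unlock();                   // ...but the lock is dropped for the
    cb(dn->name);                  // callback, during which cache trimming
    lk.lock();                     // may unlink and delete dentries,
                                   // including the one 'it' now refers to
                                   // (and 'dn' itself).

    // BUG: on the next iteration '*it' may dereference freed memory; this is
    // the use-after-free behind the crash reported in this ticket.
  }
}

The fix referenced in the associated revisions sidesteps this by remembering the current position by name and re-checking directory completeness after re-taking the lock, falling back to an MDS request when the cache can no longer be trusted.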

#14 Updated by Greg Farnum over 8 years ago

  • Status changed from 12 to Fix Under Review

I think I have a fix for this issue but I want to get a few eyes on it before handing it off for user testing.

#15 Updated by Greg Farnum over 8 years ago

Burkhard, if you can test out the hammer-12297 branch on one of the boxes seeing this bug and find out if that fixes it, that would be helpful. I've not yet figured out a good way to reproduce the issue.

#16 Updated by Greg Farnum over 8 years ago

  • Assignee set to Greg Farnum
  • Source changed from other to Community (user)

#17 Updated by Burkhard Linke over 8 years ago

Branch is deployed to the compute cluster host and the application causing the problem is started.

Debug output and core file generation are enabled. Since the problem is not 100% reproducible, all cluster nodes are running the new branch.

Testing will take some hours; if no ceph-fuse process has crashed by then, that should indicate the fix is successful.

#18 Updated by Burkhard Linke over 8 years ago

The cluster has been processing data without any problems over the weekend. I consider the bug fixed.

#19 Updated by Greg Farnum over 8 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Priority changed from Urgent to Normal
  • Backport set to hammer

Excellent! This is merged to master now.

#20 Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)

#21 Updated by Greg Farnum over 8 years ago

  • Assignee deleted (Greg Farnum)

#22 Updated by Loïc Dachary over 8 years ago

  • Status changed from Pending Backport to Resolved
