Bug #11784

closed

ceph-fuse hang on unmount (stuck dentry refs)

Added by John Spray almost 9 years ago. Updated over 8 years ago.

Status: Can't reproduce
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito-rdu.front.sepia.ceph.com/teuthology-2015-05-19_23:14:02-samba-master-testing-basic-typica/21446/

The next step after the last "ceph-fuse not mounted, got fs type 'ext2/ext3'" log message would have been the self.fuse_daemon.wait call in FuseMount.umount_wait, so the fuse mount may be tearing down while the ceph-fuse process fails to terminate for some reason.

Need to watch for another case of this happening to get more information.
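
To make the suspected failure mode concrete, here is a minimal sketch of waiting on a daemon process with a timeout, so that a hang is reported instead of blocking forever. This is not the teuthology implementation; `wait_for_exit` is a hypothetical helper name, and it works on a plain `subprocess.Popen` object rather than teuthology's remote process wrapper.

```python
import subprocess


def wait_for_exit(proc, timeout):
    """Wait up to `timeout` seconds for `proc` (a subprocess.Popen) to exit.

    Returns the exit code if the process terminated in time, or None if it
    is still running. Hypothetical helper, not part of teuthology.
    """
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The process is still alive: this is the "hang on unmount" case.
        return None
```

A caller could treat a `None` result as the point at which to collect logs or stack traces from the stuck process before killing it.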

Actions #1

Updated by John Spray almost 9 years ago

  • Project changed from Ceph to CephFS
  • Category set to 45
Actions #2

Updated by Greg Farnum almost 9 years ago

John, is this likely to have been a dup of #11294? We can tell by checking out the ceph-fuse log if it's still available (although probably not on this one given the lag time).

Actions #3

Updated by Greg Farnum almost 9 years ago

  • Subject changed from ceph-fuse hang on unmount after smbtorture workload to ceph-fuse hang on unmount (stuck dentry refs)
  • Priority changed from Normal to High

We saw this again today, and it's definitely inode refs this time:
http://pulpito-rdu.front.sepia.ceph.com/gregf-2015-05-31_20:59:54-fs-greg-fs-testing---basic-typica/1145/

Unlike previously, that's not a samba run! :(

Actions #4

Updated by Greg Farnum almost 9 years ago

I copied the ceph-client log into that folder, although we're missing the server logs.

Actions #5

Updated by Zheng Yan almost 9 years ago

2015-06-01T06:50:15.928 INFO:teuthology.orchestra.run.typica012.stderr:fusermount: failed to unmount /home/ubuntu/cephtest/mnt.0: Device or resource busy
2015-06-01T06:50:15.930 INFO:tasks.cephfs.fuse_mount:Failed to unmount ceph-fuse on ubuntu@typica012.front.sepia.ceph.com, aborting...

        except run.CommandFailedError:
            log.info('Failed to unmount ceph-fuse on {name}, aborting...'.format(name=self.client_remote.name))

            # abort the fuse mount, killing all hung processes
            if self._fuse_conn:
                self.run_python(dedent(""" 
                import os
                path = "/sys/fs/fuse/connections/{0}/abort" 
                if os.path.exists(path):
                    open(path, "w").write("1")
                """).format(self._fuse_conn))
                self._fuse_conn = None

            stderr = StringIO()

Aborting the fuse connection can explain the ll_ref leak.
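
For reference, the abort step in the excerpt above can be sketched as a standalone function. It writes "1" to the `abort` file of a connection under `/sys/fs/fuse/connections`, as the quoted code does. The function name and the `base` parameter are assumptions added here so the logic can be exercised against a temporary directory; they are not part of the teuthology code.

```python
import os


def abort_fuse_connection(conn_id, base="/sys/fs/fuse/connections"):
    """Abort the FUSE connection `conn_id` by writing "1" to its abort file.

    Returns True if the abort file existed and was written, False otherwise.
    `base` is a hypothetical parameter, added only to allow testing
    without a real fusectl filesystem.
    """
    path = os.path.join(base, str(conn_id), "abort")
    if not os.path.exists(path):
        return False
    with open(path, "w") as f:
        f.write("1")
    return True
```

Writing to the real path normally requires the fusectl filesystem to be mounted and sufficient privileges.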

Actions #6

Updated by Greg Farnum almost 9 years ago

Hmm, there shouldn't have been any activity on the mount by this point. Maybe we've got some other kind of bug, though.

Actions #7

Updated by Sage Weil over 8 years ago

  • Status changed from Need More Info to Can't reproduce