Bug #5308

don't delete hierarchies before unmount

Added by Greg Farnum about 6 years ago. Updated almost 5 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 06/11/2013
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:

Description

I keep seeing "hung" runs of CephFS tests that look something like this at the end of the log:

2013-06-09T05:22:53.653 INFO:teuthology.task.workunit.client.0.out:OK.
2013-06-09T05:22:53.654 DEBUG:teuthology.orchestra.run:Running [10.214.133.38]: 'sudo rm -rf -- /home/ubuntu/cephtest/33629/mnt.0/client.0/tmp'
2013-06-09T05:22:53.725 INFO:teuthology.task.workunit:Stopping suites/tiobench.sh on client.0...
2013-06-09T05:22:53.725 DEBUG:teuthology.orchestra.run:Running [10.214.133.38]: 'rm -rf -- /home/ubuntu/cephtest/33629/workunits.list /home/ubuntu/cephtest/33629/workunit.client.0'
2013-06-09T05:22:53.741 DEBUG:teuthology.parallel:result is None
2013-06-09T05:22:53.741 DEBUG:teuthology.misc:with jobid basedir: 33629
2013-06-09T05:22:53.742 DEBUG:teuthology.orchestra.run:Running [10.214.133.38]: 'rm -rf -- /home/ubuntu/cephtest/33629/mnt.0/client.0'
2013-06-09T05:22:53.851 INFO:teuthology.task.workunit:Deleted dir /home/ubuntu/cephtest/33629/mnt.0/client.0
2013-06-09T05:22:53.852 DEBUG:teuthology.orchestra.run:Running [10.214.133.38]: 'rmdir -- /home/ubuntu/cephtest/33629/mnt.0'
2013-06-09T05:22:53.876 INFO:teuthology.orchestra.run.err:rmdir: failed to remove `/home/ubuntu/cephtest/33629/mnt.0': Device or resource busy
2013-06-09T05:22:53.876 DEBUG:teuthology.task.workunit:Caught an execption deleting dir /home/ubuntu/cephtest/33629/mnt.0
2013-06-09T05:22:53.876 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x1b05a90>
2013-06-09T05:22:53.876 INFO:teuthology.task.ceph-fuse:Unmounting ceph-fuse clients...
2013-06-09T05:22:53.877 DEBUG:teuthology.orchestra.run:Running [10.214.133.38]: 'sudo fusermount -u /home/ubuntu/cephtest/33629/mnt.0'
2013-06-09T05:22:53.931 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:ceph-fuse[25016]: fuse finished with error 0
2013-06-09T05:22:57.681 DEBUG:teuthology.orchestra.run:Running [10.214.133.38]: 'rmdir -- /home/ubuntu/cephtest/33629/mnt.0'
2013-06-09T05:22:57.690 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x1a6ad10>
2013-06-09T05:22:57.690 INFO:teuthology.misc:Shutting down mds daemons...
2013-06-09T05:22:57.691 DEBUG:teuthology.task.ceph.mds.a:waiting for process to exit
2013-06-09T05:22:57.701 INFO:teuthology.task.ceph.mds.a:Stopped
2013-06-09T05:22:57.701 INFO:teuthology.misc:Shutting down osd daemons...
2013-06-09T05:22:57.702 DEBUG:teuthology.task.ceph.osd.1:waiting for process to exit
2013-06-09T05:22:57.749 INFO:teuthology.task.ceph.osd.1:Stopped
2013-06-09T05:22:57.749 DEBUG:teuthology.task.ceph.osd.0:waiting for process to exit

Notice that everything appears to be okay, but teuthology then tries to remove the mount point before it has been unmounted, and the rmdir fails with "Device or resource busy". After that a few more daemons shut down, but then the test hangs while waiting for osd.0 to exit. This is a fairly new phenomenon and needs to be fixed to prevent noise in the nightly reports.
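The ordering the title asks for (unmount first, then delete the mount point) can be sketched in Python, the language teuthology is written in. This is a minimal illustration under stated assumptions, not teuthology's actual code: `remove_mountpoint` and its `unmount` callback are hypothetical names. The retry on EBUSY assumes the mount may briefly stay busy after an unmount such as `fusermount -u` returns.

```python
import errno
import os
import time


def remove_mountpoint(path, unmount, retries=5, delay=1.0):
    """Hypothetical helper: unmount `path` if it is mounted, then rmdir it.

    `unmount` is a caller-supplied callable (for example one that runs
    `fusermount -u <path>`); it is an assumption, not a teuthology API.
    Returns True once the directory is removed, False if it stayed busy.
    """
    # Unmount first, so the rmdir below does not hit a live mount.
    if os.path.ismount(path):
        unmount(path)

    for attempt in range(retries):
        try:
            os.rmdir(path)
            return True
        except OSError as e:
            # Only EBUSY is treated as transient; anything else is a real error.
            if e.errno != errno.EBUSY:
                raise
            # Wait before the next attempt, but not after the last one.
            if attempt + 1 < retries:
                time.sleep(delay)
    return False
```

In the log above, the second `rmdir` of `mnt.0`, issued after `fusermount -u`, completes without an error. That is the ordering this sketch enforces.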

History

#1 Updated by Greg Farnum about 6 years ago

See e.g. /a/teuthology-2013-06-09_01:00:48-fs-master-testing-basic/33629/teuthology.log

#2 Updated by Sage Weil about 6 years ago

  • Status changed from New to Rejected

This hang on OSD stop is a btrfs bug; there is a thread going on linux-btrfs.

The EBUSY is normal, I think. It's part of the change that lets you run a workunit without a real mount.

#3 Updated by Greg Farnum almost 6 years ago

  • Status changed from Rejected to New

I think Sage's update on this was a misdiagnosis; I don't see how btrfs is involved. In any case, it's popping up again.

#4 Updated by Sage Weil almost 5 years ago

  • Status changed from New to Can't reproduce
