Bug #5308
Closed
don't delete hierarchies before unmount
Description
I keep seeing "hung" runs of CephFS tests that look something like this at the end of the log:
2013-06-09T05:22:53.653 INFO:teuthology.task.workunit.client.0.out:OK.
2013-06-09T05:22:53.654 DEBUG:teuthology.orchestra.run:Running [10.214.133.38]: 'sudo rm -rf -- /home/ubuntu/cephtest/33629/mnt.0/client.0/tmp'
2013-06-09T05:22:53.725 INFO:teuthology.task.workunit:Stopping suites/tiobench.sh on client.0...
2013-06-09T05:22:53.725 DEBUG:teuthology.orchestra.run:Running [10.214.133.38]: 'rm -rf -- /home/ubuntu/cephtest/33629/workunits.list /home/ubuntu/cephtest/33629/workunit.client.0'
2013-06-09T05:22:53.741 DEBUG:teuthology.parallel:result is None
2013-06-09T05:22:53.741 DEBUG:teuthology.misc:with jobid basedir: 33629
2013-06-09T05:22:53.742 DEBUG:teuthology.orchestra.run:Running [10.214.133.38]: 'rm -rf -- /home/ubuntu/cephtest/33629/mnt.0/client.0'
2013-06-09T05:22:53.851 INFO:teuthology.task.workunit:Deleted dir /home/ubuntu/cephtest/33629/mnt.0/client.0
2013-06-09T05:22:53.852 DEBUG:teuthology.orchestra.run:Running [10.214.133.38]: 'rmdir -- /home/ubuntu/cephtest/33629/mnt.0'
2013-06-09T05:22:53.876 INFO:teuthology.orchestra.run.err:rmdir: failed to remove `/home/ubuntu/cephtest/33629/mnt.0': Device or resource busy
2013-06-09T05:22:53.876 DEBUG:teuthology.task.workunit:Caught an execption deleting dir /home/ubuntu/cephtest/33629/mnt.0
2013-06-09T05:22:53.876 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x1b05a90>
2013-06-09T05:22:53.876 INFO:teuthology.task.ceph-fuse:Unmounting ceph-fuse clients...
2013-06-09T05:22:53.877 DEBUG:teuthology.orchestra.run:Running [10.214.133.38]: 'sudo fusermount -u /home/ubuntu/cephtest/33629/mnt.0'
2013-06-09T05:22:53.931 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:ceph-fuse[25016]: fuse finished with error 0
2013-06-09T05:22:57.681 DEBUG:teuthology.orchestra.run:Running [10.214.133.38]: 'rmdir -- /home/ubuntu/cephtest/33629/mnt.0'
2013-06-09T05:22:57.690 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x1a6ad10>
2013-06-09T05:22:57.690 INFO:teuthology.misc:Shutting down mds daemons...
2013-06-09T05:22:57.691 DEBUG:teuthology.task.ceph.mds.a:waiting for process to exit
2013-06-09T05:22:57.701 INFO:teuthology.task.ceph.mds.a:Stopped
2013-06-09T05:22:57.701 INFO:teuthology.misc:Shutting down osd daemons...
2013-06-09T05:22:57.702 DEBUG:teuthology.task.ceph.osd.1:waiting for process to exit
2013-06-09T05:22:57.749 INFO:teuthology.task.ceph.osd.1:Stopped
2013-06-09T05:22:57.749 DEBUG:teuthology.task.ceph.osd.0:waiting for process to exit
Notice how everything is apparently okay, but then teuthology tries to rmdir the mount point (mnt.0) before ceph-fuse has been unmounted, and the rmdir fails with "Device or resource busy". After that the unmount runs, the mds and osd.1 shut down, and then the test hangs waiting for osd.0 to exit. This is a fairly new phenomenon and needs to be fixed to prevent noise in the nightly reports.
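The ordering being asked for can be sketched roughly as follows. This is not teuthology's actual code: `cleanup_commands` and `cleanup_mountpoint` are hypothetical names, and the only things taken from the log are the two commands themselves (`sudo fusermount -u` and `rmdir --`). The sketch assumes the mount state can be checked with something like `os.path.ismount`, and it lets the caller inject the command runner so the ordering can be checked without touching a real mount.

```python
import os
import subprocess


def cleanup_commands(mountpoint, mounted):
    """Return the teardown commands for a mount point, in the order to run them.

    Hypothetical helper: unmount first (if still mounted), then remove the
    now-idle directory, so rmdir never sees "Device or resource busy".
    """
    cmds = []
    if mounted:
        # Same unmount command that appears in the log above.
        cmds.append(["sudo", "fusermount", "-u", mountpoint])
    # Same rmdir invocation that appears in the log above.
    cmds.append(["rmdir", "--", mountpoint])
    return cmds


def cleanup_mountpoint(mountpoint, run=subprocess.check_call,
                       is_mounted=os.path.ismount):
    """Run the teardown commands in order.

    `run` and `is_mounted` are injectable (an assumption of this sketch)
    so the sequence can be exercised with fakes.
    """
    for cmd in cleanup_commands(mountpoint, is_mounted(mountpoint)):
        run(cmd)
```

With a fake runner that just records commands, a mounted path yields the unmount followed by the rmdir, and an unmounted path yields only the rmdir.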
Updated by Greg Farnum almost 11 years ago
See e.g. /a/teuthology-2013-06-09_01:00:48-fs-master-testing-basic/33629/teuthology.log
Updated by Sage Weil almost 11 years ago
- Status changed from New to Rejected
This hang on osd stop is a btrfs bug; there is a thread about it on linux-btrfs.
The EBUSY is normal, I think. It's part of the change that lets you run a workunit without a real mount.
Updated by Greg Farnum over 10 years ago
- Status changed from Rejected to New
I think Sage's update on this was a misdiagnosis; I don't see how btrfs is involved. In any case it's popping up again.
Updated by Sage Weil over 9 years ago
- Status changed from New to Can't reproduce