Bug #11835
FuseMount.umount_wait can hang
Description
Currently, the code in FuseMount.umount assumes that writing to /sys/fs/fuse/connections/X/abort causes the ceph-fuse process to terminate, after which "umount -lf" cleans up the mount. FuseMount.umount_wait then does a blocking wait on the fuse_daemon RemoteProcess.
There are cases where the write to abort does not cause the process to terminate, and in these cases FuseMount.umount_wait can hang forever.
To catch these promptly, apply a short timeout (under one minute) to the wait on the fuse_daemon process. If the process fails to die, raise a terrifying exception to call attention to the fact that something went wrong inside ceph-fuse.
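A minimal sketch of the proposed timeout follows. It assumes a local `subprocess.Popen` stands in for teuthology's `RemoteProcess`. The `FuseDaemonHang` error type and the 30-second default are hypothetical choices for illustration, not the actual fix. The point is that the wait is bounded, and a timeout raises instead of retrying the kill.

```python
import subprocess
import sys


class FuseDaemonHang(RuntimeError):
    """Hypothetical error: ceph-fuse survived the abort and needs investigating."""


def umount_wait(fuse_daemon: subprocess.Popen, timeout: float = 30.0) -> int:
    """Wait for the FUSE daemon to exit, but never block forever.

    `fuse_daemon` is a local subprocess standing in for teuthology's
    RemoteProcess (an assumption of this sketch). The 30 s default is an
    arbitrary value under the one-minute bound suggested in the report.
    """
    try:
        # Returns the exit status if the daemon dies within `timeout` seconds.
        return fuse_daemon.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Do not retry the kill: surface the failure loudly instead.
        raise FuseDaemonHang(
            f"ceph-fuse (pid {fuse_daemon.pid}) did not exit within {timeout}s "
            "after the abort; something went wrong inside ceph-fuse"
        ) from None


if __name__ == "__main__":
    # A daemon that exits promptly: umount_wait returns its exit status.
    ok = subprocess.Popen([sys.executable, "-c", "pass"])
    print("exit status:", umount_wait(ok, timeout=10))

    # A daemon that ignores the abort: the bounded wait raises.
    stuck = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        umount_wait(stuck, timeout=1)
    except FuseDaemonHang as exc:
        print("error:", exc)
    finally:
        # Clean up the stand-in process so the demo itself does not leak it.
        stuck.kill()
        stuck.wait()
```

Raising rather than killing again mirrors the reasoning in the associated commit. A daemon that survives an abort points to an internal ceph-fuse problem, so the failure should reach a human rather than be papered over.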
Associated revisions
tasks/cephfs: time out on ceph-fuses that don't die
Handle cases where we have, e.g., poked the fuse abort
file for a process but it's still not dying. Because
this is a special class of error (unlike, e.g., when
we force-umount something because the network is gone),
raise the error instead of trying again to kill
the client.
Fixes: #11835
Signed-off-by: John Spray <john.spray@redhat.com>
History
#1 Updated by John Spray almost 9 years ago
- Status changed from New to Fix Under Review
#2 Updated by Greg Farnum over 8 years ago
- Status changed from Fix Under Review to Resolved