Feature #206

open

make a 'soft' mode

Added by Sage Weil almost 14 years ago. Updated about 12 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%


Description

On Wed, 16 Jun 2010, Peter Niemayer wrote:

Hi,

trying to "umount" a formerly mounted ceph filesystem that has become
unavailable (osd crashed, then mds/mon were shut down using /etc/init.d/ceph
stop) results in "umount" hanging forever in "D" state.

Strangely, "umount -f" started from another terminal reports
the ceph filesystem as not being mounted anymore, which is consistent
with what the mount-table says.

The kernel keeps emitting the following messages from time to time:

Jun 16 17:25:29 gitega kernel: ceph: tid 211912 timed out on osd0, will reset osd
Jun 16 17:25:35 gitega kernel: ceph: mon0 10.166.166.1:6789 connection failed
Jun 16 17:26:15 gitega last message repeated 4 times

I would have expected the "umount" to terminate at least after some generous
timeout.

Ceph should probably support something like the "soft,intr" options
of NFS, because if the only supported way of mounting is one where
a client is more or less stuck-until-reboot when the service fails,
many potential test-configurations involving Ceph are way too dangerous
to try...
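
For readers unfamiliar with the NFS options mentioned here, the difference can be sketched roughly as follows. This is an illustrative model only, not NFS or Ceph code: `ServerTimeout`, `call_with_mount_mode`, and `retries_per_major_timeout` are hypothetical names, and treating every third failed attempt as a "major timeout" is an assumption made for the example. The behaviour modelled is the one in the NFS man page excerpt quoted later in this ticket: a soft mount reports an I/O error after a major timeout, while a hard mount logs "server not responding" and keeps retrying.

```python
import errno


class ServerTimeout(Exception):
    """Hypothetical: one attempt got no reply from the server in time."""


def call_with_mount_mode(attempt, soft, retries_per_major_timeout=3, log=print):
    """Run attempt() until it succeeds, modelling 'soft' vs 'hard' mounts.

    Assumption for this sketch: every `retries_per_major_timeout`
    consecutive failures count as one major timeout.
    """
    tries = 0
    while True:
        try:
            return attempt()
        except ServerTimeout:
            tries += 1
            if tries % retries_per_major_timeout != 0:
                continue  # minor timeout: just retry
            if soft:
                # soft: give up and report an I/O error to the caller
                raise OSError(errno.EIO, "server not responding")
            # hard: complain on the console and keep retrying indefinitely
            log("server not responding, still trying")
```

With `soft=False` the call only returns once the server answers, which is the stuck-until-reboot behaviour described above; with `soft=True` the caller gets `EIO` and can decide what to do next.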

Yeah, being able to force it to shut down when servers are unresponsive is
definitely the intent. 'umount -f' should work. It sounds like the
problem is related to the initial 'umount' (which doesn't time out)
followed by 'umount -f'.

I'm hesitant to add a blanket umount timeout, as that could prevent proper
writeout of cached data/metadata in some cases. So I think the goal
should be that if a normal umount hangs for some reason, you should be
able to intervene to add the 'force' if things don't go well.

Actions #1

Updated by Sage Weil almost 14 years ago

The problem is that the session-close stage of umount already times out (using the same timeout as mount), but the 'flush' stage (caps, data, etc.) does not.
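
The two stages can be pictured with a toy model. The sketch below is not the Ceph kernel client's code; `wait_until`, `unmount`, `mount_timeout`, and `flush_timeout` are made-up names, and modelling each stage as a polled condition is an assumption. A timeout of `None` means "wait forever", which matches how the flush stage is described in this note.

```python
import time


def wait_until(done, timeout, poll=0.01):
    """Poll done() until it returns True.

    timeout=None waits forever; otherwise return False once
    `timeout` seconds have passed without done() becoming True.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not done():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


def unmount(flushed, sessions_closed, mount_timeout, flush_timeout=None):
    """Hypothetical two-stage unmount: flush first, then close sessions."""
    stages = [
        ("flush", flushed, flush_timeout),  # no deadline by default
        ("session close", sessions_closed, mount_timeout),
    ]
    for name, done, timeout in stages:
        if not wait_until(done, timeout):
            return f"{name} timed out"
    return "unmounted"
```

In this model, a server that never acknowledges the flush makes `unmount(...)` block forever unless a `flush_timeout` is supplied, whereas an unresponsive session close returns after `mount_timeout`.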

Actions #2

Updated by Sage Weil over 13 years ago

  • Target version set to v2.6.35
Actions #3

Updated by Sage Weil over 13 years ago

  • Priority changed from Normal to High
Actions #4

Updated by Sage Weil over 13 years ago

  • Subject changed from umount shouldn't hang on failed servers to make a 'soft' mode
  • Target version changed from v2.6.35 to v2.6.36
Actions #5

Updated by Sage Weil over 13 years ago

Make 'hard' and 'soft' mount options, as in NFS. Default is 'hard'.

       soft           If an NFS file operation has a major timeout then report
                      an I/O error to the calling program. The default is to
                      continue retrying NFS file operations indefinitely.

       hard           If an NFS file operation has a major timeout then report
                      "server not responding" on the console and continue
Actions #6

Updated by Sage Weil over 13 years ago

  • Target version changed from v2.6.36 to v2.6.37
Actions #7

Updated by Sage Weil over 13 years ago

  • Target version changed from v2.6.37 to v2.6.38
Actions #8

Updated by Sage Weil over 13 years ago

  • Target version deleted (v2.6.38)
Actions #10

Updated by Sage Weil over 12 years ago

  • Priority changed from High to Normal
Actions #11

Updated by Sage Weil about 12 years ago

  • Tracker changed from Bug to Feature