Bug #1194

kclient: NFS reexport does not survive ceph fs remount

Added by Brian Chrisman almost 13 years ago. Updated over 12 years ago.

Status: Resolved
Priority: Normal
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

NFS doesn't survive restarts/remounts
Reproduce:
create new ceph fs
export via NFS
(on NFS client) nfs mount
(on NFS client) create a couple files/directories
stop nfs server
umount ceph filesystem
mount ceph filesystem
start nfs server
(on NFS client) access (cat) file in nfs mount

Result: Stale NFS file handle

If I run a 'find' on the ceph filesystem after the restart, before accessing anything from the NFS client, there is no stale file handle.
The kernel client apparently still depends on something being in a cache.

ceph version is ~0.28 era.
kernel is 2.6.39 pulled within the last week from ceph-client git repo

STALE_on_remount.messages.log (916 KB) Brian Chrisman, 06/20/2011 03:08 PM

STALE_on_remount.mds.log (51.1 KB) Brian Chrisman, 06/20/2011 03:08 PM

History

#1 Updated by Brian Chrisman almost 13 years ago

I reproduced the problem several times before submitting this bug but can't reproduce it now.
I'm going to leave this open until tomorrow when I can try again with a fuller filesystem (only possible difference, though I'm pretty sure I ran a mkcephfs in there during my previous recreation).

#2 Updated by Brian Chrisman almost 13 years ago

I was able to reproduce this.
ESTALE shows up in the messages log.
I don't see much in the mds log.

At this point, the fs is pretty much borked until I go into the ceph direct mount and perform lookup operations.
I turned mds and kernel logging on just before running the operation I knew would produce the ESTALE.
I'm not sure how much I'd have to do to reproduce from scratch, as this is after copying a source tree to the filesystem with scp, then starting up the nfs export, then mounting, then attempting to build. The build hits an ESTALE, and I continue to get the ESTALE after I've restarted everything from ceph on up (without umount/mount of the NFS client).

In this case, the NFS client is holding filehandles while the ceph/NFS server restarts.

#3 Updated by Sage Weil over 12 years ago

  • Target version set to v0.32

#5 Updated by Sage Weil over 12 years ago

  • Assignee set to Sage Weil

I don't have things set up to reproduce/test this easily, but it looks like this is the problem. Can you give it a go?

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 79743d1..5e0e8d1 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1571,11 +1571,11 @@ static int set_request_path_attr(struct inode *rinode, struct dentry *rdentry,
                r = build_dentry_path(rdentry, ppath, pathlen, ino, freepath);
                dout(" dentry %p %llx/%.*s\n", rdentry, *ino, *pathlen,
                     *ppath);
-       } else if (rpath) {
+       } else if (rpath || rino) {
                *ino = rino;
                *ppath = rpath;
                *pathlen = strlen(rpath);
-               dout(" path %.*s\n", *pathlen, rpath);
+               dout(" path #%llx/%.*s\n", rino, *pathlen, rpath);
        }

        return r;
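
To spell out what the changed condition does, here is a small user-space model of just that branch. It is not the kernel code: struct path_sel, select_old and select_new are invented names for illustration. One thing the model makes visible is that once the branch can be taken with only an inode number, rpath may be NULL when strlen() is reached, so the sketch assumes a length of 0 in that case. That guard is my assumption and is not part of the diff above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the (ino, path, pathlen) outputs of the branch. */
struct path_sel {
    uint64_t ino;
    const char *path;
    int pathlen;
    int set;                /* 1 if the branch filled in the fields */
};

/* Old condition: the branch is taken only when a path string is given. */
struct path_sel select_old(const char *rpath, uint64_t rino)
{
    struct path_sel s = {0, NULL, 0, 0};
    if (rpath) {
        s.ino = rino;
        s.path = rpath;
        s.pathlen = (int)strlen(rpath);
        s.set = 1;
    }
    return s;
}

/* New condition: the branch is also taken when only an inode number is given. */
struct path_sel select_new(const char *rpath, uint64_t rino)
{
    struct path_sel s = {0, NULL, 0, 0};
    if (rpath || rino) {
        s.ino = rino;
        s.path = rpath;
        /* Assumed guard: strlen(NULL) is undefined, so use 0 for "no path". */
        s.pathlen = rpath ? (int)strlen(rpath) : 0;
        s.set = 1;
    }
    return s;
}
```

With a NULL path and a nonzero inode number, select_old leaves everything unset while select_new records the inode. That is the by-inode case the changed condition appears to be aimed at (the description's "Stale NFS file handle" after a remount).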

#6 Updated by Sage Weil over 12 years ago

  • Target version changed from v0.32 to v0.33

#7 Updated by Sage Weil over 12 years ago

  • Subject changed from NFS reexport does not survive ceph fs remount to kclient: NFS reexport does not survive ceph fs remount

#8 Updated by Sage Weil over 12 years ago

  • Target version changed from v0.33 to v0.34

#10 Updated by Sage Weil over 12 years ago

  • Status changed from New to 7

pushed this to for-linus branch.

#11 Updated by Sage Weil over 12 years ago

  • Target version changed from v0.34 to v0.35

#12 Updated by Sage Weil over 12 years ago

  • Assignee changed from Sage Weil to Brian Chrisman

#13 Updated by Sage Weil over 12 years ago

  • Target version changed from v0.35 to v0.36

#15 Updated by Sage Weil over 12 years ago

  • Target version changed from v0.36 to v0.37

#16 Updated by Sage Weil over 12 years ago

  • Target version changed from v0.37 to v0.38

#17 Updated by Sage Weil over 12 years ago

  • Status changed from 7 to Resolved

going to assume the above fixed it until we hear otherwise :)
