Bug #1194 (closed)

kclient: NFS reexport does not survive ceph fs remount

Added by Brian Chrisman almost 13 years ago. Updated over 12 years ago.

Status:
Resolved
Priority:
Normal
Category:
-
Target version:
% Done:

0%


Description

NFS access via the reexport doesn't survive restarts/remounts of the underlying ceph filesystem.
Reproduce:
1. create a new ceph fs
2. export it via NFS
3. (on NFS client) nfs mount
4. (on NFS client) create a couple of files/directories
5. stop the nfs server
6. umount the ceph filesystem
7. mount the ceph filesystem
8. start the nfs server
9. (on NFS client) access (cat) a file in the nfs mount

Result: "Stale NFS file handle"

If I perform a 'find' on the ceph filesystem before attempting to access the file from the NFS client after the restart, there is no stale filehandle.
The kernel client apparently still depends on something being in a cache.

The ceph version is from the ~0.28 era.
The kernel is 2.6.39, pulled within the last week from the ceph-client git repo.
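
A minimal shell sketch of the reproduction steps above (the mount points, monitor address, exports entry, and service commands are illustrative assumptions, not details taken from this report):

# on the ceph/NFS server (all paths and addresses below are assumptions):
mount -t ceph 192.168.0.10:6789:/ /mnt/ceph
echo '/mnt/ceph *(rw,no_subtree_check,fsid=1)' >> /etc/exports
service nfs start

# on the NFS client:
mount server:/mnt/ceph /mnt/nfs
mkdir /mnt/nfs/dir1
echo hello > /mnt/nfs/dir1/file1

# back on the server: remount the ceph fs underneath the export
service nfs stop
umount /mnt/ceph
mount -t ceph 192.168.0.10:6789:/ /mnt/ceph
service nfs start

# on the NFS client again:
cat /mnt/nfs/dir1/file1          # -> "Stale NFS file handle"
# priming the server's cache first (e.g. 'find /mnt/ceph > /dev/null' on
# the server before touching the NFS mount) avoids the ESTALE, as noted above.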


Files

STALE_on_remount.messages.log (916 KB) Brian Chrisman, 06/20/2011 03:08 PM
STALE_on_remount.mds.log (51.1 KB) Brian Chrisman, 06/20/2011 03:08 PM
Actions #1

Updated by Brian Chrisman almost 13 years ago

I reproduced the problem several times before submitting this bug, but can't reproduce it now.
I'm going to leave this open until tomorrow, when I can try again with a fuller filesystem (the only possible difference, though I'm pretty sure I ran a mkcephfs in there during my previous recreation).

Actions #2

Updated by Brian Chrisman almost 13 years ago

I was able to reproduce this.
ESTALE shows up in the messages log.
I don't see much in the mds log.

At this point, the fs is pretty much borked until I go into the ceph direct mount and perform lookup operations.
I turned mds and kernel logging on just before running the operation I knew would produce the ESTALE.
I'm not sure how much I'd have to do to reproduce this from scratch, as it happened after copying a source tree to the filesystem with scp, then starting up the nfs export, then mounting, then attempting to build. The build encounters an ESTALE, and I continue to get the ESTALE after I've restarted everything from ceph on up (without an umount/mount of the NFS client).

In this case, the NFS client is holding filehandles while the ceph/NFS server restart.

Actions #3

Updated by Sage Weil almost 13 years ago

  • Target version set to v0.32
Actions #4

Updated by Sage Weil almost 13 years ago

Actions #5

Updated by Sage Weil almost 13 years ago

  • Assignee set to Sage Weil

I don't have things set up to reproduce/test this easily, but it looks like this is the problem. Can you give it a go?

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 79743d1..5e0e8d1 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1571,11 +1571,11 @@ static int set_request_path_attr(struct inode *rinode, struct dentry *rdentry,
                r = build_dentry_path(rdentry, ppath, pathlen, ino, freepath);
                dout(" dentry %p %llx/%.*s\n", rdentry, *ino, *pathlen,
                     *ppath);
-       } else if (rpath) {
+       } else if (rpath || rino) {
                *ino = rino;
                *ppath = rpath;
                *pathlen = strlen(rpath);
-               dout(" path %.*s\n", *pathlen, rpath);
+               dout(" path #%llx/%.*s\n", rino, *pathlen, rpath);
        }

        return r;
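
For reference, a hedged sketch of how one might apply and build the change above for testing against the ceph-client tree (the repository URL and patch file name are assumptions for illustration, not from this ticket):

git clone git://github.com/ceph/ceph-client.git   # assumed location of the kclient tree
cd ceph-client
git apply 1194-set_request_path_attr.patch        # hypothetical file holding the diff above
# rebuild and install the kernel (or at least the fs/ceph module), boot into
# it, and rerun the reproduction steps from the description.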
Actions #6

Updated by Sage Weil over 12 years ago

  • Target version changed from v0.32 to v0.33
Actions #7

Updated by Sage Weil over 12 years ago

  • Subject changed from NFS reexport does not survive ceph fs remount to kclient: NFS reexport does not survive ceph fs remount
Actions #8

Updated by Sage Weil over 12 years ago

  • Target version changed from v0.33 to v0.34
Actions #9

Updated by Sage Weil over 12 years ago

Actions #10

Updated by Sage Weil over 12 years ago

  • Status changed from New to 7

Pushed this to the for-linus branch.

Actions #11

Updated by Sage Weil over 12 years ago

  • Target version changed from v0.34 to v0.35
Actions #12

Updated by Sage Weil over 12 years ago

  • Assignee changed from Sage Weil to Brian Chrisman
Actions #13

Updated by Sage Weil over 12 years ago

  • Target version changed from v0.35 to v0.36
Actions #14

Updated by Sage Weil over 12 years ago

Actions #15

Updated by Sage Weil over 12 years ago

  • Target version changed from v0.36 to v0.37
Actions #16

Updated by Sage Weil over 12 years ago

  • Target version changed from v0.37 to v0.38
Actions #17

Updated by Sage Weil over 12 years ago

  • Status changed from 7 to Resolved

Going to assume the above fixed it until we hear otherwise. :)
