Bug #49826

open

Multiple nfs-ganesha instances and stray objects in CephFS

Added by Aleksandr Rudenko about 3 years ago. Updated about 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Ganesha FSAL, libcephfs
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi!

We have one CephFS file system and two standalone ganesha instances on different hosts that export the same directory.
We don't use any HA solution for ganesha, and we don't need one.
Our NFS clients run CentOS 7.5. A client can mount either of our ganesha servers (DNS round-robin). Clients never work with the same file at the same time and never overwrite files, so there are no races. Everything works fine, except for a problem with file removal.

Problem:

mount nfs-server1
create file1
umount nfs-server1
---
mount nfs-server2
rm -rf file1
umount nfs-server2
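
The steps above could be scripted roughly as follows. This is only a sketch: the mount point /mnt and the use of touch to create file1 are assumptions, the host names and mount options are the ones from this report, and the helper names run and repro are made up for illustration. By default the script only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch of the reproduction steps. DRY_RUN=1 (default) only prints commands.
DRY_RUN="${DRY_RUN:-1}"
MNT="${MNT:-/mnt}"    # assumed mount point
OPTS="lookupcache=positive,vers=4.1,soft,timeo=250,retrans=2"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro() {
    # Create the file through the first ganesha instance.
    run mount -t nfs -o "$OPTS" nfs-server1:/ "$MNT"
    run touch "$MNT/file1"
    run umount "$MNT"
    # Remove it through the second ganesha instance.
    run mount -t nfs -o "$OPTS" nfs-server2:/ "$MNT"
    run rm -rf "$MNT/file1"
    run umount "$MNT"
}
```

Calling repro prints the six commands. Setting DRY_RUN=0 before calling it (as root) would execute them for real.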

After this, file1 is no longer visible in the directory.
But on the active Ceph MDS we can see that the stray count has increased:

ceph daemon mds.c perf dump | grep num_strays
        "num_strays": 1,
        "num_strays_delayed": 0,
        "num_strays_enqueuing": 0,
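
To watch whether the stray is ever purged, the counter can be polled. The following is a minimal sketch, assuming the perf dump output is pretty-printed with one counter per line as shown above. The function names num_strays, perf_dump and wait_for_purge are hypothetical helpers, and mds.c is the daemon name from this report.

```shell
#!/bin/sh
# Extract the first "num_strays" value from perf dump output on stdin.
# "num_strays_delayed" etc. do not match because of the closing quote.
num_strays() {
    sed -n 's/^[[:space:]]*"num_strays":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Dump the MDS perf counters; adjust the daemon name as needed.
perf_dump() {
    ceph daemon mds.c perf dump
}

# Poll once per second until num_strays is 0 or the timeout (seconds) expires.
# Returns 0 if the strays were purged, 1 on timeout, 2 if no value was found.
wait_for_purge() {
    timeout="${1:-30}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        n="$(perf_dump | num_strays)"
        [ -n "$n" ] || return 2
        [ "$n" -eq 0 ] && return 0
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

For example, wait_for_purge 30 after the rm step should return 0 in the normal case and 1 in the case described here.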

Usually stray objects are purged after 10-20 seconds, but not in this case: here the stray object is purged only after nfs-server1 is restarted!

I can't reproduce this behavior with two different native CephFS clients. It reproduces 100% of the time, but only with ganesha instances.

I have seen this behavior on Ceph 12.2.12 with ganesha 2.7.1.
I can also see it on Ceph 14.2.15 with ganesha 2.8.1,
and on Ceph 14.2.15 with ganesha v3.5.1.

ganesha.conf:


NFS_CORE_PARAM {
    Enable_NLM = false;
    Enable_RQUOTA = false;
    Protocols = 4;
    NFS_Port = 2049;
    MaxRPCSendBufferSize = 9437184;
    MaxRPCRecvBufferSize = 9437184;
}

NFSv4 {
    Minor_Versions =  0,1,2;
    Grace_Period = 10;
    Lease_Lifetime = 10;
}

CACHEINODE {
    Dir_Chunk = 0;
    Dir_Max = 1;
    NParts = 1;
    Cache_FDs = false;
    Cache_Size = 1;
}

EXPORT_DEFAULTS {
    Access_Type = RW;
    Attr_Expiration_Time = 0;
    Transports = TCP;
}

EXPORT {
    Export_ID=100;
    Path = /nfs;
    Pseudo = /;
    Squash = Root_Squash;

    FSAL {
        Name = CEPH;
        User_Id = ngw.b;
        Secret_Access_Key = "key";
    }
}

LOG {
    Components {
        ALL = INFO;
    }
}

mount params:

mount -t nfs -o "lookupcache=positive,vers=4.1,soft,timeo=250,retrans=2" nfs-server:/ /mnt

Actions #1

Updated by Patrick Donnelly about 3 years ago

Aleksandr Rudenko wrote:

Usually stray objects are purged after 10-20 seconds, but not in this case: here the stray object is purged only after nfs-server1 is restarted!

This is normal. NFS has the unlinked file in its cache which is keeping it pinned in the MDS stray directory.

Actions #2

Updated by Jeff Layton about 3 years ago

The strays behavior makes some sense, since we don't really do anything client-side to notify the application when there is a removal, and ganesha will hold on to a reference to an inode for a long time.

We should probably change libcephfs to call _schedule_ino_release_callback when it detects that the file has been unlinked remotely. I'm a little unclear on how best to achieve that though. Do we get any other notification of dentry removal aside from the inode's caps being recalled?
