Bug #49826
Multiple nfs-ganesha instances and stray objects in CephFS
Status: Open
Description
Hi!
We have one CephFS file system and two standalone ganesha instances, on different hosts, that export the same directory.
We don't use any HA solution for ganesha, and we don't need one.
Our NFS clients run CentOS 7.5. A client can mount either of our ganesha servers (DNS round-robin). Clients never work with the same file at the same time and never overwrite files, so there are no races. Everything works fine, but we have a problem with file removal.
Problem:
mount nfs-server1
create file1
umount nfs-server1
---
mount nfs-server2
rm -rf file1
umount nfs-server2
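The steps above can be scripted so that they are repeated identically on every attempt. This is a minimal sketch, not part of the original report. It assumes the script runs as root on an NFS client, that /mnt is an unused mount point, and that `touch` is an acceptable stand-in for "create file1". The function names and the `DRY_RUN=1` switch (print the commands instead of running them) are hypothetical additions.

```shell
# Sketch: reproduce the stray leak through two ganesha servers.
# Source this file, then call: reproduce nfs-server1 nfs-server2 /mnt

# Mount options copied from the report.
NFS_OPTS="lookupcache=positive,vers=4.1,soft,timeo=250,retrans=2"

# run: execute the command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# reproduce SERVER1 SERVER2 MOUNTPOINT
# Creates file1 through SERVER1, then removes it through SERVER2.
reproduce() {
  local srv1=$1 srv2=$2 mnt=$3
  # Step 1: create the file via the first ganesha instance.
  run mount -t nfs -o "$NFS_OPTS" "$srv1:/" "$mnt" || return 1
  run touch "$mnt/file1" || return 1
  run umount "$mnt" || return 1
  # Step 2: delete it via the second ganesha instance.
  run mount -t nfs -o "$NFS_OPTS" "$srv2:/" "$mnt" || return 1
  run rm -rf "$mnt/file1" || return 1
  run umount "$mnt"
}
```

`DRY_RUN=1 reproduce nfs-server1 nfs-server2 /mnt` prints the six commands without touching any mounts. That makes it easy to review the sequence before running it for real.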
After this, file1 no longer appears in the directory.
However, the active Ceph MDS shows an increased stray count:
ceph daemon mds.c perf dump | grep num_strays
"num_strays": 1,
"num_strays_delayed": 0,
"num_strays_enqueuing": 0,
Normally, stray objects are purged after 10-20 seconds, but not in this case: here the stray object is purged only after nfs-server1 is restarted!
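To tell a normal 10-20 second purge apart from the pinned case, the counter can be polled instead of re-running the grep by hand. This is a minimal sketch. It assumes it runs on the host that holds the active MDS admin socket (mds.c in the report), and that `ceph daemon mds.<name> perf dump` prints the counter in the same `"num_strays": N` form shown above. The function names, the 5-second poll interval and the 60-second default timeout are hypothetical choices.

```shell
# Sketch: wait for the MDS stray count to drain back to zero.

# num_strays: read `perf dump` output on stdin and print the num_strays value.
# The pattern requires a quote right after "num_strays", so the
# num_strays_delayed / num_strays_enqueuing counters are not matched.
num_strays() {
  sed -n 's/.*"num_strays": *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# wait_strays_purged MDS_NAME [TIMEOUT_SECONDS]
# Returns 0 once num_strays is 0, 1 if it is still non-zero after the timeout.
wait_strays_purged() {
  local mds=$1 timeout=${2:-60} waited=0 count
  while :; do
    count=$(ceph daemon "mds.$mds" perf dump | num_strays)
    echo "num_strays=$count after ${waited}s"
    if [ "$count" = 0 ]; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}
```

For example, run `wait_strays_purged c 60` right after the `rm -rf` on nfs-server2. Based on this report, it would be expected to time out with num_strays=1 until nfs-server1 is restarted, while two native CephFS clients should let it return 0 within the usual window.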
I don't see this behavior when I use two different native CephFS clients; it reproduces 100% of the time, but only with ganesha instances.
I have seen this behavior with Ceph 12.2.12 and ganesha 2.7.1.
I can also reproduce it with Ceph 14.2.15 and ganesha 2.8.1, and with Ceph 14.2.15 and ganesha v3.5.1.
ganesha.conf:
NFS_CORE_PARAM {
    Enable_NLM = false;
    Enable_RQUOTA = false;
    Protocols = 4;
    NFS_Port = 2049;
    MaxRPCSendBufferSize = 9437184;
    MaxRPCRecvBufferSize = 9437184;
}
NFSv4 {
    Minor_Versions = 0,1,2;
    Grace_Period = 10;
    Lease_Lifetime = 10;
}
CACHEINODE {
    Dir_Chunk = 0;
    Dir_Max = 1;
    NParts = 1;
    Cache_FDs = false;
    Cache_Size = 1;
}
EXPORT_DEFAULTS {
    Access_Type = RW;
    Attr_Expiration_Time = 0;
    Transports = TCP;
}
EXPORT {
    Export_ID = 100;
    Path = /nfs;
    Pseudo = /;
    Squash = Root_Squash;
    FSAL {
        Name = CEPH;
        User_Id = ngw.b;
        Secret_Access_Key = "key";
    }
}
LOG {
    Components {
        ALL = INFO;
    }
}
Mount parameters:
mount -t nfs -o "lookupcache=positive,vers=4.1,soft,timeo=250,retrans=2" nfs-server:/ /mnt
Updated by Patrick Donnelly about 3 years ago
Aleksandr Rudenko wrote:
Normally, stray objects are purged after 10-20 seconds, but not in this case: here the stray object is purged only after nfs-server1 is restarted!
This is expected. The NFS server still holds the unlinked file in its cache, and that reference keeps the inode pinned in the MDS stray directory.
Updated by Jeff Layton about 3 years ago
The stray behavior makes some sense: we don't really do anything client-side to notify the application when a file is removed, and ganesha will hold on to a reference to an inode for a long time.
We should probably change libcephfs to call _schedule_ino_release_callback when it detects that the file has been unlinked remotely. I'm a little unclear on how best to achieve that, though. Do we get any notification of dentry removal other than the inode's caps being recalled?