Bug #49826
Multiple nfs-ganesha instances and stray objects in CephFS
Description
Hi!
We have one CephFS and two standalone ganesha instances on different hosts which export the same directory.
We don't use any HA solution for ganesha and we don't need any HA.
Our NFS clients are CentOS 7.5. Clients can mount any of our ganesha servers (DNS round-robin). Clients never work with the same file at the same time and never overwrite files, so there are no races. Everything works fine, but we have a problem with file removal.
Problem:
mount nfs-server1
create file1
umount nfs-server1
---
mount nfs-server2
rm -rf file1
umount nfs-server2
After this, file1 is no longer visible in the directory.
But, on active Ceph MDS we can see increased strays count:
ceph daemon mds.c perf dump | grep num_strays
"num_strays": 1,
"num_strays_delayed": 0,
"num_strays_enqueuing": 0,
Stray objects are usually purged after 10-20 seconds. But not in this case: here the stray object is purged only after restarting nfs-server1!
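For checking the counters above programmatically, here is a minimal sketch that pulls the `num_strays*` values out of the JSON emitted by `ceph daemon mds.<name> perf dump` (the assumption that these counters live under the `mds_cache` section matches the dumps I have seen, but verify against your version):

```python
import json

def stray_counts(perf_dump_json: str) -> dict:
    """Extract the num_strays* counters from `ceph daemon mds.<name> perf dump` output."""
    data = json.loads(perf_dump_json)
    # Assumption: the stray counters sit in the "mds_cache" section of the dump.
    cache = data.get("mds_cache", {})
    return {k: v for k, v in cache.items() if k.startswith("num_strays")}

# Example shaped like the dump above:
sample = '{"mds_cache": {"num_strays": 1, "num_strays_delayed": 0, "num_strays_enqueuing": 0}}'
print(stray_counts(sample))
# → {'num_strays': 1, 'num_strays_delayed': 0, 'num_strays_enqueuing': 0}
```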
I can't see this behavior when I use two different native CephFS clients. It reproduces 100% of the time, but only with ganesha instances.
I have seen this behavior on Ceph 12.2.12 with ganesha 2.7.1, on Ceph 14.2.15 with ganesha 2.8.1, and on Ceph 14.2.15 with ganesha v3.5.1.
ganesha.conf:
NFS_CORE_PARAM {
    Enable_NLM = false;
    Enable_RQUOTA = false;
    Protocols = 4;
    NFS_Port = 2049;
    MaxRPCSendBufferSize = 9437184;
    MaxRPCRecvBufferSize = 9437184;
}

NFSv4 {
    Minor_Versions = 0,1,2;
    Grace_Period = 10;
    Lease_Lifetime = 10;
}

CACHEINODE {
    Dir_Chunk = 0;
    Dir_Max = 1;
    NParts = 1;
    Cache_FDs = false;
    Cache_Size = 1;
}

EXPORT_DEFAULTS {
    Access_Type = RW;
    Attr_Expiration_Time = 0;
    Transports = TCP;
}

EXPORT {
    Export_ID = 100;
    Path = /nfs;
    Pseudo = /;
    Squash = Root_Squash;
    FSAL {
        Name = CEPH;
        User_Id = ngw.b;
        Secret_Access_Key = "key";
    }
}

LOG {
    Components {
        ALL = INFO;
    }
}
mount params:
mount -t nfs -o "lookupcache=positive,vers=4.1,soft,timeo=250,retrans=2" nfs-server:/ /mnt