Feature #12334
nfs-ganesha: handle client cache pressure in NFS Ganesha FSAL (Closed)
Added by John Spray almost 9 years ago. Updated almost 4 years ago.
Description
Reported by Eric Eastman on ceph-users: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-July/003000.html
When writing more files than the MDS cache size through an NFS-Ganesha client, the user sees "Client <foo> failing to respond to cache pressure" warnings.
Presumably this is because the NFS layer takes references to inodes through the low-level ("ll") interface of libcephfs and has no hook that can be triggered to make it release cached inodes in response to cache pressure.
Updated by Greg Farnum almost 9 years ago
Note that NFS-Ganesha created significantly more inodes than the cache size limit before too many of them were pinned, so it is dropping some caps. We just don't have any feedback mechanism.
NFS file handles are probably the right way to deal with this; we ought to be able to drop basically whatever we want if the only user is an NFS client...
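One way to read this point: an NFS client only ever presents a file handle, and a handle that identifies the inode is enough to find the inode again. The gateway can therefore drop cached inode references whenever it is asked to, paying only an extra lookup on the next access. A toy Python sketch follows; `FileHandle`, `HandleCache`, and the lookup callable are all hypothetical, and the assumption that a handle carries an inode number plus snapshot id is made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    """Hypothetical NFS file handle carrying only an inode number and snapshot id."""
    ino: int
    snapid: int


class HandleCache:
    """Toy gateway-side cache. Because a handle is enough to find the inode
    again, any entry can be dropped under cache pressure without breaking
    clients; the only cost is an extra lookup on the next access."""

    def __init__(self, lookup):
        self._lookup = lookup   # stand-in for a lookup-by-inode-number call (hypothetical)
        self._cache = {}
        self.lookups = 0        # how many times we had to go back to the backend

    def resolve(self, fh):
        inode = self._cache.get(fh)
        if inode is None:
            self.lookups += 1
            inode = self._lookup(fh.ino, fh.snapid)
            self._cache[fh] = inode
        return inode

    def drop_all(self):
        """React to cache pressure by forgetting every cached entry."""
        self._cache.clear()

    def __len__(self):
        return len(self._cache)


if __name__ == "__main__":
    backend = {(0x10000000001, 0): "inode A"}
    cache = HandleCache(lambda ino, snap: backend[(ino, snap)])
    fh = FileHandle(0x10000000001, 0)
    cache.resolve(fh)
    cache.drop_all()                          # the MDS asked us to shrink
    print(cache.resolve(fh), cache.lookups)   # prints: inode A 2
```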
Updated by Eric Eastman almost 9 years ago
I ran the same 5-million-file create test using a CIFS mount instead of an NFS mount and did not see the "Client <foo> failing to respond to cache pressure" warning. As with the NFS test, the file creates were split between 2 clients, both using the cifs mount type. The Ubuntu Trusty client used SMB 3 with the fstab entry:
//ede-c2-gw01/cephfs /TEST-SMB cifs rw,guest,noauto,vers=3.0
CentOS 6.6 used SMB 1 with the fstab entry:
//ede-c2-gw01/cephfs /TEST-SMB cifs rw,guest,noauto
As in the NFS test, there was a single gateway, here using the Ceph file system VFS module for SAMBA. SAMBA version: 4.3.0pre1-GIT-2c1c567. The smb.conf entry for the file system was:
[cephfs]
path = /
writeable = yes
vfs objects = ceph
ceph:config_file = /etc/ceph/ceph.conf
browseable = yes
Next test will be with the kernel ceph file system interface.
Updated by Eric Eastman almost 9 years ago
I am seeing the "Client <foo> failing to respond to cache pressure" warning with the Ceph kernel driver after creating the 5 million files. This was using only 1 Ceph file system client.
The client and all members of the Ceph cluster are running Ubuntu Trusty with the 4.1 kernel and Ceph v9.0.1. Info from the client:
# lsmod | grep ceph
ceph                  323584  1
libceph               253952  1 ceph
libcrc32c              16384  1 libceph
fscache                65536  1 ceph
# cat /etc/fstab | grep ceph
10.15.2.121,10.15.2.122,10.15.2.123:/ /cephfs ceph name=cephfs,secretfile=/etc/ceph/client.cephfs,noatime,noauto,_netdev
# ceph --version
ceph version 9.0.1 (997b3f998d565a744bfefaaf34b08b891f8dbf64)
# cat /proc/version
Linux version 4.1.0-040100-generic (kernel@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201506220235 SMP Mon Jun 22 06:36:19 UTC 2015
This ceph -s output was taken about 6 hours after the file creation tool finished; the system was idle during those 6 hours:
# ceph -s
    cluster 6d8aae1e-1125-11e5-a708-001b78e265be
     health HEALTH_WARN
            mds0: Client ede-c2-gw01:cephfs failing to respond to cache pressure
     monmap e1: 3 mons at {ede-c2-mon01=10.15.2.121:6789/0,ede-c2-mon02=10.15.2.122:6789/0,ede-c2-mon03=10.15.2.123:6789/0}
            election epoch 22, quorum 0,1,2 ede-c2-mon01,ede-c2-mon02,ede-c2-mon03
     mdsmap e3512: 1/1/1 up {0=ede-c2-mds03=up:active}, 2 up:standby
     osdmap e590: 8 osds: 8 up, 8 in
      pgmap v473095: 832 pgs, 4 pools, 162 GB data, 4312 kobjects
            182 GB used, 78319 MB / 263 GB avail
                 832 active+clean
From the active MDS:
# ceph daemon mds.ede-c2-mds03 perf dump mds
{
    "mds": {
        "request": 734724,
        "reply": 734723,
        "reply_latency": {
            "avgcount": 734723,
            "sum": 2207.364066876
        },
        "forward": 0,
        "dir_fetch": 13,
        "dir_commit": 22483,
        "dir_split": 0,
        "inode_max": 100000,
        "inodes": 197476,
        "inodes_top": 0,
        "inodes_bottom": 0,
        "inodes_pin_tail": 197476,
        "inodes_pinned": 197476,
        "inodes_expired": 746886,
        "inodes_with_caps": 192730,
        "caps": 192730,
        "subtrees": 2,
        "traverse": 1476741,
        "traverse_hit": 742325,
        "traverse_forward": 0,
        "traverse_discover": 0,
        "traverse_dir_fetch": 1,
        "traverse_remote_ino": 0,
        "traverse_lock": 0,
        "load_cent": 73472400,
        "q": 0,
        "exported": 0,
        "exported_inodes": 0,
        "imported": 0,
        "imported_inodes": 0
    }
}
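The counters above can be reduced to a few ratios. The short Python sketch below copies the relevant counter names and values from this dump (the helper function is hypothetical). It shows the MDS holding roughly twice `inode_max`, every cached inode pinned, and about 97.6% of them held by client caps, which is consistent with an MDS that cannot trim on its own and depends on the client releasing caps.

```python
import json

# Abbreviated copy of the "mds" section of the perf dump quoted above
# (only the counters used below are kept).
PERF_DUMP = """
{"mds": {"inode_max": 100000, "inodes": 197476, "inodes_pinned": 197476,
         "inodes_with_caps": 192730, "caps": 192730}}
"""


def summarize_mds_cache(dump_text):
    """Return (cache fill relative to inode_max, fraction pinned, fraction with caps)."""
    mds = json.loads(dump_text)["mds"]
    fill = mds["inodes"] / mds["inode_max"]
    pinned = mds["inodes_pinned"] / mds["inodes"]
    with_caps = mds["inodes_with_caps"] / mds["inodes"]
    return fill, pinned, with_caps


if __name__ == "__main__":
    fill, pinned, with_caps = summarize_mds_cache(PERF_DUMP)
    print(f"cache fill: {fill:.2f}x inode_max")             # prints: cache fill: 1.97x inode_max
    print(f"pinned: {pinned:.1%}, with caps: {with_caps:.1%}")  # prints: pinned: 100.0%, with caps: 97.6%
```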
Updated by Eric Eastman almost 9 years ago
I finished the final test using 1 Ceph file system client and the FUSE interface. I ran the 5-million-file create test a couple of times with the Ceph file system mounted through FUSE and did not see the "Client <foo> failing to respond to cache pressure" warning. The fstab entry used:
# cat /etc/fstab | grep ceph
id=cephfs,keyring=/etc/ceph/client.cephfs.keyring /cephfs fuse.ceph noatime,_netdev,noauto 0 0
Summary of the tests:
I saw the cache pressure warning with Ganesha NFS and the Ceph file system kernel interface. I did not see it with the SAMBA or the Ceph fuse.ceph interfaces.
Let me know if you need additional information.
Updated by John Spray over 8 years ago
It looks like the ganesha FSAL interface already includes the function `up_async_invalidate` for this sort of thing, though libcephfs itself doesn't currently provide a way to register such a hook. We'll need to update the FSAL and libcephfs.
Updated by John Spray over 8 years ago
Pinged Matt & Adam about this yesterday; Matt's planning to work on it at some stage. In some cases we may want to simply disable the ganesha cache in favour of Client()'s cache, but it should also be possible to implement these invalidation hooks.
Updated by Patrick Donnelly about 6 years ago
- Subject changed from Handle client cache pressure in NFS Ganesha FSAL to nfs-ganesha: handle client cache pressure in NFS Ganesha FSAL
- Category changed from NFS (Linux Kernel) to Correctness/Safety
- Assignee set to Jeff Layton
- Priority changed from Normal to High
- Target version set to v13.0.0
- Tags set to ganesha,intern
- Backport set to luminous
- Release set to luminous
- Component(FS) Client, Ganesha FSAL added
Updated by Jeff Layton about 6 years ago
This is worth looking into, but I wonder how big a problem this is once you disable a lot of the ganesha mdcache behavior.
The one thing we can't really disable in ganesha though is the filehandle cache, and each of those holds an Inode reference. We will probably need some mechanism to force some of those to be flushed out, but there is a bit of a problem:
ganesha may not always be able to tear down a specific filehandle if it's still in use, and libcephfs has no way to know which ones are in use. What may be best is some mechanism to ask ganesha to clean out whatever filehandles are no longer in use, or simply to ask it to shrink the cache by some number of entries.
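A toy model of the second option, shrinking the cache by some number of entries while skipping filehandles that are still in use, might look like the following. The class and method names are hypothetical and this is not ganesha's mdcache; it assumes each cached entry owns exactly one inode reference. Calling `shrink()` with an `n` at least as large as the cache also covers the first option of cleaning out every idle entry.

```python
from collections import OrderedDict


class FilehandleCache:
    """Toy gateway filehandle cache in which every entry holds one inode
    reference. Entries that are still in use cannot be torn down."""

    def __init__(self):
        # key -> in-use count; iteration order is least recently used first
        self._entries = OrderedDict()

    def get(self, key):
        """Look up (or create) an entry and mark it in use."""
        count = self._entries.pop(key, 0)
        self._entries[key] = count + 1   # re-insert at the most-recently-used end
        return key

    def put(self, key):
        """Caller is done with the entry; it stays cached but is now idle."""
        self._entries[key] -= 1

    def shrink(self, n):
        """Release up to n idle entries, oldest first; return the released keys."""
        released = []
        for key, in_use in list(self._entries.items()):
            if len(released) == n:
                break
            if in_use == 0:
                del self._entries[key]
                released.append(key)     # this is where the inode reference would be dropped
        return released

    def __len__(self):
        return len(self._entries)
```

A usage pattern under these assumptions: the library side asks for `shrink(n)`, and whatever comes back is what the gateway could actually free. Entries still in use are simply skipped rather than forced out.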
Updated by Jeff Layton about 6 years ago
Possibly related tracker:
Updated by Patrick Donnelly about 6 years ago
- Related to Feature #18537: libcephfs cache invalidation upcalls added
Updated by Patrick Donnelly almost 6 years ago
- Target version changed from v13.0.0 to v14.0.0
- Tags deleted (ganesha,intern)
- Backport changed from luminous to mimic,luminous
- Labels (FS) task(intern) added
Updated by Jeff Layton about 5 years ago
- Status changed from New to Rejected
I've not heard of anyone hitting this who has set up ganesha to use the CACHEINODE and EXPORT parameters recommended here:
https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf
With that, I don't see a real need for this. I'm going to close this with a resolution of REJECTED for now, but feel free to reopen if you have a use-case you'd like considered for this.
Updated by Zoltan Arnold Nagy over 4 years ago
I do see this on a new MDS setup running 14.2.4, with the recommended ganesha settings:
root@c10n5:~# cat /etc/ganesha/ganesha.conf | grep -i 'cache_size|expiration_time'
Cache_Size = 1;
root@c10n5:~# cat /etc/ganesha/ganesha.conf | grep -i Attr_Expiration_Time
Attr_Expiration_Time = 0;
root@c10n5:~#
and yet ceph -s tells me the same thing:
1 MDSs report oversized cache
1 clients failing to respond to cache pressure
I don't have anything else using this other than ganesha.
What info can I provide?
Updated by Jeff Layton over 4 years ago
Zoltan Arnold Nagy wrote:
What info can I provide?
I think it'd be best to open a new tracker ticket for the problem you're having. This one is all about the ceph client failing to respond to cache pressure, which may or may not be the case here.
Be sure to save the MDS logfile, in particular any "MDS cache is too large" messages, which should tell us something about which caches are too large. We'd also need to know something about the workload on the NFS clients. Are they holding a lot of files open?
Updated by Janek Bevendorff over 4 years ago
I am seeing similar issues on our cluster. I had Ganesha running on the same nodes as the MONs, just for convenience so that clients could connect using the same domain, but found that it puts too much load on those nodes. Therefore, I moved the Ganesha services to different nodes, and the "failing to respond to cache pressure" warning immediately moved to the new nodes as well.
Right now, I only have one (!) mostly idle client with two shares connected. The MDS reports 3653 caps allocated, but apparently that's enough to show the warning. When I restart the Ganesha server, the message goes away, but it only takes about 2-5 minutes and a tiny backup task for it to reappear.
Updated by Jeff Layton about 4 years ago
- Status changed from Rejected to In Progress
- Target version changed from v14.0.0 to v16.0.0
Reopening this bug. We've had some other reports of this upstream as well, and I'm convinced we'll need to add some way to relay cache pressure to ganesha. There are a few issues here:
1) libcephfs doesn't provide an interface to set callbacks for this (see tracker #45114). That's fairly simple to solve.
2) libcephfs doesn't have a mechanism to scan the inode_map and ask the application to release inode refs. We currently scan for dentries and release them in trim_cache(), but ganesha looks up inodes by vinodeno_t. There may be no dentries associated with them. We may be able to just walk the inode_map after trimming dentries and ask the application to release what it can, or we may need to add a separate LRU for Inodes.
3) ganesha doesn't have a mechanism to allow libcephfs to request that it release an Inode reference, if it's able. It does have a facility for "upcalls", but it doesn't have one for this. That will need to be added.
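The three pieces above can be modelled end to end in a toy sketch. Everything here is hypothetical: `ToyClient`, `ToyGateway`, and `register_release_callback` are illustrative names, not the libcephfs or ganesha API, and the real `trim_cache()` also trims dentries and consults caps, which is omitted. The sketch follows the first option from item 2, walking `inode_map` after trimming and asking the application to release what it can.

```python
class Inode:
    """Stand-in for a libcephfs Inode, keyed by an (ino, snapid) pair like vinodeno_t."""

    def __init__(self, vino):
        self.vino = vino
        self.ll_ref = 0   # references the application holds through the ll interface


class ToyClient:
    """Toy model of the client side (dentry trimming and caps are omitted)."""

    def __init__(self):
        self.inode_map = {}      # vino -> Inode, in insertion order
        self._release_cb = None

    # Item 1: a way for the application to register a callback (hypothetical name).
    def register_release_callback(self, cb):
        self._release_cb = cb

    def ll_get(self, vino):
        inode = self.inode_map.get(vino)
        if inode is None:
            inode = self.inode_map[vino] = Inode(vino)
        inode.ll_ref += 1
        return inode

    def ll_put(self, inode):
        inode.ll_ref -= 1
        if inode.ll_ref == 0:
            del self.inode_map[inode.vino]   # last reference gone: inode can be freed

    # Item 2: walk inode_map and ask the application to drop references
    # until the cache is back under its limit.
    def trim_cache(self, max_inodes):
        if self._release_cb is None:
            return
        for inode in list(self.inode_map.values()):
            if len(self.inode_map) <= max_inodes:
                break
            if inode.ll_ref > 0:
                self._release_cb(inode)      # a request, which the application may decline


class ToyGateway:
    """Item 3: the application honours the request only for entries it is not using."""

    def __init__(self, client):
        self.client = client
        self.handles = {}         # vino -> Inode; each entry owns one ll reference
        self.open_files = set()   # vinos with open files; these must be kept
        client.register_release_callback(self.release)

    def lookup(self, vino):
        if vino not in self.handles:
            self.handles[vino] = self.client.ll_get(vino)
        return self.handles[vino]

    def release(self, inode):
        if inode.vino in self.open_files or inode.vino not in self.handles:
            return
        del self.handles[inode.vino]
        self.client.ll_put(inode)


if __name__ == "__main__":
    client = ToyClient()
    gw = ToyGateway(client)
    for i in range(10):
        gw.lookup((i, 0))
    gw.open_files.add((3, 0))         # this one has a file open and must survive
    client.trim_cache(max_inodes=2)
    print(sorted(client.inode_map))   # prints: [(3, 0), (9, 0)]
```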
Updated by Jeff Layton about 4 years ago
Ok, I have a first stab at the ceph piece of this mostly done now. The ganesha piece still needs some work, as it's not trivial to atomically check whether an entry (aka inode) has files open and only unhash and decrement it if it doesn't. Hopefully I'll have something ready for testing soon, though.
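The hard part here is a check-then-act race. If the "are any files open?" check and the unhash/decrement do not run under the same lock that opening a file takes, a file can be opened in between and the entry released out from under it. Below is a minimal Python sketch, assuming a single table-wide lock. `Entry` and `EntryTable` are hypothetical names, and ganesha's real mdcache uses its own locking and refcounting, which this does not reproduce.

```python
import threading


class Entry:
    def __init__(self, key):
        self.key = key
        self.refcount = 1      # the reference owned by the hash table
        self.open_count = 0


class EntryTable:
    """Toy cache in which opening a file and releasing an entry take the same
    lock, so "is anything open?" and "unhash + drop ref" happen as one step."""

    def __init__(self):
        self._lock = threading.Lock()
        self._table = {}

    def open(self, key):
        with self._lock:
            entry = self._table.get(key)
            if entry is None:
                entry = self._table[key] = Entry(key)
            entry.open_count += 1
            entry.refcount += 1   # the open file holds its own reference
            return entry

    def close(self, entry):
        with self._lock:
            entry.open_count -= 1
            entry.refcount -= 1

    def try_release(self, key):
        """Unhash the entry and drop the table's reference, but only if no file
        is open on it. Returns True if the entry was released."""
        with self._lock:
            entry = self._table.get(key)
            if entry is None or entry.open_count > 0:
                return False
            del self._table[key]
            entry.refcount -= 1
            return True
```

Holding one lock across both steps is the simplest way to make them atomic. A finer-grained design could use per-entry locks, at the cost of more care around the hash table itself.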
Updated by Jeff Layton about 4 years ago
Ok, ceph patches are pretty much done. Just waiting on review so I can merge them. There are also some ganesha patches to make it use the new functionality, culminating in this patch:
Updated by Jeff Layton almost 4 years ago
Ganesha patches are merged and have been for over a week. The libcephfs bits are also ready, but testing is taking a lot longer than expected.
Updated by Jeff Layton almost 4 years ago
- Status changed from In Progress to Resolved
Ceph patches were merged.
Updated by Nathan Cutler almost 4 years ago
- Backport deleted (mimic,luminous)
Since target version is set to 16.0.0 and the status was changed to "Resolved", I guess backports are not needed (?)
Updated by Patrick Donnelly almost 4 years ago
- Status changed from Resolved to Pending Backport
- Backport set to octopus,nautilus
No, the backport release list just needs updating.
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #45688: octopus: nfs-ganesha: handle client cache pressure in NFS Ganesha FSAL added
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #45689: nautilus: nfs-ganesha: handle client cache pressure in NFS Ganesha FSAL added
Updated by Patrick Donnelly almost 4 years ago
- Has duplicate Bug #45114: client: make cache shrinking callbacks available via libcephfs added
Updated by Patrick Donnelly almost 4 years ago
- Related to Bug #44976: MDS problem slow requests, cache pressure, damaged metadata after upgrading 14.2.7 to 14.2.8 added
Updated by Nathan Cutler almost 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".