Feature #12334

nfs-ganesha: handle client cache pressure in NFS Ganesha FSAL

Added by John Spray over 3 years ago. Updated 7 months ago.

Status:
New
Priority:
High
Assignee:
Category:
Correctness/Safety
Target version:
Start date:
07/15/2015
Due date:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
mimic,luminous
Reviewed:
Affected Versions:
Component(FS):
Client, Ganesha FSAL
Labels (FS):
task(intern)
Pull request ID:

Description

Reported by Eric Eastman on ceph-users: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-July/003000.html

When writing a number of files greater than the MDS cache size via an NFS Ganesha client, the user sees "Client <foo> failing to respond to cache pressure" warnings.

Presumably this is because the NFS layer takes references to inodes through the libcephfs ll interface, but has no hook it can be kicked through to release cached inodes in response to cache pressure.
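
For illustration, a minimal sketch (not Ganesha code, error handling omitted) of how an ll-interface consumer pins an inode in libcephfs: every Inode reference handed out stays pinned, along with its MDS caps, until it is explicitly released.

#include <cephfs/libcephfs.h>

int main(void)
{
        struct ceph_mount_info *cmount;
        struct Inode *root;

        ceph_create(&cmount, NULL);          /* NULL id -> client.admin */
        ceph_conf_read_file(cmount, NULL);   /* default ceph.conf search path */
        ceph_mount(cmount, "/");

        /* Every Inode* returned by the ll interface holds a reference; the
         * MDS cannot reclaim the inode or its caps until ceph_ll_put()
         * drops that reference. */
        ceph_ll_lookup_root(cmount, &root);

        /* An NFS server keeps references like this alive for as long as the
         * corresponding filehandle stays cached, which is what produces the
         * "failing to respond to cache pressure" warning. */

        ceph_ll_put(cmount, root);           /* release the reference */

        ceph_unmount(cmount);
        ceph_release(cmount);
        return 0;
}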


Related issues

Related to fs - Feature #18537: libcephfs cache invalidation upcalls New 01/16/2017

History

#1 Updated by Greg Farnum over 3 years ago

Note that NFS-Ganesha created significantly more inodes than the cache size limit before it had too many pinned. So it is dropping some caps. But we just don't have any feedback mechanism.

NFS file handles are probably the right way to deal with this; we ought to be able to drop basically whatever we want if the only user is an NFS client...

#2 Updated by Eric Eastman over 3 years ago

I ran the same 5 million file create test using a cifs mount instead of an NFS mount and did not see the "Client <foo> failing to respond to cache pressure" warning. As with the NFS test, the file creates were split between two clients, both using the cifs mount type. The Ubuntu Trusty client used SMB 3 with the fstab entry:
//ede-c2-gw01/cephfs /TEST-SMB cifs rw,guest,noauto,vers=3.0
CentOS 6.6 used SMB 1 with the fstab entry:
//ede-c2-gw01/cephfs /TEST-SMB cifs rw,guest,noauto

As with the NFS test, there was a single gateway using the Ceph file system VFS interface to Samba (version 4.3.0pre1-GIT-2c1c567). The smb.conf entry for the file system was:

[cephfs]
path = /
writeable = yes
vfs objects = ceph
ceph:config_file = /etc/ceph/ceph.conf
browseable = yes

Next test will be with the kernel ceph file system interface.

#3 Updated by Eric Eastman over 3 years ago

I am seeing the "Client <foo> failing to respond to cache pressure" warning using the Ceph kernel driver after creating the 5 million files. This test used only one Ceph file system client.

The client and all members of the Ceph cluster are running Ubuntu Trusty with a 4.1 kernel and Ceph v9.0.1. Info from the client:

# lsmod | grep ceph
ceph                  323584  1 
libceph               253952  1 ceph
libcrc32c              16384  1 libceph
fscache                65536  1 ceph

# cat /etc/fstab | grep ceph
10.15.2.121,10.15.2.122,10.15.2.123:/ /cephfs ceph name=cephfs,secretfile=/etc/ceph/client.cephfs,noatime,noauto,_netdev

# ceph --version
ceph version 9.0.1 (997b3f998d565a744bfefaaf34b08b891f8dbf64)
# cat /proc/version 
Linux version 4.1.0-040100-generic (kernel@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201506220235 SMP Mon Jun 22 06:36:19 UTC 2015

This ceph -s was taken about 6 hours after the file creation tool finished, and the system was idle for those 6 hours:
# ceph -s
    cluster 6d8aae1e-1125-11e5-a708-001b78e265be
     health HEALTH_WARN
            mds0: Client ede-c2-gw01:cephfs failing to respond to cache pressure
     monmap e1: 3 mons at {ede-c2-mon01=10.15.2.121:6789/0,ede-c2-mon02=10.15.2.122:6789/0,ede-c2-mon03=10.15.2.123:6789/0}
            election epoch 22, quorum 0,1,2 ede-c2-mon01,ede-c2-mon02,ede-c2-mon03
     mdsmap e3512: 1/1/1 up {0=ede-c2-mds03=up:active}, 2 up:standby
     osdmap e590: 8 osds: 8 up, 8 in
      pgmap v473095: 832 pgs, 4 pools, 162 GB data, 4312 kobjects
            182 GB used, 78319 MB / 263 GB avail
                 832 active+clean

From the active MDS:

# ceph daemon mds.ede-c2-mds03 perf dump mds
{
    "mds": {
        "request": 734724,
        "reply": 734723,
        "reply_latency": {
            "avgcount": 734723,
            "sum": 2207.364066876
        },
        "forward": 0,
        "dir_fetch": 13,
        "dir_commit": 22483,
        "dir_split": 0,
        "inode_max": 100000,
        "inodes": 197476,
        "inodes_top": 0,
        "inodes_bottom": 0,
        "inodes_pin_tail": 197476,
        "inodes_pinned": 197476,
        "inodes_expired": 746886,
        "inodes_with_caps": 192730,
        "caps": 192730,
        "subtrees": 2,
        "traverse": 1476741,
        "traverse_hit": 742325,
        "traverse_forward": 0,
        "traverse_discover": 0,
        "traverse_dir_fetch": 1,
        "traverse_remote_ino": 0,
        "traverse_lock": 0,
        "load_cent": 73472400,
        "q": 0,
        "exported": 0,
        "exported_inodes": 0,
        "imported": 0,
        "imported_inodes": 0
    }
}

#4 Updated by Eric Eastman over 3 years ago

I finished the final test, using one Ceph file system client and the FUSE interface. I ran the 5 million file create test a couple of times with the Ceph file system mounted via fuse.ceph and did not see the "Client <foo> failing to respond to cache pressure" warning. The fstab entry used:

# cat /etc/fstab | grep ceph
id=cephfs,keyring=/etc/ceph/client.cephfs.keyring /cephfs fuse.ceph noatime,_netdev,noauto 0 0

Summary of the tests:
I saw the cache pressure warning with Ganesha NFS and the Ceph kernel interface. I did not see the warning with the Samba VFS or the fuse.ceph interface.

Let me know if you need additional information.

#5 Updated by John Spray about 3 years ago

It looks like the ganesha FSAL interface already includes the function `up_async_invalidate` for this sort of thing, though libcephfs itself doesn't currently provide a way to register such a hook. We'll need to update the FSAL and libcephfs.
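
To make the shape of that concrete, here is a rough sketch of the kind of hook libcephfs would need to grow; every name below (ceph_ll_register_ino_release_cb, fsal_ceph_ino_release) is hypothetical, not an existing API:

/* Hypothetical sketch only: libcephfs exposes no such registration today. */
#include <stdint.h>

struct ceph_mount_info;                 /* opaque libcephfs mount handle */

/* Callback the Client would invoke when the MDS applies cache pressure
 * and wants a particular inode released. */
typedef void (*ino_release_cb_t)(void *private_data, uint64_t ino);

/* Registration entry point libcephfs would need to provide. */
extern int ceph_ll_register_ino_release_cb(struct ceph_mount_info *cmount,
                                           ino_release_cb_t cb,
                                           void *private_data);

/* FSAL_CEPH side: map 'ino' back to a ganesha object handle and kick the
 * upcall layer (e.g. up_async_invalidate) so the cached entry can be
 * dropped and the libcephfs reference released. */
void fsal_ceph_ino_release(void *private_data, uint64_t ino);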

#6 Updated by John Spray about 3 years ago

Pinged Matt & Adam about this yesterday; Matt's planning to work on it at some stage. In some cases we may want to simply disable the ganesha cache in favour of Client()'s cache, but it should also be possible to implement these invalidation hooks.
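
For reference, shrinking ganesha's own cache so that the libcephfs Client cache does the caching is usually done with a config block roughly like the one below. Treat this as a sketch rather than a tested configuration: block and parameter names vary across ganesha versions (newer releases use an MDCACHE block instead of CACHEINODE).

CACHEINODE {
        # Keep the dirent cache as small as possible.
        Dir_Chunk = 0;

        # Keep the number of cached entries low so libcephfs does the caching.
        Entries_HWMark = 100;
}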

#7 Updated by Patrick Donnelly 9 months ago

  • Subject changed from Handle client cache pressure in NFS Ganesha FSAL to nfs-ganesha: handle client cache pressure in NFS Ganesha FSAL
  • Category changed from NFS (Linux Kernel) to Correctness/Safety
  • Assignee set to Jeff Layton
  • Priority changed from Normal to High
  • Target version set to v13.0.0
  • Tags set to ganesha,intern
  • Backport set to luminous
  • Release set to luminous
  • Component(FS) Client, Ganesha FSAL added

#8 Updated by Jeff Layton 9 months ago

This is worth looking into, but I wonder how big a problem this is once you disable a lot of the ganesha mdcache behavior.

The one thing we can't really disable in ganesha, though, is the filehandle cache, and each cached filehandle holds an Inode reference. We will probably need some mechanism to force some of those to be flushed out, but there is a bit of a problem:

ganesha may not always be able to tear down a specific filehandle if it's still in use, and libcephfs has no way to know which ones are. What may be best is some mechanism to ask ganesha to clean out whatever filehandles are not still in use, or just to ask it to shrink the cache by some number of entries.
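
A rough sketch of the shape such a request could take (hypothetical; no such interface exists in libcephfs or ganesha today): rather than naming a specific inode, libcephfs would simply ask ganesha to reap up to N filehandles that are not currently in use.

/* Hypothetical only -- illustrating the "shrink by some number of entries"
 * idea above; none of these names exist in libcephfs or ganesha. */
#include <stddef.h>

/* Registered by ganesha with libcephfs.  When the MDS applies cache
 * pressure, libcephfs asks ganesha to drop up to 'count' cached
 * filehandles that are not currently referenced; ganesha returns how
 * many it actually released. */
typedef size_t (*cache_shrink_cb_t)(void *private_data, size_t count);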

#9 Updated by Jeff Layton 9 months ago

Possibly related tracker:

http://tracker.ceph.com/issues/18537

#10 Updated by Patrick Donnelly 8 months ago

  • Related to Feature #18537: libcephfs cache invalidation upcalls added

#11 Updated by Patrick Donnelly 7 months ago

  • Target version changed from v13.0.0 to v14.0.0
  • Tags deleted (ganesha,intern)
  • Backport changed from luminous to mimic,luminous
  • Labels (FS) task(intern) added
