Bug #44166
Client fails to release capabilities, lazy collective shared file I/O (simplified PoC provided).
Description
Hello,
When doing collective shared-file I/O on a file with Lazy I/O enabled, the clients seem to fail to release caps.
The following warnings are visible in the MDS log:
2020-02-17 10:01:02.848616 mds.hpc-be143 [WRN] client.78473537 isn't responding to mclientcaps(revoke), ino 0x100043ba7f7 pending pAsLsXsFrl issued pAsLsXsFcrl, sent 60.513818 seconds ago
2020-02-17 10:01:02.848641 mds.hpc-be143 [WRN] client.82122060 isn't responding to mclientcaps(revoke), ino 0x100043ba7f7 pending pAsLsXsFrl issued pAsLsXsFcrl, sent 60.513812 seconds ago
2020-02-17 10:01:06.956117 mon.cephjim-mon-decd03f337 [WRN] Health check failed: 2 clients failing to respond to capability release (MDS_CLIENT_LATE_RELEASE)
2020-02-17 10:01:15.366076 mon.cephjim-mon-decd03f337 [WRN] Health check update: 1 clients failing to respond to capability release (MDS_CLIENT_LATE_RELEASE)
Only after each client has unmounted and re-mounted is the warning cleared and health restored to HEALTH_OK.
This only seems to happen when 1) there is more than one client/host involved, and 2) Lazy I/O is activated on the open file handle.
Originally this was detected on a patched version of IOR which just adds a line with:
ioctl(fd, CEPH_IOC_LAZYIO);
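For context, a minimal sketch of how such a call can be wrapped for a plain open(2) file descriptor. The ioctl value below is copied from the kernel's fs/ceph/ioctl.h (there is no installed userspace header for it), so double-check it against your kernel tree:

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Copied from the kernel's fs/ceph/ioctl.h; verify against your kernel tree. */
#define CEPH_IOCTL_MAGIC 0x97
#define CEPH_IOC_LAZYIO  _IO(CEPH_IOCTL_MAGIC, 4)

/* Enable Lazy I/O on an already-open CephFS file descriptor. */
static int enable_lazyio(int fd)
{
    if (ioctl(fd, CEPH_IOC_LAZYIO) < 0) {
        perror("ioctl(CEPH_IOC_LAZYIO)");
        return -1;
    }
    return 0;
}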
Interestingly, this only happens when IOR uses the POSIX API; when running with the MPIIO interface, I have not managed to reproduce the issue.
Dynamic analysis (strace) reveals that the POSIX backend does:
open()
ioctl(lazy..)
lseek()
write()
lseek()
write()
...
fsync()
close()
while the MPIIO backend does:
open()
ioctl(lazy..)
pwritev()
pwritev()
...
fsync()
close()
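For comparison, a rough sketch of that MPIIO-style pattern (positional pwritev() calls, no intervening lseek()); in my testing this variant did not trigger the warning. Block size, block count, and the interleaving scheme are placeholders of my own, not taken from IOR:

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/ioctl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/uio.h>
#include <unistd.h>

#define CEPH_IOCTL_MAGIC 0x97               /* from fs/ceph/ioctl.h, as above */
#define CEPH_IOC_LAZYIO  _IO(CEPH_IOCTL_MAGIC, 4)
#define BLOCK_SIZE (1 << 20)                /* placeholder */
#define NUM_BLOCKS 16                       /* placeholder */

/* MPIIO-style pattern: open, enable Lazy I/O, positional writes, fsync, close. */
static int pwritev_pattern(const char *path, int rank)
{
    static char buf[BLOCK_SIZE];
    int fd, i;

    memset(buf, 'A' + rank, sizeof(buf));

    fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return -1; }
    ioctl(fd, CEPH_IOC_LAZYIO);

    for (i = 0; i < NUM_BLOCKS; i++) {
        /* each rank writes its own interleaved blocks of the shared file */
        struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
        off_t off = (off_t)(i * 2 + rank) * BLOCK_SIZE;
        if (pwritev(fd, &iov, 1, off) < 0) { perror("pwritev"); break; }
    }

    fsync(fd);
    close(fd);
    return 0;
}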
I am attaching a PoC (lazycaps.c) that mimics IOR's POSIX I/O access pattern and successfully reproduces this issue. As mentioned above, to reproduce it I had to use two clients on two different hosts. Usage of the PoC is as follows:
./lazycaps <filename> <rank>
Where <filename> is self-explanatory and <rank> should be 0 on one client and 1 on the other (it resembles an MPI rank). You can either run both instances semi-simultaneously by hand, or launch them with mpirun.
Compile with:
gcc -o lazycaps lazycaps.c
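For readers without access to the attachment, the following is not the attached lazycaps.c itself, only a rough sketch of the lseek()+write() access pattern it follows, with illustrative block size and block count:

#include <fcntl.h>
#include <linux/ioctl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define CEPH_IOCTL_MAGIC 0x97               /* from fs/ceph/ioctl.h */
#define CEPH_IOC_LAZYIO  _IO(CEPH_IOCTL_MAGIC, 4)
#define BLOCK_SIZE (1 << 20)                /* illustrative */
#define NUM_BLOCKS 16                       /* illustrative */

int main(int argc, char **argv)
{
    static char buf[BLOCK_SIZE];
    int fd, rank, i;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <filename> <rank>\n", argv[0]);
        return 1;
    }
    rank = atoi(argv[2]);
    memset(buf, 'A' + rank, sizeof(buf));

    fd = open(argv[1], O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (ioctl(fd, CEPH_IOC_LAZYIO) < 0)      /* enable Lazy I/O */
        perror("ioctl(CEPH_IOC_LAZYIO)");

    for (i = 0; i < NUM_BLOCKS; i++) {
        /* each rank writes its own interleaved blocks of the shared file */
        off_t off = (off_t)(i * 2 + rank) * BLOCK_SIZE;
        if (lseek(fd, off, SEEK_SET) < 0) { perror("lseek"); break; }
        if (write(fd, buf, sizeof(buf)) < 0) { perror("write"); break; }
    }

    fsync(fd);
    close(fd);
    return 0;
}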
Other information about the system:
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.7.1908 (Core)
Release: 7.7.1908
Codename: Core
kernel: 3.10.0-1062.1.1.el7.x86_64
ceph on mon, mgr, mds, osds: ceph-14.2.6-0.el7.x86_64
Let me know if you need any other information.
Thanks!
Pablo
History
#1 Updated by Pablo Llopis almost 4 years ago
I can reproduce the same issue on the following kernel:
kernel-ml-5.5.4-1.el7.elrepo.x86_64
#2 Updated by Pablo Llopis about 2 years ago
I re-evaluated this on clients running 3.10.0-1160.36.2.el7.x86_64 (CentOS 7) and 4.18.0-348.el8.x86_64 (CentOS 8), against a CephFS cluster running CentOS 7 and ceph version 14.2.22, and could no longer reproduce this bug.
FWIW, in case anybody wants to evaluate Lazy I/O, this is how I have been doing it:
- Using libcephfs: I found it convenient to use IOR, which has a backend that talks to libcephfs directly when run with the option `-a CEPHFS` (a minimal libcephfs sketch follows this list).
- Via kernel mounts, it's also easy to evaluate any application without having to patch it by using LD_PRELOAD together with this library: https://gitlab.cern.ch/batch-team/lazyio
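A minimal sketch of the libcephfs route, using ceph_lazyio() from the libcephfs API (error handling abbreviated; the file path is a placeholder, and a readable ceph.conf plus keyring are assumed):

/* Compile with: gcc -o lazy_libcephfs lazy_libcephfs.c -lcephfs */
#include <cephfs/libcephfs.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct ceph_mount_info *cmount;
    char buf[4096];
    int fd;

    memset(buf, 'x', sizeof(buf));

    ceph_create(&cmount, NULL);              /* default client id */
    ceph_conf_read_file(cmount, NULL);       /* default ceph.conf search path */
    if (ceph_mount(cmount, "/") < 0) {
        fprintf(stderr, "ceph_mount failed\n");
        return 1;
    }

    fd = ceph_open(cmount, "/lazyio-test", O_WRONLY | O_CREAT, 0644);  /* placeholder path */
    if (fd < 0) {
        fprintf(stderr, "ceph_open failed\n");
        ceph_unmount(cmount);
        return 1;
    }

    ceph_lazyio(cmount, fd, 1);              /* enable Lazy I/O on this handle */
    ceph_write(cmount, fd, buf, sizeof(buf), 0);
    ceph_fsync(cmount, fd, 0);
    ceph_close(cmount, fd);

    ceph_unmount(cmount);
    ceph_release(cmount);
    return 0;
}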
Cheers