Bug #44813

sendfile on CephFS results in 0 bytes of data on other node

Added by Nicolas Gaston almost 4 years ago. Updated almost 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
fs/ceph
Target version:
% Done:
0%

Source:
Community (user)
Tags:
cephfs ceph.ko libceph.ko
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

I use the sendfile() function to write data on CephFS (e.g. https://github.com/pijewski/sendfile-example/). I need to copy data from one file descriptor to another without copying any data into user space.
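For readers unfamiliar with the pattern, here is a minimal sketch of a file-to-file copy with sendfile(2). It is not the code from the linked repository; the function name copy_with_sendfile and the 0644 output mode are illustrative choices. It assumes Linux 2.6.33 or later, where out_fd may be a regular file and not only a socket, so the RHEL/CentOS 7 3.10 kernel qualifies.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy `src` to `dst` with sendfile(2), so the kernel
 * moves the data between the two descriptors without a user-space buffer.
 * Returns the number of bytes copied, or -1 on error (errno is set). */
off_t copy_with_sendfile(const char *src, const char *dst)
{
    int in_fd = open(src, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        int saved = errno;
        close(in_fd);
        errno = saved;
        return -1;
    }

    /* Pass an explicit mode so the output permissions are predictable. */
    int out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out_fd < 0) {
        int saved = errno;
        close(in_fd);
        errno = saved;
        return -1;
    }

    /* sendfile() advances `offset` itself and may copy fewer bytes than
     * requested, so loop until the whole file has been transferred. */
    off_t offset = 0;
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int saved = errno;
            close(out_fd);
            close(in_fd);
            errno = saved;
            return -1;
        }
        if (n == 0) /* source got shorter while we were copying */
            break;
    }

    close(in_fd);
    if (close(out_fd) < 0)
        return -1;
    return offset;
}
```

Calling copy_with_sendfile("10m-file", "10m-file.out") from a small main() gives the same kind of copy as the session below. Note that the bug reported here is not in this user-space pattern: the copy is correct on the writing node, and only other CephFS clients see a 0-byte file.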

On Ceph node1 (mgr, mon, osd, mds), the file is 10 MB: OK

[root@node1 sendfile-example]# ls
10m-file Makefile README.md sendfile sendfile.c
[root@node1 sendfile-example]# ./sendfile 10m-file 10m-file.out $(( 10 * 1024 * 1024 ))
Sent 10240 KiB over sendfile(3EXT) of 10240 KiB requested
[root@node1 sendfile-example]# ll
total 20492
-rw-r--r-- 1 root root 10485760 Mar 27 11:24 10m-file
------x--- 1 root root 10485760 Mar 30 14:14 10m-file.out

On node2 (mgr, mon, osd, mds), the file is 0 bytes: KO

[root@node2 sendfile-example]# ll
total 10252
-rw-r--r-- 1 root root 10485760 Mar 27 11:24 10m-file
------x--- 1 root root 0 Mar 30 14:14 10m-file.out

I am using CentOS 7.7 with the latest kernel update (3.10.x).

To force the data to update on the Ceph client, I need to run touch on the file.

It is reproducible on Ceph 12.x, 14.x and 15.x.

To work around the problem, I installed kernel-lt from ELRepo, which is version 4.4. It seems to work fine with this kernel version.

I suspect that ceph.ko or libceph.ko does not work properly in kernel 3.10.

1/ Can you confirm the bug?
2/ Is there anything on the roadmap to update the Ceph client in kernel 3.10? I cannot use the ELRepo 4.4 kernel because this is a production environment.

Thanks for your help

Best Regards

0001-fs-ceph-mark-Fw-cap-dirty-after-splice-write.patch (2.4 KB) Mikael Öhman, 04/15/2020 11:49 AM

History

#1 Updated by Nicolas Gaston almost 4 years ago

Possibly the same root cause as https://tracker.ceph.com/issues/39938

#2 Updated by Nicolas Gaston almost 4 years ago

Hi, I have just tried the same test on CentOS 8.1, and the bug is fixed in kernel 4.18.

This occurs only on Ceph clients running CentOS 7.x with the 3.10 kernel.

I can't upgrade my production environment to CentOS 8 for the moment. Is there any plan to fix the bug in kernel 3.10, for example in CentOS 7.8 or 7.9?

Thanks

Regards

#3 Updated by Mikael Öhman almost 4 years ago

Hi Nicolas,

I encountered this a while back and asked on the mailing list. Luckily, Jeff Layton at Red Hat provided a patch that was slated for inclusion in RHEL 7.9.

I'm now running a patched 3.10.0-1062.18.1 kernel on CentOS 7.7, and it fixes the issue for my clusters.

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LCU4RDUNI6WDGJGPZGRSMOLYBJ6N2FNT/#OJ3IARRF3E76KP6C6BBWQNFNDB75Q4ER

To quote the patch:

This patch is RHEL-only because ceph_file_splice_write() is unique to
RHEL7 kernel. Upstream kernels use new interface for splice write.

#4 Updated by Jeff Layton almost 4 years ago

  • Status changed from New to Resolved
  • Assignee set to Jeff Layton

Yes, this should be fixed in the latest RHEL release. See: https://bugzilla.redhat.com/show_bug.cgi?id=1710751

#5 Updated by Nicolas Gaston almost 4 years ago

Yes, I have tested it on CentOS 7.8 CR and it works fine.

Thank you very much
