Bug #50083
CephFS file access issues using kernel driver: file overwritten with null bytes
Status: Closed
Description
The Ceph cluster is running 14.2.9 (Nautilus): a 3-node containerised cluster with 1 active MDS and 2 standbys.
Using ceph kernel driver on 5.11.11-1.el7.elrepo.x86_64 (also tested on 5.10.10-1.el7.elrepo.x86_64).
Since moving to the 5.10.10 and 5.11.11 kernels, we've noticed files on cephfs mounts being overwritten with null bytes: normal text files are full of "^@" instead of the written content.
Additionally, the metadata for these files isn't correct; the last-modified time is slow to update:
[root@svr02 albacore] /opt/dcl/deploy/log> echo test >> cmd.210331.log ; date
Wed Mar 31 14:14:20 BST 2021
[root@svr02 albacore] /opt/dcl/deploy/log> ls -ltr cmd.210331.log
-rw-rw---- 1 dcmbox dcl 39012 Mar 31 14:03 cmd.210331.log
[root@svr02 albacore] /opt/dcl/deploy/log> ls -ltr cmd.210331.log
-rw-rw---- 1 dcmbox dcl 39012 Mar 31 14:03 cmd.210331.log
[root@svr02 albacore] /opt/dcl/deploy/log> echo test >> cmd.210331.log ; date
Wed Mar 31 14:15:05 BST 2021
[root@svr02 albacore] /opt/dcl/deploy/log> ls -ltr cmd.210331.log
-rw-rw---- 1 dcmbox dcl 39017 Mar 31 14:03 cmd.210331.log
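For comparison, on a healthy filesystem an append should bump the mtime immediately. A minimal local check of that expectation (a sketch using a temp file and GNU stat, not the reporter's exact commands):

```shell
# Sketch: verify that appending to a file advances its mtime.
# On the affected cephfs mounts, the mtime lagged behind appends.
F=$(mktemp)
before=$(stat -c %Y "$F")   # mtime, seconds since epoch (GNU stat)
sleep 1
echo test >> "$F"
after=$(stat -c %Y "$F")
if [ "$after" -gt "$before" ]; then echo "mtime updated"; else echo "mtime stale"; fi
rm -f "$F"
```

On the mounts described above, the equivalent check against a cephfs path reported a stale mtime minutes after the write.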
We didn't experience these issues when running on 5.8.10-1.el7.elrepo.x86_64 .
Ceph cluster is healthy:
[qs-admin@albacore_sc0 metaswitch]$ ceph -s
++ sudo docker ps --filter name=ceph-mon- -q
++ sudo docker exec d384020a8fc1 ceph
  cluster:
    id:     e4e508a2-21fd-4495-9645-2a7ac1521481
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum albacore_sc0,albacore_sc1,albacore_sc2 (age 11d)
    mgr: albacore_sc2(active, since 6d), standbys: albacore_sc0, albacore_sc1
    mds: cephfs:1 {0=albacore_sc0=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 11d), 3 in (since 3M)
    rgw: 6 daemons active (albacore_sc0.pubsub, albacore_sc0.rgw0, albacore_sc1.pubsub, albacore_sc1.rgw0, albacore_sc2.pubsub, albacore_sc2.rgw0)

  data:
    pools:   13 pools, 136 pgs
    objects: 1.25M objects, 76 GiB
    usage:   238 GiB used, 62 GiB / 300 GiB avail
    pgs:     136 active+clean

  io:
    client: 40 KiB/s rd, 26 KiB/s wr, 39 op/s rd, 37 op/s wr
We have two client machines, each with 21 cephfs mounts, so a total of 42 clients according to ceph.
Our mount config:
10.225.41.221,10.225.41.222,10.225.41.223:6789:/albacore/system/deploy on /opt/dcl/deploy type ceph (rw,noatime,name=albacore,secret=<hidden>,acl,wsize=32768,rsize=32768,_netdev)
No warnings or slow requests. No trace of hanging ops on the client or server.
No ops stuck in flight:
[qs-admin@albacore_sc0 ~]$ ceph daemon mds.albacore_sc0 dump_ops_in_flight
++ sudo docker ps --filter name=ceph-mon- -q
++ sudo docker exec d384020a8fc1 ceph
{
    "ops": [],
    "num_ops": 0
}
No issues reported in dmesg on the client (attached).
Some evictions logged by MDS on the ceph servers (MDS output attached).
No obvious errors in MON logs, but frequent calls to _set_new_cache_sizes which I don't recall seeing before (MON output attached).
Files
Updated by Patrick Donnelly about 3 years ago
- Status changed from New to Triaged
- Assignee set to Jeff Layton
This has never been heard of before. The most likely cause is something in your setup, e.g. a rogue process (rsync) mucking with your files.
The mtime not getting updated is odd. Something might have broken recently. Jeff, can you check?
Updated by Jeff Layton about 3 years ago
This is the first I've heard of this as well. You mentioned seeing this on v5.11 kernels. Have you also seen it on earlier ones? Do you have a reliable way to reproduce the problem?
Updated by David Piper about 3 years ago
Yes, we currently have a reliable repro. A file that we write to several times an hour shows clear corruption corresponding with our reboot from a 5.8 kernel into the 5.10.10 one. We haven't tried any kernels in between, but we did try 5.11.11-1.el7.elrepo.x86_64 in the hope the issue was resolved there; we're still seeing the corruption.
[root@svr01 albamons] /opt/dcl/deploy/log> last | grep reboot
reboot system boot 5.10.10-1.el7.el Wed Mar 17 10:31 - 10:19 (13+22:47)
reboot system boot 5.8.10-1.el7.elr Tue Mar 16 09:51 - 10:30 (1+00:38)
reboot system boot 5.8.10-1.el7.elr Fri Jan 15 18:07 - 10:30 (60+16:22)
Here's the directory where we've got our reliable repro. Note the date in each filename; files before Mar 17th were fine.
[root@svr01 albamons] /opt/dcl/deploy/log> ls cmd* | xargs file
...
cmd.210308.log: ASCII text
cmd.210309.log: ASCII text
cmd.210310.log: ASCII text
cmd.210311.log: ASCII text
cmd.210312.log: ASCII text
cmd.210316.log: ASCII text
cmd.210317.log: data
cmd.210318.log: data
cmd.210319.log: data
cmd.210320.log: data
cmd.210321.log: data
and in the contents of the first bad file, the corruption starts just as we're rebooting:
(snippet of cmd.210317.log)
17-Mar-2021 10:24:07 svr01:mngcraft.210317102352.22865.trc craft doing Reboot server svr02 (quiesce = N)
17-Mar-2021 10:24:07 svr01:mngcraft.210317102352.22865.trc calling svr01:craf^@^@^@^@^@...
svr01 and svr02 are the two client servers mounting the shared cephfs directory.
Updated by Jeff Layton about 3 years ago
Ok, so it is reproducible in your environment. The problem is that I'm unclear on what sort of I/O is being done here. Is there a way to boil this reproducer down to something I can run locally?
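A skeleton of the kind of reproducer being asked for might look like the following: timestamped appends to a shared log from each client, followed by a NUL-byte check. The path, line count, and cadence here are illustrative assumptions, not the reporter's actual tooling:

```shell
# Sketch: append lines to a (notionally shared) log, as the reporter's
# workload does, then check the file for NUL bytes afterwards.
LOG=${LOG:-$(mktemp)}        # in a real repro: a file on the cephfs mount
for i in 1 2 3 4 5; do
    echo "$(hostname) $(date +%s) line $i" >> "$LOG"
done
if grep -qP '\x00' "$LOG"; then echo "corruption detected"; else echo "file clean"; fi
```

Run from both clients against the same cephfs file, this would approximate the append-heavy shared-file pattern seen in the cmd.*.log files.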
Updated by Jeff Layton about 3 years ago
More questions:
1) How large are these files (generally)?
2) At what point does the corruption start?
3) How far does it extend?
Updated by Jeff Layton about 3 years ago
David, ping? Are you able to answer the questions above?
Updated by David Piper about 3 years ago
Hi, sorry for the delay.
We were fairly panicked about this, so we've moved our kit onto a different kernel that doesn't exhibit the same corruption. We've switched to 5.4.109-1.el7.elrepo.x86_64 from elrepo, which seems to work just fine; no signs of corruption yet.
I'm working on getting a system back onto one of the bad kernels (5.10.10-1.el7.el) and should then be able to reduce the repro down to something specific.
Updated by Jeff Layton over 2 years ago
- Status changed from Triaged to Resolved
This is probably the bug that was fixed by this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=827a746f405d25f79560c7868474aec5aee174e1
There is also a backport for earlier kernels that just applies to cephfs. Please reopen this if you can still reproduce the issue with a patched kernel.