Bug #50083

CephFS file access issues using kernel driver: file overwritten with null bytes

Added by David Piper about 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The Ceph cluster is a 3-node containerised cluster running 14.2.9 (Nautilus), with 1 active MDS and 2 standby.
We are using the CephFS kernel driver on kernel 5.11.11-1.el7.elrepo.x86_64 (also tested on 5.10.10-1.el7.elrepo.x86_64).

Since moving to the 5.11.11 and 5.10.10 kernels, we've noticed that files on cephfs mounts are being overwritten with null bytes. Normal text files are full of "^@" instead of the written content.

Additionally, the metadata for these files isn't correct; the last-modified time seems to be slow to update:

[root@svr02 albacore] /opt/dcl/deploy/log> echo test >> cmd.210331.log ; date
Wed Mar 31 14:14:20 BST 2021
[root@svr02 albacore] /opt/dcl/deploy/log> ls -ltr cmd.210331.log
-rw-rw---- 1 dcmbox dcl 39012 Mar 31 14:03 cmd.210331.log
[root@svr02 albacore] /opt/dcl/deploy/log> ls -ltr cmd.210331.log
-rw-rw---- 1 dcmbox dcl 39012 Mar 31 14:03 cmd.210331.log
[root@svr02 albacore] /opt/dcl/deploy/log> echo test >> cmd.210331.log ; date
Wed Mar 31 14:15:05 BST 2021
[root@svr02 albacore] /opt/dcl/deploy/log> ls -ltr cmd.210331.log
-rw-rw---- 1 dcmbox dcl 39017 Mar 31 14:03 cmd.210331.log

We didn't experience these issues when running on 5.8.10-1.el7.elrepo.x86_64.
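
For reference, a quick way to spot affected files is to look for null bytes directly. This is only a sketch: the path below is just our log directory, and the -P flag needs GNU grep with PCRE support.

# List files under the cephfs mount that contain NUL bytes
grep -rlP '\x00' /opt/dcl/deploy/log/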

Ceph cluster is healthy:

[qs-admin@albacore_sc0 metaswitch]$ ceph -s
 ++ sudo docker ps --filter name=ceph-mon- -q
 ++ sudo docker exec d384020a8fc1 ceph
  cluster:
    id:     e4e508a2-21fd-4495-9645-2a7ac1521481
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum albacore_sc0,albacore_sc1,albacore_sc2 (age 11d)
    mgr: albacore_sc2(active, since 6d), standbys: albacore_sc0, albacore_sc1
    mds: cephfs:1 {0=albacore_sc0=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 11d), 3 in (since 3M)
    rgw: 6 daemons active (albacore_sc0.pubsub, albacore_sc0.rgw0, albacore_sc1.pubsub, albacore_sc1.rgw0, albacore_sc2.pubsub, albacore_sc2.rgw0)

  data:
    pools:   13 pools, 136 pgs
    objects: 1.25M objects, 76 GiB
    usage:   238 GiB used, 62 GiB / 300 GiB avail
    pgs:     136 active+clean

  io:
    client:   40 KiB/s rd, 26 KiB/s wr, 39 op/s rd, 37 op/s wr

We have two client machines, each with 21 cephfs mounts, so a total of 42 clients according to ceph.
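
(The attached ceph_clients.txt is the full output of ceph tell mds.0 client ls; a rough count like the one below gives the same 42, assuming the pretty-printed JSON has one "id" field per session.)

# Rough client/session count from the active MDS
ceph tell mds.0 client ls | grep -c '"id":'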

Our mount config:

10.225.41.221,10.225.41.222,10.225.41.223:6789:/albacore/system/deploy on /opt/dcl/deploy type ceph (rw,noatime,name=albacore,secret=<hidden>,acl,wsize=32768,rsize=32768,_netdev)
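
For completeness, an equivalent /etc/fstab entry would look roughly like this. This is only a sketch: the secretfile path is a placeholder for wherever the client key is kept, and the other options mirror the mount output above.

# Hypothetical fstab entry for the mount shown above (secretfile path is a placeholder)
10.225.41.221,10.225.41.222,10.225.41.223:6789:/albacore/system/deploy  /opt/dcl/deploy  ceph  name=albacore,secretfile=/etc/ceph/albacore.secret,noatime,acl,wsize=32768,rsize=32768,_netdev  0 0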

No warnings or slow requests. No trace of hanging ops on the client or server.
No ops stuck in flight:

[qs-admin@albacore_sc0 ~]$ ceph daemon mds.albacore_sc0 dump_ops_in_flight
 ++ sudo docker ps --filter name=ceph-mon- -q
 ++ sudo docker exec d384020a8fc1 ceph
{
    "ops": [],
    "num_ops": 0
}
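
For the client-side half of that check, the kernel client exposes its outstanding requests under debugfs; the commands below show any in-flight MDS or OSD requests per mount (a sketch only, it assumes debugfs is mounted and you are root):

# In-flight MDS and OSD requests for each cephfs superblock (empty when nothing is stuck)
cat /sys/kernel/debug/ceph/*/mdsc
cat /sys/kernel/debug/ceph/*/osdc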

No issues reported in dmesg on the client (attached).
Some evictions logged by the MDS on the ceph servers (MDS output attached).
No obvious errors in the MON logs, but frequent calls to _set_new_cache_sizes, which I don't recall seeing before (MON output attached).


Files

ceph-mds-albacore_sc0.log (5.04 KB) - Logs from ceph MDS daemon (containerised) - David Piper, 03/31/2021 01:56 PM
ceph_clients.txt (69.6 KB) - Output of ceph tell mds.0 client ls - David Piper, 03/31/2021 01:56 PM
dmesg.txt (46.2 KB) - dmesg output from client - David Piper, 03/31/2021 01:56 PM
ceph-mon-albacore_sc0.log (920 KB) - Logs from ceph MON daemon (containerised) - David Piper, 03/31/2021 01:56 PM
Actions #1

Updated by Patrick Donnelly about 3 years ago

  • Description updated (diff)
Actions #2

Updated by Patrick Donnelly about 3 years ago

  • Status changed from New to Triaged
  • Assignee set to Jeff Layton

This has never been heard of before. The most likely cause is something in your setup, e.g. a rogue process (rsync) mucking with your files.

The mtime not getting updated is odd. It might be that something broke recently. Jeff, can you check?

Actions #3

Updated by Jeff Layton about 3 years ago

This is the first I've heard of this as well. You mentioned seeing this on v5.11 kernels. Have you also seen it on earlier ones? Do you have a reliable way to reproduce the problem?

Actions #4

Updated by David Piper about 3 years ago

We do have a reliable repro currently, yes. A file that we write to several times an hour shows clear corruption corresponding with us rebooting from a 5.8 kernel into the 5.10.10 one. We haven't tried any kernels in between, but we did try 5.11.11-1.el7.elrepo.x86_64 in the hope that this issue was resolved there; we're still seeing the corruptions.

[root@svr01 albamons] /opt/dcl/deploy/log> last | grep reboot
reboot system boot 5.10.10-1.el7.el Wed Mar 17 10:31 - 10:19 (13+22:47)
reboot system boot 5.8.10-1.el7.elr Tue Mar 16 09:51 - 10:30 (1+00:38)
reboot system boot 5.8.10-1.el7.elr Fri Jan 15 18:07 - 10:30 (60+16:22)

Here's the file where we've got our reliable repro. Note the timestamp; files before Mar 17th were fine.

[root@svr01 albamons] /opt/dcl/deploy/log> ls cmd* | xargs file
...
cmd.210308.log: ASCII text
cmd.210309.log: ASCII text
cmd.210310.log: ASCII text
cmd.210311.log: ASCII text
cmd.210312.log: ASCII text
cmd.210316.log: ASCII text
cmd.210317.log: data
cmd.210318.log: data
cmd.210319.log: data
cmd.210320.log: data
cmd.210321.log: data

In the contents of the first bad file, the corruption starts just as we were rebooting:

(snippet of cmd.210317.log)
17-Mar-2021 10:24:07 svr01:mngcraft.210317102352.22865.trc craft doing Reboot server svr02 (quiesce = N)
17-Mar-2021 10:24:07 svr01:mngcraft.210317102352.22865.trc calling svr01:craf^^^^^^^@^

svr01 and svr02 are the two client servers mounting the shared cephfs directory.

Actions #5

Updated by Jeff Layton about 3 years ago

Ok, so it is reproducible in your environment. The problem is that I'm unclear on what sort of I/O is being done here. Is there a way to boil this reproducer down to something I can run locally?

Actions #6

Updated by Jeff Layton about 3 years ago

More questions:

1) How large are these files (generally)?
2) At what point does the corruption start?
3) How far does it extend?

Actions #7

Updated by Jeff Layton about 3 years ago

David, ping? Are you able to answer the questions above?

Actions #8

Updated by David Piper about 3 years ago

Hi, sorry for the delay.

We were fairly panicked about this, so we've moved our kit onto a different kernel that doesn't exhibit the same corruption. We've switched to 5.4.109-1.el7.elrepo.x86_64 from elrepo, which seems to work just fine - no signs of corruption yet.

I'm working on getting a system back onto one of the bad kernels (5.10.10-1.el7.el) and should then be able to reduce the repro down to something specific.
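
The rough shape I have in mind: one client keeps appending to a file on the shared mount while the other client reboots, then we check the file for null bytes afterwards. Something like this (a sketch only; the file name and timing are placeholders):

# On client A (svr01): keep appending to a file on the shared cephfs mount
while true; do date >> /opt/dcl/deploy/log/repro.log; sleep 1; done

# On client B (svr02), while the loop above is running:
reboot

# Afterwards, back on client A: check whether the file picked up null bytes
file /opt/dcl/deploy/log/repro.log
grep -acP '\x00' /opt/dcl/deploy/log/repro.log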

Actions #9

Updated by Jeff Layton over 2 years ago

  • Status changed from Triaged to Resolved

This is probably the bug that was fixed by this commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=827a746f405d25f79560c7868474aec5aee174e1

There is also a backport for earlier kernels that just applies to cephfs. Please reopen this if you can still reproduce the issue with a patched kernel.
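
For anyone wanting to check whether a given kernel already includes that commit, one quick way is to list the mainline tags that contain it (a sketch, assuming a local clone of the mainline kernel tree in ./linux):

# Mainline releases that contain the fix
git -C linux tag --contains 827a746f405d25f79560c7868474aec5aee174e1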
