Bug #50083
CephFS file access issues using kernel driver: file overwritten with null bytes
Status: Closed
Description
The Ceph cluster is running 14.2.9 (Nautilus): a 3-node containerised cluster with 1 active MDS and 2 standbys.
Using ceph kernel driver on 5.11.11-1.el7.elrepo.x86_64 (also tested on 5.10.10-1.el7.elrepo.x86_64).
Since moving to the 5.10.10 and 5.11.11 kernels, we've noticed files on cephfs mounts being overwritten with null bytes: normal text files are full of "^@" instead of the written content.
Additionally, the metadata for these files isn't correct; the last-modified time is slow to update:
[root@svr02 albacore] /opt/dcl/deploy/log> echo test >> cmd.210331.log ; date
Wed Mar 31 14:14:20 BST 2021
[root@svr02 albacore] /opt/dcl/deploy/log> ls -ltr cmd.210331.log
-rw-rw---- 1 dcmbox dcl 39012 Mar 31 14:03 cmd.210331.log
[root@svr02 albacore] /opt/dcl/deploy/log> ls -ltr cmd.210331.log
-rw-rw---- 1 dcmbox dcl 39012 Mar 31 14:03 cmd.210331.log
[root@svr02 albacore] /opt/dcl/deploy/log> echo test >> cmd.210331.log ; date
Wed Mar 31 14:15:05 BST 2021
[root@svr02 albacore] /opt/dcl/deploy/log> ls -ltr cmd.210331.log
-rw-rw---- 1 dcmbox dcl 39017 Mar 31 14:03 cmd.210331.log
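For comparison, on a healthy filesystem an append should bump the mtime immediately. A minimal local check of that expectation (a sketch using a temp file and GNU stat, not the reporter's exact commands):

```shell
# Sketch: verify that appending to a file advances its mtime.
# On the affected cephfs mounts, the mtime lagged behind appends.
F=$(mktemp)
before=$(stat -c %Y "$F")   # mtime, seconds since epoch (GNU stat)
sleep 1
echo test >> "$F"
after=$(stat -c %Y "$F")
if [ "$after" -gt "$before" ]; then echo "mtime updated"; else echo "mtime stale"; fi
rm -f "$F"
```

On the mounts described above, the equivalent check against a cephfs path reported a stale mtime minutes after the write.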
We didn't experience these issues when running on 5.8.10-1.el7.elrepo.x86_64 .
Ceph cluster is healthy:
[qs-admin@albacore_sc0 metaswitch]$ ceph -s
++ sudo docker ps --filter name=ceph-mon- -q
++ sudo docker exec d384020a8fc1 ceph
  cluster:
    id:     e4e508a2-21fd-4495-9645-2a7ac1521481
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum albacore_sc0,albacore_sc1,albacore_sc2 (age 11d)
    mgr: albacore_sc2(active, since 6d), standbys: albacore_sc0, albacore_sc1
    mds: cephfs:1 {0=albacore_sc0=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 11d), 3 in (since 3M)
    rgw: 6 daemons active (albacore_sc0.pubsub, albacore_sc0.rgw0, albacore_sc1.pubsub, albacore_sc1.rgw0, albacore_sc2.pubsub, albacore_sc2.rgw0)

  data:
    pools:   13 pools, 136 pgs
    objects: 1.25M objects, 76 GiB
    usage:   238 GiB used, 62 GiB / 300 GiB avail
    pgs:     136 active+clean

  io:
    client: 40 KiB/s rd, 26 KiB/s wr, 39 op/s rd, 37 op/s wr
We have two client machines, each with 21 cephfs mounts, so a total of 42 clients according to ceph.
Our mount config:
10.225.41.221,10.225.41.222,10.225.41.223:6789:/albacore/system/deploy on /opt/dcl/deploy type ceph (rw,noatime,name=albacore,secret=<hidden>,acl,wsize=32768,rsize=32768,_netdev)
No warnings or slow requests. No trace of hanging ops on the client or server.
No ops stuck in flight:
[qs-admin@albacore_sc0 ~]$ ceph daemon mds.albacore_sc0 dump_ops_in_flight
++ sudo docker ps --filter name=ceph-mon- -q
++ sudo docker exec d384020a8fc1 ceph
{
    "ops": [],
    "num_ops": 0
}
No issues reported in dmesg on the client (attached).
Some evictions logged by MDS on the ceph servers (MDS output attached).
No obvious errors in MON logs, but frequent calls to _set_new_cache_sizes which I don't recall seeing before (MON output attached).
Files
Updated by Patrick Donnelly about 3 years ago
- Status changed from New to Triaged
- Assignee set to Jeff Layton
This has never been heard of before. The most likely cause is something in your setup, e.g. a rogue process (rsync) mucking with your files.
The mtime not getting updated is odd. Something might have broken recently. Jeff, can you check?
Updated by Jeff Layton about 3 years ago
This is the first I've heard of this as well. You mentioned seeing this on v5.11 kernels. Have you also seen it on earlier ones? Do you have a reliable way to reproduce the problem?
Updated by David Piper about 3 years ago
Yes, we currently have a reliable repro. A file that we write to several times an hour shows clear corruption corresponding with our reboot from a 5.8 kernel into the 5.10.10 one. We haven't tried any kernels in between, but we did try 5.11.11-1.el7.elrepo.x86_64 in the hope the issue was resolved there; we're still seeing the corruption.
[root@svr01 albamons] /opt/dcl/deploy/log> last | grep reboot
reboot system boot 5.10.10-1.el7.el Wed Mar 17 10:31 - 10:19 (13+22:47)
reboot system boot 5.8.10-1.el7.elr Tue Mar 16 09:51 - 10:30 (1+00:38)
reboot system boot 5.8.10-1.el7.elr Fri Jan 15 18:07 - 10:30 (60+16:22)
Here's the directory where we've got our reliable repro. Note the date in each filename; files before Mar 17th were fine.
[root@svr01 albamons] /opt/dcl/deploy/log> ls cmd* | xargs file
...
cmd.210308.log: ASCII text
cmd.210309.log: ASCII text
cmd.210310.log: ASCII text
cmd.210311.log: ASCII text
cmd.210312.log: ASCII text
cmd.210316.log: ASCII text
cmd.210317.log: data
cmd.210318.log: data
cmd.210319.log: data
cmd.210320.log: data
cmd.210321.log: data
and in the contents of the first bad file, the corruption starts just as we're rebooting:
(snippet of cmd.210317.log)
17-Mar-2021 10:24:07 svr01:mngcraft.210317102352.22865.trc craft doing Reboot server svr02 (quiesce = N)
17-Mar-2021 10:24:07 svr01:mngcraft.210317102352.22865.trc calling svr01:craf^@^@^@^@^@...
svr01 and svr02 are the two client servers mounting the shared cephfs directory.
Updated by Jeff Layton about 3 years ago
Ok, so it is reproducible in your environment. The problem is that I'm unclear on what sort of I/O is being done here. Is there a way to boil this reproducer down to something I can run locally?
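A skeleton of the kind of reproducer being asked for might look like the following: timestamped appends to a shared log from each client, followed by a NUL-byte check. The path, line count, and cadence here are illustrative assumptions, not the reporter's actual tooling:

```shell
# Sketch: append lines to a (notionally shared) log, as the reporter's
# workload does, then check the file for NUL bytes afterwards.
LOG=${LOG:-$(mktemp)}        # in a real repro: a file on the cephfs mount
for i in 1 2 3 4 5; do
    echo "$(hostname) $(date +%s) line $i" >> "$LOG"
done
if grep -qP '\x00' "$LOG"; then echo "corruption detected"; else echo "file clean"; fi
```

Run from both clients against the same cephfs file, this would approximate the append-heavy shared-file pattern seen in the cmd.*.log files.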
Updated by Jeff Layton about 3 years ago
More questions:
1) How large are these files (generally)?
2) At what point does the corruption start?
3) How far does it extend?
Updated by Jeff Layton about 3 years ago
David, ping? Are you able to answer the questions above?
Updated by David Piper about 3 years ago
Hi, sorry for the delay.
We were fairly panicked about this, so we've moved our kit onto a different kernel that doesn't exhibit the same corruption. We've switched to 5.4.109-1.el7.elrepo.x86_64 from elrepo, which seems to work just fine; no signs of corruption yet.
I'm working on getting a system back onto one of the bad kernels (5.10.10-1.el7.el) and should then be able to reduce the repro down to something specific.
Updated by Jeff Layton over 2 years ago
- Status changed from Triaged to Resolved
This is probably the bug that was fixed by this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=827a746f405d25f79560c7868474aec5aee174e1
There is also a backport for earlier kernels that just applies to cephfs. Please reopen this if you can still reproduce the issue with a patched kernel.