Bug #15722
CephFS on BlueStore - Metadata corruption when writing files
Status: Closed
Description
OS: CentOS 7.2
Kernel: 3.10.0-327.13.1.el7.x86_64
Release: Jewel 10.2.0
Setup: one node, baremetal, 1 mon, 1 mds, 3 osd on bluestore
Pools: 1 - cephfs_data, 2 - cephfs_metadata (2 replicas)
Short: On a fresh CentOS install, writing a bunch of files to a CephFS mounted with ceph-fuse immediately causes object corruption in the metadata pool.
Notes:
HDDs used were tested with SMART and badblocks and are OK.
Corruption of RBD images has also been observed on the same setup, so this bug is probably not CephFS-specific!
Steps to reproduce:
0. Bootstrap fresh Ceph cluster
At this point, running cephfs-journal-tool journal export backup.bin succeeds, and a deep scrub on all OSDs passes
1. Mount CephFS on the same node: ceph-fuse /mnt
2. Copy a bunch of files, like: cp -r /usr /mnt
3. Corrupted!
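The reproduction steps above can be collected into one script. This is a sketch based only on the commands in the report; each command is echoed as a dry run, so drop the echos to execute it against a real cluster:

```shell
# Dry-run sketch of the reproduction steps above (paths from the report).
# Remove the leading "echo" to run the commands against a real cluster.
echo "ceph-fuse /mnt"                                # 1. mount CephFS on the same node
echo "cp -r /usr /mnt"                               # 2. copy a bunch of files
echo "cephfs-journal-tool journal export backup.bin" # then re-check journal readability
```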
Observed results:
- Running cephfs-journal-tool journal export backup.bin results in output:
7f6e3425fbc0 -1 Missing object 200.00000012
7f6e3425fbc0 -1 Missing object 200.00000013
7f6e3425fbc0 -1 Missing object 200.00000014
7f6e3425fbc0 -1 Missing object 200.00000015
7f6e3425fbc0 -1 Missing object 200.00000016
7f6e3425fbc0 -1 Missing object 200.00000017
7f6e3425fbc0 -1 Missing object 200.00000018
7f6e3425fbc0 -1 Missing object 200.00000019
7f6e3425fbc0 -1 Missing object 200.0000001a
7f6e3425fbc0 -1 journal_export: Journal not readable, attempt object-by-object dump with `rados`
Error ((5) Input/output error)
at the same time in log:
7f86f85bd700 [ERR] : 2.38 full-object read crc 0xa0b54044 != expected 0xc894d150 on 2:1cc859d9:::200.00000012:head
7fc6512bc700 [ERR] : 2.e full-object read crc 0x8a402939 != expected 0x82852c6a on 2:715f5c78:::200.00000013:head
7f86f9dc0700 [ERR] : 2.1c full-object read crc 0x7332d275 != expected 0xe10809ca on 2:3939b497:::200.00000014:head
7f31bb129700 [ERR] : 2.3d full-object read crc 0x28e97282 != expected 0x28db137a on 2:bec9163d:::200.00000015:head
7f86f8dbe700 [ERR] : 2.f full-object read crc 0x2143a94a != expected 0x67b17290 on 2:f376ae95:::200.00000016:head
7f31bd12d700 [ERR] : 2.2a full-object read crc 0xde994f2a != expected 0x377451c7 on 2:545c15c7:::200.00000017:head
7fc652abf700 [ERR] : 2.1a full-object read crc 0x91db0268 != expected 0x3a6ef3fc on 2:590b566d:::200.00000018:head
7f86f85bd700 [ERR] : 2.b full-object read crc 0x82f2070d != expected 0x45eb7732 on 2:d2a48755:::200.00000019:head
7f86fa5c1700 [ERR] : 2.34 full-object read crc 0x7f9a6ab0 != expected 0x5b93080c on 2:2e600548:::200.0000001a:head
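The exporter's hint about an object-by-object dump with rados can be followed for the journal objects named in the log. A sketch that only generates the commands (object numbers 0x12 through 0x1a are from the log above; the pool name cephfs_metadata is from the setup, everything else is an assumption):

```shell
# Generate per-object rados dump commands for the journal objects from the
# log. MDS journal objects are named <ino>.<number>, both in hex; 200 is the
# default journal inode for rank 0. Commands are echoed as a dry run.
for n in $(seq 18 26); do                 # 18..26 decimal = 0x12..0x1a
  obj=$(printf '200.%08x' "$n")
  echo "rados -p cephfs_metadata get $obj $obj.bin"
done
```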
- Running deep scrub results in such messages in log:
7f86fadc2700 [ERR] : 2.38 shard 0: soid 2:1cc859d9:::200.00000012:head data_digest 0xa0b54044 != best guess data_digest 0x4b3761cf from auth shard 1
7f86fadc2700 [ERR] : 2.38 deep-scrub 0 missing, 1 inconsistent objects
7f86fadc2700 [ERR] : 2.38 deep-scrub 1 errors
......
- Deep scrub shows 10 PGs inconsistent, all from the metadata pool
- Only after running ceph pg repair three times on each inconsistent group was the journal repaired and exportable again. Each repair pass produced a different log message:
2.34 repair 1 errors, 1 fixed
2.34 repair 1 errors, 0 fixed
2.34 repair ok, 0 fixed
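The manual repair loop can be scripted. A sketch, assuming the JSON list of inconsistent PGs has already been captured (on a live cluster it would come from rados list-inconsistent-pg cephfs_metadata; the sample list below is hypothetical, and the repair commands are echoed as a dry run):

```shell
# Sample of what `rados list-inconsistent-pg cephfs_metadata` returns;
# the PG ids here are hypothetical -- capture the real output on a cluster.
pgs='["2.38","2.34","2.e"]'

# Issue a repair for every inconsistent PG (echoed as a dry run).
for pg in $(printf '%s' "$pgs" | tr -d '[]"' | tr ',' ' '); do
  echo "ceph pg repair $pg"
done
```

The report notes each PG needed up to three passes, so in practice the loop would be rerun until deep scrub comes back clean.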
- The MDS does not start again (even though deep scrub is now clean), writing to the log each time:
7f56643b7700 1 mds.0.7 recovery set is
7f565f1ab700 0 mds.0.cache creating system inode with ino:100
7f565f1ab700 0 mds.0.cache creating system inode with ino:1
7f565dca4700 -1 mds.0.journaler(ro) try_read_entry: decode error from _is_readable
7f565dca4700 0 mds.0.log _replay journaler got error -22, aborting
7f565dca4700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 0: (22) Invalid argument
Updated by Sage Weil about 7 years ago
- Status changed from New to Can't reproduce
I haven't seen this on kraken or later bluestore; please reopen if you do!