Bug #15722
CephFS on BlueStore - Metadata corruption when writing files
Status: Closed
Description
OS: CentOS 7.2
Kernel: 3.10.0-327.13.1.el7.x86_64
Release: Jewel 10.2.0
Setup: one node, baremetal, 1 mon, 1 mds, 3 osd on bluestore
Pools: 1 - cephfs_data, 2 - cephfs_metadata (2 replicas)
Short: On a fresh CentOS install, writing a bunch of files to a CephFS mounted with ceph-fuse immediately causes object corruption in the metadata pool.
Notes:
HDDs used were tested with SMART and badblocks and are OK.
Corruption of RBD images has also been observed on the same setup, so this bug is probably not CephFS-specific!
Steps to reproduce:
0. Bootstrap fresh Ceph cluster
At this point, running cephfs-journal-tool journal export backup.bin succeeds, and a deep scrub on all OSDs passes
1. Mount CephFS on the same node: ceph-fuse /mnt
2. Copy a bunch of files, like: cp -r /usr /mnt
3. Corrupted!
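The reproduction steps above can be collected into one script. This is a sketch based only on the commands in the report; each command is echoed as a dry run, so drop the echos to execute it against a real cluster:

```shell
# Dry-run sketch of the reproduction steps above (paths from the report).
# Remove the leading "echo" to run the commands against a real cluster.
echo "ceph-fuse /mnt"                                # 1. mount CephFS on the same node
echo "cp -r /usr /mnt"                               # 2. copy a bunch of files
echo "cephfs-journal-tool journal export backup.bin" # then re-check journal readability
```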
Observed results:
- Running cephfs-journal-tool journal export backup.bin results in output:
7f6e3425fbc0 -1 Missing object 200.00000012
7f6e3425fbc0 -1 Missing object 200.00000013
7f6e3425fbc0 -1 Missing object 200.00000014
7f6e3425fbc0 -1 Missing object 200.00000015
7f6e3425fbc0 -1 Missing object 200.00000016
7f6e3425fbc0 -1 Missing object 200.00000017
7f6e3425fbc0 -1 Missing object 200.00000018
7f6e3425fbc0 -1 Missing object 200.00000019
7f6e3425fbc0 -1 Missing object 200.0000001a
7f6e3425fbc0 -1 journal_export: Journal not readable, attempt object-by-object dump with `rados`
Error ((5) Input/output error)
at the same time in log:
7f86f85bd700 [ERR] : 2.38 full-object read crc 0xa0b54044 != expected 0xc894d150 on 2:1cc859d9:::200.00000012:head
7fc6512bc700 [ERR] : 2.e full-object read crc 0x8a402939 != expected 0x82852c6a on 2:715f5c78:::200.00000013:head
7f86f9dc0700 [ERR] : 2.1c full-object read crc 0x7332d275 != expected 0xe10809ca on 2:3939b497:::200.00000014:head
7f31bb129700 [ERR] : 2.3d full-object read crc 0x28e97282 != expected 0x28db137a on 2:bec9163d:::200.00000015:head
7f86f8dbe700 [ERR] : 2.f full-object read crc 0x2143a94a != expected 0x67b17290 on 2:f376ae95:::200.00000016:head
7f31bd12d700 [ERR] : 2.2a full-object read crc 0xde994f2a != expected 0x377451c7 on 2:545c15c7:::200.00000017:head
7fc652abf700 [ERR] : 2.1a full-object read crc 0x91db0268 != expected 0x3a6ef3fc on 2:590b566d:::200.00000018:head
7f86f85bd700 [ERR] : 2.b full-object read crc 0x82f2070d != expected 0x45eb7732 on 2:d2a48755:::200.00000019:head
7f86fa5c1700 [ERR] : 2.34 full-object read crc 0x7f9a6ab0 != expected 0x5b93080c on 2:2e600548:::200.0000001a:head
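The exporter's hint about an object-by-object dump with rados can be followed for the journal objects named in the log. A sketch that only generates the commands (object numbers 0x12 through 0x1a are from the log above; the pool name cephfs_metadata is from the setup, everything else is an assumption):

```shell
# Generate per-object rados dump commands for the journal objects from the
# log. MDS journal objects are named <ino>.<number>, both in hex; 200 is the
# default journal inode for rank 0. Commands are echoed as a dry run.
for n in $(seq 18 26); do                 # 18..26 decimal = 0x12..0x1a
  obj=$(printf '200.%08x' "$n")
  echo "rados -p cephfs_metadata get $obj $obj.bin"
done
```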
- Running deep scrub results in such messages in log:
7f86fadc2700 [ERR] : 2.38 shard 0: soid 2:1cc859d9:::200.00000012:head data_digest 0xa0b54044 != best guess data_digest 0x4b3761cf from auth shard 1
7f86fadc2700 [ERR] : 2.38 deep-scrub 0 missing, 1 inconsistent objects
7f86fadc2700 [ERR] : 2.38 deep-scrub 1 errors
......
- Deep scrub shows 10 PGs inconsistent, all from the metadata pool
- Only after running ceph pg repair three times on each inconsistent group was the journal repaired and exportable again. Each repair pass produced a different log message:
2.34 repair 1 errors, 1 fixed
2.34 repair 1 errors, 0 fixed
2.34 repair ok, 0 fixed
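The manual repair loop can be scripted. A sketch, assuming the JSON list of inconsistent PGs has already been captured (on a live cluster it would come from rados list-inconsistent-pg cephfs_metadata; the sample list below is hypothetical, and the repair commands are echoed as a dry run):

```shell
# Sample of what `rados list-inconsistent-pg cephfs_metadata` returns;
# the PG ids here are hypothetical -- capture the real output on a cluster.
pgs='["2.38","2.34","2.e"]'

# Issue a repair for every inconsistent PG (echoed as a dry run).
for pg in $(printf '%s' "$pgs" | tr -d '[]"' | tr ',' ' '); do
  echo "ceph pg repair $pg"
done
```

The report notes each PG needed up to three passes, so in practice the loop would be rerun until deep scrub comes back clean.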
- The MDS does not start again (even though deep scrub is now clean), writing to the log each time:
7f56643b7700 1 mds.0.7 recovery set is
7f565f1ab700 0 mds.0.cache creating system inode with ino:100
7f565f1ab700 0 mds.0.cache creating system inode with ino:1
7f565dca4700 -1 mds.0.journaler(ro) try_read_entry: decode error from _is_readable
7f565dca4700 0 mds.0.log _replay journaler got error -22, aborting
7f565dca4700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 0: (22) Invalid argument
Updated by Sage Weil about 7 years ago
- Status changed from New to Can't reproduce
I haven't seen this on kraken or later bluestore; please reopen if you do!