Bug #63876 (open): rados/cephfs: apparent file corruption on cephfs on an EC pool

Added by Samuel Just 5 months ago. Updated 5 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Adapted from https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NZE7HNDXSHAWISDHHZQXHKH2ZG4LQ4X7/

I'm seeing different results when reading files from CephFS on an
erasure-coded pool, depending on which OSDs are running, including some
incorrect reads with all OSDs running. I'm running Ceph 17.2.6.

  1. More detail

In particular, I have a relatively large backup of some files, combined
with SHA-256 hashes of the files (which were verified when the backup
was created, approximately 7 months ago). Verifying these hashes
currently gives several errors, in both large and small files, though
skewed somewhat towards larger files.
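
The check involved is, in spirit, a standard manifest verification; a
minimal sketch, assuming a placeholder mount point and manifest name, is:

# Verify the SHA-256 manifest and report only mismatches
# (/mnt/cephfs/backup and backup.sha256 are placeholder names)
cd /mnt/cephfs/backup
sha256sum --quiet -c backup.sha256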

Investigating which PGs stored the relevant files (using
cephfs-data-scan pg_files) didn't show the problem to be isolated to one
PG, but did show several PGs that contained OSD 15 as an active member.
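
A sketch of how that mapping can be gathered, using a placeholder path
and PG ID (the actual pool and PG numbers will differ per cluster), is:

# List the PGs for which osd.15 is in the acting set
ceph pg ls-by-osd osd.15

# List CephFS files with data in a given PG (cephfs-data-scan is part of
# the CephFS disaster-recovery tooling; 20.2f and /backup are placeholders)
cephfs-data-scan pg_files /backup 20.2f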

Taking OSD 15 offline leads to better reads (more files with correct
SHA-256 hashes), but not completely correct reads. Further investigation
implicated OSD 34 as another potential issue; taking it offline as well
yields more correct files, but still not all of them.
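
How an OSD is taken offline depends on the deployment; a minimal sketch
for a package/systemd install (a cephadm cluster would use
"ceph orch daemon stop osd.15" instead) is:

# Avoid rebalancing while the suspect OSD is down
ceph osd set noout

# Stop the OSD daemon, then re-read the affected files and compare hashes
systemctl stop ceph-osd@15

# Afterwards, restart the OSD and clear the flag
systemctl start ceph-osd@15
ceph osd unset noout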

Bringing the stopped OSDs (15 and 34) back online results in the earlier
(incorrect) hashes when reading files, as might be expected, but this
seems to demonstrate that the correct information (or at least more
correct information) is still on the drives.

The hashes I receive for a given corrupted file are consistent from read
to read (including on different hosts, to avoid caching as an issue),
but obviously sometimes change if I take an affected OSD offline.

  2. Recent history

I have Ceph configured with a deep scrub interval of approximately 30
days, and deep scrubs have completed regularly with no issues
identified. However, within the past two weeks I added two additional
drives to the cluster, and rebalancing took about two weeks to complete:
the placement groups I noticed having issues had not been deep scrubbed
since the rebalance completed, so it is possible something got corrupted
during the rebalance.

Neither OSD 15 nor 34 is a new drive, and as far as I have experienced
(and Ceph's health indications have shown), all of the existing OSDs
have behaved correctly up to this point.

  3. Configuration

I created an erasure coding profile for the pool in question using the
following command:

ceph osd erasure-code-profile set erasure_k4_m2 \
plugin=jerasure \
k=4 m=2 \
technique=blaum_roth \
crush-device-class=hdd
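
The stored profile (plugin, k/m, technique, device class) can be
confirmed with:

# Dump the profile as recorded by the cluster
ceph osd erasure-code-profile get erasure_k4_m2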

And the following CRUSH rule is used for the pool:

rule erasure_k4_m2_hdd_rule {
    id 3
    type erasure
    min_size 4
    max_size 6
    step take default class hdd
    step choose indep 3 type host
    step chooseleaf indep 2 type osd
    step emit
}
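
The rule as installed can be inspected, to confirm the steps above match
what the pool is actually using, with:

# Show the rule as stored in the CRUSH map
ceph osd crush rule dump erasure_k4_m2_hdd_rule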

  4. Questions

1. Does this behavior ring a bell to anyone? Is there something obvious
I'm missing or should do?

2. Is deep scrubbing likely to help the situation? Hurt it? (Hopefully
not hurt: I've prioritized deep scrubbing of the PGs on OSDs 15 and 34,
and will likely follow up with the rest of the pool; a command sketch
follows these questions.)

3. Is there a way to force "full reads" or otherwise to use all of the
EC chunks (potentially in tandem with on-disk checksums) to identify the
correct data, rather than a combination of the data from the primary OSDs?
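
Regarding question 2, the prioritized scrubs amount to requests of this
form, with a placeholder PG ID:

# Request an immediate deep scrub of one placement group
ceph pg deep-scrub 20.2f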


Files

bad_read_logs.zip (14.8 KB) - Logs from an attempted read of object 1000268bd02.00000000 - aschmitz, 01/03/2024 04:41 AM
notes.txt (2.99 KB) - Annotated shell commands run to gather data - aschmitz, 01/03/2024 04:44 AM
1000266ce21.00000000.osd15.gz (392 KB) - Laura Flores, 01/04/2024 07:54 PM
1000266ce21.00000000.osd9.gz (336 KB) - Laura Flores, 01/04/2024 07:54 PM
1000266ce21.00000000.osd16.gz (373 KB) - Laura Flores, 01/04/2024 07:54 PM
1000266ce21.00000000.osd22.gz (392 KB) - Laura Flores, 01/04/2024 07:54 PM
1000266ce21.00000000.osd24.gz (336 KB) - Laura Flores, 01/04/2024 07:54 PM
1000266ce21.00000000.osd34.gz (392 KB) - Laura Flores, 01/04/2024 07:54 PM

Related issues (1 open, 0 closed)

Related to RADOS - Bug #64419: Make the blaum_roth technique experimental for ec profiles (Status: New, Assignee: Laura Flores)
