Feature #8343

closed

please enable data integrity checking (by default) / silent data corruption

Added by Dmitry Smirnov almost 10 years ago. Updated over 9 years ago.

Status: Closed
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

I'm scrubbing CephFS with fsprobe, which detected data corruption around the time one OSD died due to Btrfs file system corruption (Linux 3.14.2 is still not very stable with Btrfs snapshots and occasionally crashes).

2014-05-13 22:08:31 diff: cycle 7 bufcnt 15204 bufpos 00000038 56 offset 36400038 910164024 expected 0xAA got 0x00
2014-05-13 22:08:31 diff: cycle 7 bufcnt 15204 bufpos 00000039 57 offset 36400039 910164025 expected 0xAA got 0x00
2014-05-13 22:08:31 diff: cycle 7 bufcnt 15204 bufpos 0000003A 58 offset 3640003A 910164026 expected 0xAA got 0x00
2014-05-13 22:08:31 diff: cycle 7 bufcnt 15204 bufpos 0000003B 59 offset 3640003B 910164027 expected 0xAA got 0x00
2014-05-13 22:08:31 diff: cycle 7 bufcnt 15204 bufpos 0000003C 60 offset 3640003C 910164028 expected 0xAA got 0x00
2014-05-13 22:08:31 diff: cycle 7 bufcnt 15204 bufpos 0000003D 61 offset 3640003D 910164029 expected 0xAA got 0x00
2014-05-13 22:08:31 diff: cycle 7 bufcnt 15204 bufpos 0000003E 62 offset 3640003E 910164030 expected 0xAA got 0x00
2014-05-13 22:08:31 total 1048576 differing bytes found

At least 16 MiB of data was corrupted. In every instance, zeroes were read instead of the previously written data.
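
For readers unfamiliar with this kind of check, here is a minimal Python sketch of a pattern read-back comparison similar in spirit to what the log above reports. It is not fsprobe's actual code; the function name `find_mismatches` and the 64-byte buffer are assumptions made purely for illustration.

```python
from typing import List, Tuple


def find_mismatches(buf: bytes, expected: int = 0xAA,
                    base_offset: int = 0) -> List[Tuple[int, int]]:
    """Return (absolute_offset, got) for every byte that differs from the pattern.

    Hypothetical helper for illustration; not taken from fsprobe.
    """
    return [(base_offset + i, b) for i, b in enumerate(buf) if b != expected]


# Simulate a 64-byte buffer whose last 8 bytes read back as zeroes.
written = bytes([0xAA]) * 64
read_back = written[:56] + bytes(8)

for offset, got in find_mismatches(read_back, base_offset=0x36400000):
    print(f"offset {offset:08X} {offset} expected 0xAA got 0x{got:02X}")
```

A check like this only works because the reader knows in advance what was written; it catches the corruption after the fact, from the client side.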

Unfortunately that was the worst type of corruption -- a silent one.
I examined ceph.log and found nothing unusual that could suggest that kind of damage.

This happened on a replicated pool (v0.80.1) with 4 replicas and min_size==2, so the corruption could have been avoided.
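
To illustrate why redundant copies should make this avoidable, here is a rough Python sketch that compares replica contents and picks the copy held by a strict majority. It is only an illustration of the idea, not a description of how Ceph compares or repairs replicas; `majority_copy` is a hypothetical name, and a majority vote is itself only a heuristic because the majority could also be wrong.

```python
import hashlib
from collections import Counter
from typing import List, Optional


def majority_copy(replicas: List[bytes]) -> Optional[bytes]:
    """Return the content held by a strict majority of replicas, else None.

    Hypothetical helper for illustration; not Ceph's mechanism.
    """
    if not replicas:
        return None
    digests = [hashlib.sha256(r).digest() for r in replicas]
    digest, votes = Counter(digests).most_common(1)[0]
    if votes * 2 <= len(replicas):
        return None  # no strict majority, cannot decide
    return replicas[digests.index(digest)]


# Four replicas, one of which reads back as zeroes.
good = b"\xaa" * 16
replicas = [good, good, bytes(16), good]
print(majority_copy(replicas) == good)
```

With 4 replicas and a single zeroed copy, three matching copies remain, so the damaged one can be identified without knowing the original data.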

IMHO data integrity is of paramount importance for Ceph.
Please implement (or enable) data integrity checking, preferably by default.
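
As a sketch of the kind of checking I mean: store a checksum with each object on write and verify it on every read, so that damaged data produces an error instead of being returned silently. The Python below is an assumption-laden toy (an in-memory `ChecksummedStore` class invented for this example, using CRC-32 from `zlib`), not a proposal for how Ceph should implement it.

```python
import zlib


class ChecksummedStore:
    """Toy in-memory object store that verifies a CRC-32 on every read.

    Hypothetical class for illustration only.
    """

    def __init__(self):
        self._objects = {}  # name -> (data, crc32 recorded at write time)

    def write(self, name: str, data: bytes) -> None:
        data = bytes(data)
        self._objects[name] = (data, zlib.crc32(data))

    def read(self, name: str) -> bytes:
        data, crc = self._objects[name]
        if zlib.crc32(data) != crc:
            # Fail loudly rather than hand back corrupted data.
            raise IOError(f"checksum mismatch for object {name!r}")
        return data


store = ChecksummedStore()
store.write("obj", b"\xaa" * 1024)

# Simulate on-disk damage: the payload turns into zeroes, the stored CRC stays.
_, saved_crc = store._objects["obj"]
store._objects["obj"] = (bytes(1024), saved_crc)

try:
    store.read("obj")
except IOError as err:
    print("detected:", err)
```

The point is that the read path reports the damage, at which point a healthy replica could be used instead.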


Related issues 1 (1 open, 0 closed)

Related to RADOS - Feature #8609: Improve ceph pg repair (New)
