Bug #12000: OSD: EC reads do not correctly validate checksums and data contents - Ceph - Ceph

closed

Added by Greg Farnum almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Urgent
Assignee: David Zafman
Category: OSD
Source: Community (user)
Severity: 3 - minor

Description

The checksum verification currently only happens in the synchronous read case in CEPH_OSD_OP_READ in do_osd_ops. In the EC case, ReplicatedPG::OpContext::finish_read needs to verify the checksum and propagate the result.

This task also includes building a teuthology test to verify the behavior so it doesn't break again.

Related issues 1 (0 open — 1 closed)

Updated by Samuel Just almost 9 years ago

Priority changed from Normal to Urgent

From the mailing list ("Paweł Sadowski" <ceph@sadziu.pl>):

I'm testing erasure coded pools. Is there any protection from bit-rot
errors on object read? If I modify one bit in object part (directly on
OSD) I'm getting a *broken* object:

mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
    bb2d82bbb95be6b9a039d135cc7a5d0d  -

modify one bit directly on OSD

mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
    02f04f590010b4b0e6af4741c4097b4f  -

restore bit to original value

mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
    bb2d82bbb95be6b9a039d135cc7a5d0d  -

If I run deep-scrub on modified bit I'm getting inconsistent PG which is
correct in this case. After restoring bit and running deep-scrub again
all PGs are clean.

[ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)]

Updated by Samuel Just almost 9 years ago

Description updated (diff)

Updated by Samuel Just almost 9 years ago

Description updated (diff)

Updated by Samuel Just almost 9 years ago

Whoever fixes this:

In CEPH_OSD_OP_READ, pass a boost::optional<uint32_t> checksum if present as well as &osd_op.rval and &osd_op.outdata to FillInExtent (rename to FillInVerifyExtent). By the time FillInVerifyExtent is called, the buffer should have been populated (see ECBackend) and we can checksum and compare with the passed checksum if present. &osd_op.rval can then be set to EIO if it doesn't match, 0 otherwise. ReplicatedPG::complete_read_ctx then needs to return EIO if any of the sub-reads returned EIO.
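
The flow described above can be sketched roughly as follows. This is a standalone illustration, not Ceph code: `std::optional` (C++17) and `std::string` stand in for `boost::optional` and the buffer behind `osd_op.outdata`, the bitwise `crc32c` helper is a hypothetical stand-in for whatever checksum routine the OSD actually uses, `complete_reads` is a hypothetical analogue of ReplicatedPG::complete_read_ctx, and the negative-errno convention (`-EIO`) is an assumption.

```cpp
#include <cerrno>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Bitwise CRC-32C (Castagnoli polynomial, reflected form).
// Hypothetical stand-in for the OSD's real checksum routine.
uint32_t crc32c(uint32_t crc, const std::string& data) {
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int i = 0; i < 8; ++i)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc;
}

// Sketch of the renamed completion: by the time finish() runs,
// *outdata is assumed to have been populated by the backend read.
struct FillInVerifyExtent {
  std::optional<uint32_t> expected_crc;  // stored checksum, if present
  int* rval;                             // stands in for &osd_op.rval
  std::string* outdata;                  // stands in for &osd_op.outdata

  void finish() {
    if (expected_crc && crc32c(0xFFFFFFFFu, *outdata) != *expected_crc)
      *rval = -EIO;  // checksum mismatch: report an I/O error
    else
      *rval = 0;     // checksum matches, or none was recorded
  }
};

// Hypothetical analogue of complete_read_ctx: the whole read
// fails with EIO if any sub-read reported EIO.
int complete_reads(const std::vector<int>& rvals) {
  for (int r : rvals)
    if (r == -EIO)
      return -EIO;
  return 0;
}
```

With this shape, a buffer whose bit was flipped on disk would leave `rval` at `-EIO` after `finish()`, and `complete_reads` would surface that error to the client instead of returning the corrupted data.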
