Bug #12000

OSD: EC reads do not correctly validate checksums and data contents

Added by Greg Farnum about 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version: -
Start date: 06/12/2015
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

Checksum verification currently happens only in the synchronous read case, in CEPH_OSD_OP_READ in do_osd_ops. In the EC case, where reads complete asynchronously, ReplicatedPG::OpContext::finish_read needs to verify the checksum and propagate any mismatch as an error.

This task also includes building a teuthology test to verify the behavior so it doesn't break again.


Related issues

Related to Ceph - Bug #14009: FAILED assert(old_size == total_chunk_size) in 0.94.5 / strange file size (Can't reproduce, 12/07/2015)

Associated revisions

Revision 21e9f69d (diff)
Added by David Zafman almost 4 years ago

osd: Check CRC when able to on async read

Fixes: #12000

Signed-off-by: David Zafman <>

History

#1 Updated by Samuel Just about 4 years ago

  • Priority changed from Normal to Urgent

From the mailing list ("Paweł Sadowski" <>):

I'm testing erasure coded pools. Is there any protection from bit-rot
errors on object read? If I modify one bit in object part (directly on
OSD) I'm getting *broken* object:

mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
bb2d82bbb95be6b9a039d135cc7a5d0d -
# modify one bit directly on OSD
mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
02f04f590010b4b0e6af4741c4097b4f -
# restore bit to original value
mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
bb2d82bbb95be6b9a039d135cc7a5d0d -

If I run deep-scrub on modified bit I'm getting inconsistent PG which is
correct in this case. After restoring bit and running deep-scrub again
all PGs are clean.

[ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)]

#2 Updated by Samuel Just about 4 years ago

  • Description updated (diff)

#3 Updated by Samuel Just about 4 years ago

  • Description updated (diff)

#4 Updated by Samuel Just about 4 years ago

Whoever fixes this:

In CEPH_OSD_OP_READ, pass a boost::optional<uint32_t> checksum if present as well as &osd_op.rval and &osd_op.outdata to FillInExtent (rename to FillInVerifyExtent). By the time FillInVerifyExtent is called, the buffer should have been populated (see ECBackend) and we can checksum and compare with the passed checksum if present. &osd_op.rval can then be set to EIO if it doesn't match, 0 otherwise. ReplicatedPG::complete_read_ctx then needs to return EIO if any of the sub-reads returned EIO.
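
A rough, standalone sketch of the shape described above, not the actual patch. Everything here is a simplified stand-in: std::optional replaces boost::optional, std::string replaces the bufferlist behind osd_op.outdata, crc32c() is a plain bitwise CRC-32C whose seed and final-XOR conventions may differ from the OSD's, complete_read() is a hypothetical helper for the aggregation step in ReplicatedPG::complete_read_ctx, and the use of negative errno values (-EIO) is an assumption.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
// Stand-in for the OSD's real checksum routine.
uint32_t crc32c(const std::string& data) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int i = 0; i < 8; ++i)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc ^ 0xFFFFFFFFu;
}

// Hypothetical completion callback modeled on the FillInVerifyExtent idea.
struct FillInVerifyExtent {
  std::optional<uint32_t> expected_crc;  // empty when no checksum is stored
  int* rval;                             // stands in for &osd_op.rval
  const std::string* outdata;            // stands in for &osd_op.outdata

  // Called after the asynchronous read has populated *outdata.
  void finish() const {
    assert(rval != nullptr && outdata != nullptr);
    if (expected_crc && crc32c(*outdata) != *expected_crc)
      *rval = -EIO;  // data does not match the stored checksum
    else
      *rval = 0;     // checksum matches, or there is nothing to compare
  }
};

// Hypothetical aggregation step: the whole read fails with EIO
// if any sub-read reported EIO.
int complete_read(const std::vector<int>& sub_read_results) {
  for (int r : sub_read_results)
    if (r == -EIO)
      return -EIO;
  return 0;
}
```

With this shape, a callback built with the stored checksum sets rval to -EIO when a single bit of the buffer has been flipped and to 0 when the buffer is intact or no checksum is available; complete_read({0, -EIO, 0}) then returns -EIO for the whole operation.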

#5 Updated by David Zafman about 4 years ago

  • Status changed from New to In Progress

#6 Updated by David Zafman about 4 years ago

  • Assignee set to David Zafman

#7 Updated by David Zafman about 4 years ago

  • Status changed from In Progress to Testing

#8 Updated by David Zafman almost 4 years ago

  • Status changed from Testing to Resolved

21e9f69dd258a8c204828070cfe8b4018acdb145

#9 Updated by Loic Dachary over 3 years ago

  • Related to Bug #14009: FAILED assert(old_size == total_chunk_size) in 0.94.5 / strange file size added
