Bug #12000

closed

OSD: EC reads do not correctly validate checksums and data contents

Added by Greg Farnum almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Urgent
Assignee: David Zafman
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Checksum verification currently happens only in the synchronous read path (CEPH_OSD_OP_READ in do_osd_ops). For EC pools, where reads complete asynchronously, ReplicatedPG::OpContext::finish_read needs to verify the checksum and propagate any mismatch as a read error.

This task also includes building a teuthology test to verify the behavior so it doesn't break again.


Related issues: 1 (0 open, 1 closed)

Related to Ceph - Bug #14009: FAILED assert(old_size == total_chunk_size) in 0.94.5 / strange file size (Can't reproduce, 12/07/2015)

#1

Updated by Samuel Just almost 9 years ago

  • Priority changed from Normal to Urgent

From the mailing list ("Paweł Sadowski"):

I'm testing erasure-coded pools. Is there any protection from bit-rot
errors on object read? If I modify one bit in an object part (directly on
the OSD), I get a *broken* object:

mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
bb2d82bbb95be6b9a039d135cc7a5d0d  -
# modify one bit directly on OSD
mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
02f04f590010b4b0e6af4741c4097b4f  -
# restore bit to original value
mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
bb2d82bbb95be6b9a039d135cc7a5d0d  -

If I run deep-scrub while the bit is modified, I get an inconsistent PG, which
is correct in this case. After restoring the bit and running deep-scrub again,
all PGs are clean.

[ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)]

#2

Updated by Samuel Just almost 9 years ago

  • Description updated
#3

Updated by Samuel Just almost 9 years ago

  • Description updated
#4

Updated by Samuel Just almost 9 years ago

Whoever fixes this:

In CEPH_OSD_OP_READ, pass the expected checksum as a boost::optional<uint32_t> (if present), along with &osd_op.rval and &osd_op.outdata, to FillInExtent, and rename that callback to FillInVerifyExtent. By the time FillInVerifyExtent is called, the buffer should already have been populated (see ECBackend), so it can compute the checksum and compare it with the one passed in, if any. It can then set osd_op.rval (through the &osd_op.rval pointer) to EIO on a mismatch and to 0 otherwise. ReplicatedPG::complete_read_ctx then needs to return EIO if any of the sub-reads returned EIO.
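A minimal C++ sketch of the flow described above, under several assumptions. std::optional stands in for boost::optional so the example builds without Boost. OSDOp, FillInVerifyExtent and complete_read_result are simplified, hypothetical stand-ins, not the real ReplicatedPG/ECBackend types. The checksum is a plain software CRC-32C seeded with 0xFFFFFFFF and returned without a final XOR, which we assume mirrors the OSD's data digest. Errors are reported as negative errno values (-EIO), which we assume matches how rval is used.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
// Assumption: callers seed with 0xFFFFFFFF and no final XOR is applied.
uint32_t crc32c(uint32_t crc, const std::string& data) {
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : (crc >> 1);
  }
  return crc;
}

// Hypothetical, stripped-down per-op state: only rval and outdata matter here.
struct OSDOp {
  int rval = 0;
  std::string outdata;
};

// Hypothetical FillInVerifyExtent-style completion. It runs after the
// asynchronous read has filled the buffer, then verifies it.
struct FillInVerifyExtent {
  std::optional<uint32_t> expected;  // stored checksum, if the object has one
  int* rval;                         // points at osd_op.rval
  std::string* outdata;              // points at osd_op.outdata

  void finish(const std::string& buf) {
    assert(rval && outdata);
    *outdata = buf;
    // Only verify when a checksum was supplied; otherwise accept the data.
    if (expected && crc32c(0xFFFFFFFFu, buf) != *expected)
      *rval = -EIO;
    else
      *rval = 0;
  }
};

// Mirrors the suggested complete_read_ctx rule: if any sub-read
// reported EIO, the whole read fails with EIO.
int complete_read_result(const std::vector<OSDOp>& ops) {
  for (const auto& op : ops)
    if (op.rval == -EIO)
      return -EIO;
  return 0;
}
```

A single flipped bit, as in the mailing-list report, changes the CRC-32C, so that extent's rval becomes -EIO and the aggregated result fails the read instead of returning corrupt data to the client.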

#5

Updated by David Zafman almost 9 years ago

  • Status changed from New to In Progress
#6

Updated by David Zafman almost 9 years ago

  • Assignee set to David Zafman
#7

Updated by David Zafman almost 9 years ago

  • Status changed from In Progress to 7
#8

Updated by David Zafman over 8 years ago

  • Status changed from 7 to Resolved

21e9f69dd258a8c204828070cfe8b4018acdb145

#9

Updated by Loïc Dachary over 8 years ago

  • Related to Bug #14009: FAILED assert(old_size == total_chunk_size) in 0.94.5 / strange file size added