Bug #12000
closed
OSD: EC reads do not correctly validate checksums and data contents
Added by Greg Farnum almost 9 years ago.
Updated over 8 years ago.
Description
The checksum verification currently happens only in the synchronous read case in CEPH_OSD_OP_READ in do_osd_ops. In the EC case, ReplicatedPG::OpContext::finish_read needs to verify the checksum and propagate the result.
This task also includes building a teuthology test to verify the behavior so it doesn't break again.
- Priority changed from Normal to Urgent
From the mailing list ("Paweł Sadowski" <ceph@sadziu.pl>):
I'm testing erasure coded pools. Is there any protection from bit-rot
errors on object read? If I modify one bit in object part (directly on
OSD) I'm getting *broken* object:
mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
bb2d82bbb95be6b9a039d135cc7a5d0d -
- modify one bit directly on OSD
mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
02f04f590010b4b0e6af4741c4097b4f -
- restore bit to original value
mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
bb2d82bbb95be6b9a039d135cc7a5d0d -
If I run deep-scrub on modified bit I'm getting inconsistent PG which is
correct in this case. After restoring bit and running deep-scrub again
all PGs are clean.
[ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)]
- Description updated (diff)
- Description updated (diff)
Whoever fixes this:
In CEPH_OSD_OP_READ, pass a boost::optional<uint32_t> checksum if present as well as &osd_op.rval and &osd_op.outdata to FillInExtent (rename to FillInVerifyExtent). By the time FillInVerifyExtent is called, the buffer should have been populated (see ECBackend) and we can checksum and compare with the passed checksum if present. &osd_op.rval can then be set to EIO if it doesn't match, 0 otherwise. ReplicatedPG::complete_read_ctx then needs to return EIO if any of the sub-reads returned EIO.
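The flow described above can be sketched roughly as follows. This is a minimal standalone illustration, not the actual Ceph patch: it uses std::optional and std::string in place of boost::optional and bufferlist, a simple bitwise CRC-32C stand-in rather than Ceph's own crc32c routine, and a negative errno (-EIO) for the error value. The complete() method name and the free-function form of complete_read_ctx are assumptions made for the example.

```cpp
#include <cerrno>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Bitwise CRC-32C (reflected polynomial 0x82F63B78), no final inversion.
// Stand-in for the OSD's real checksum routine; assumption for this sketch.
uint32_t crc32c(const std::string& data, uint32_t seed = 0xFFFFFFFFu) {
  uint32_t crc = seed;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int i = 0; i < 8; ++i) {
      // Shift right; XOR in the polynomial when the low bit was set.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc;
}

// Hypothetical shape of the renamed completion: it receives the optional
// stored checksum plus pointers standing in for osd_op.rval and
// osd_op.outdata.
struct FillInVerifyExtent {
  std::optional<uint32_t> expected_crc;  // absent when no checksum is stored
  int* rval;                             // stands in for &osd_op.rval
  std::string* outdata;                  // stands in for &osd_op.outdata

  // Called once the read buffer has been populated.
  void complete(const std::string& buf) {
    *outdata = buf;
    if (expected_crc && crc32c(buf) != *expected_crc) {
      *rval = -EIO;  // checksum mismatch: report an I/O error
    } else {
      *rval = 0;     // checksum matched, or there was nothing to compare
    }
  }
};

// Sketch of the aggregation step: the whole read fails with EIO if any
// sub-read reported EIO.
int complete_read_ctx(const std::vector<int>& sub_read_rvals) {
  for (int r : sub_read_rvals) {
    if (r == -EIO) {
      return -EIO;
    }
  }
  return 0;
}
```

With this shape, a single flipped bit in the populated buffer changes the computed CRC, so the sub-read's rval becomes -EIO and the client sees an error instead of silently corrupted data.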
- Status changed from New to In Progress
- Assignee set to David Zafman
- Status changed from In Progress to 7
- Status changed from 7 to Resolved
21e9f69dd258a8c204828070cfe8b4018acdb145
- Related to Bug #14009: FAILED assert(old_size == total_chunk_size) in 0.94.5 / strange file size added