OSD - opportunistic whole-object checksums

Summary

Add a whole-object checksum (crc32c) to object_info_t. Update it when we scrub an object. Invalidate it when a partial write renders it obsolete. When scrub finds an inconsistency, use it to determine which object(s) are damaged and which are correct.

Owners

Sage Weil (Red Hat)

Interested Parties

Name (Affiliation)

Current Status

For various reasons RADOS does not track checksums for object data that is stored at rest:

checksums need to be fine-grained in order to accommodate small overwrites
small IOs may not be (checksum) block aligned
there is presumably some performance penalty associated with updating checksum xattrs
btrfs does this for you; historically we haven't wanted to duplicate functionality

(Note that for erasure coded objects, we do store checksums because we have restricted the set of allowed operations.)
However, periodically we scrub and do generate a whole-object checksum. We send it over the wire to compare with other replicas, but we do not store it on disk. There is one crc for byte data and one for omap data.
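
As a rough illustration of the checksum involved, the following is a minimal sketch of the standard CRC-32C (Castagnoli) algorithm. The function name crc32c_update is hypothetical, and this is not Ceph's own helper, which may use a different seed and finalization convention. It only shows how one running value can be carried across chunks to produce a whole-object checksum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Standard CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
// Bitwise for clarity; real code would use a lookup table or a
// hardware instruction. Hypothetical helper, not Ceph's API.
//
// `crc` is the running value: pass 0 for the first chunk and the
// previous return value for each later chunk of the same object.
inline uint32_t crc32c_update(uint32_t crc, const uint8_t* data, size_t len) {
  assert(data != nullptr || len == 0);
  crc = ~crc;  // undo the final inversion of the previous call
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right one bit; fold in the polynomial if the low bit was set.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}
```

Because the running value is chained, scrub could read an object in chunks and still arrive at the same whole-object value. A second, independent running value would cover the omap data.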

Detailed Description

Add two sets of fields to object_info_t:

uint32_t data_crc32c; bool data_crc32c_valid;
uint32_t omap_crc32c; bool omap_crc32c_valid;
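
The fields and the invalidation rule from the summary could be sketched roughly as follows. ObjectInfoSketch and its methods are hypothetical stand-ins for illustration only; the real object_info_t has many more members and its own update paths. The sketch conservatively invalidates the data crc on any data write, since a partial overwrite cannot update a whole-object checksum without rereading the object.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical, simplified stand-in for object_info_t.
struct ObjectInfoSketch {
  uint64_t size = 0;

  uint32_t data_crc32c = 0; bool data_crc32c_valid = false;
  uint32_t omap_crc32c = 0; bool omap_crc32c_valid = false;

  // Called from scrub once a whole-object checksum has been computed.
  void set_data_crc(uint32_t crc) { data_crc32c = crc; data_crc32c_valid = true; }
  void set_omap_crc(uint32_t crc) { omap_crc32c = crc; omap_crc32c_valid = true; }

  // A write to the byte data makes the stored data crc obsolete.
  // The omap crc is unaffected.
  void on_data_write(uint64_t offset, uint64_t length) {
    assert(offset <= std::numeric_limits<uint64_t>::max() - length);
    size = std::max(size, offset + length);
    data_crc32c_valid = false;
  }

  // Any omap change makes the stored omap crc obsolete.
  void on_omap_update() { omap_crc32c_valid = false; }
};
```

Keeping a separate valid flag for each crc means a data overwrite does not throw away a still-correct omap checksum, and vice versa.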

On scrub, compare the newly calculated checksum to the stored one (if present) and complain about any new inconsistency.
If there isn't a stored crc, update the object_info_t to store it.
Note that this will generate write IO during scrub for any recent objects. Viewed in the aggregate, this is one additional IO per object before it becomes cold. If the object is warm, the one additional IO isn't as significant. If it is becoming cold, this is the last one. If it is already cold, the crc is already stored and there is no additional load.
We can mitigate some of this cost by only storing the crc if the object age (as measured by now - mtime) is greater than some threshold. This should be a tunable.
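
The scrub-time decision described above can be sketched as a small pure function. All names here (ScrubAction, StoredCrc, scrub_decide, min_age_sec) are assumptions made for illustration, not existing Ceph identifiers, and the times are plain seconds rather than Ceph's time types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical outcome of checking one object (or its omap) during scrub.
enum class ScrubAction {
  Consistent,      // stored crc present and equal to the computed one
  ReportMismatch,  // stored crc present but different: complain
  StoreCrc,        // no stored crc and object is old enough: persist it
  SkipTooNew       // no stored crc but object is too recent: do nothing
};

struct StoredCrc {
  uint32_t value;
  bool valid;
};

// min_age_sec models the proposed tunable age threshold.
inline ScrubAction scrub_decide(StoredCrc stored, uint32_t computed,
                                int64_t mtime_sec, int64_t now_sec,
                                int64_t min_age_sec) {
  assert(min_age_sec >= 0);
  if (stored.valid) {
    return stored.value == computed ? ScrubAction::Consistent
                                    : ScrubAction::ReportMismatch;
  }
  // No stored crc yet: only pay the extra write IO for objects whose
  // age (now - mtime) exceeds the threshold.
  const int64_t age_sec = now_sec - mtime_sec;
  return age_sec > min_age_sec ? ScrubAction::StoreCrc
                               : ScrubAction::SkipTooNew;
}
```

For example, with a 60-second threshold, an object last modified 100 seconds ago that has no stored crc yields StoreCrc, while one modified 10 seconds ago yields SkipTooNew and is left for a later scrub.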

Work items

Coding tasks

add object_info_t fields
set object_info_t fields during scrub when they are not present (by generating a new repop)
?limit updates if object is too new (based on mtime)
update scrub to compare current crc to stored crc and complain accordingly
update repair logic to prefer replicas that match their stored crc
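
The last item, preferring replicas that match their stored crc, might look roughly like the sketch below. ReplicaScrub and pick_authoritative are hypothetical names, and the policy shown (take the first self-consistent replica, otherwise defer to whatever the existing repair logic does) is one possible reading of the task, not a description of actual repair code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-replica scrub result.
struct ReplicaScrub {
  uint32_t computed_crc;  // crc of the data as read from disk during scrub
  uint32_t stored_crc;    // crc recorded in that replica's object_info_t
  bool stored_valid;      // whether stored_crc is meaningful
};

// Returns the index of the first replica whose on-disk data matches its
// own stored crc. Returns nullopt if no replica can be verified this way,
// in which case the caller would fall back to its existing policy.
inline std::optional<size_t> pick_authoritative(
    const std::vector<ReplicaScrub>& replicas) {
  for (size_t i = 0; i < replicas.size(); ++i) {
    const ReplicaScrub& r = replicas[i];
    if (r.stored_valid && r.computed_crc == r.stored_crc) {
      return i;
    }
  }
  return std::nullopt;
}
```

A replica whose computed crc disagrees with its own stored crc is the likely damaged copy, so it is skipped even if it happens to be the primary.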

Updated by Jessica Mack almost 9 years ago · 1 revisions
