Bug #20243
closed
Improve size scrub error handling and ignore system attrs in xattr checking
Description
Something similar to this was seen on a production system. If all the object_info_t matched there would be no errors from list-inconsistent-obj.
shard  disk size  oi size
0      1588       1588
1      1588       1588
2      1588       0

{
  "epoch": 17,
  "inconsistents": [
    {
      "object": {
        "name": "foo",
        "nspace": "",
        "locator": "",
        "snap": "head",
        "version": 1
      },
      "errors": [
        "object_info_inconsistency",
        "attr_value_mismatch"
      ],
      "union_shard_errors": [],
      "selected_object_info": "0:602f83fe:::foo:head(12'1 client.4111.0:1 dirty|data_digest|omap_digest s 1588 uv 1 dd a9a36536 od ffffffff alloc_hint [0 0 0])",
      "shards": [
        {
          "osd": 0,
          "errors": [],
          "size": 1588,
          "omap_digest": "0xffffffff",
          "data_digest": "0xa9a36536",
          "object_info": "0:602f83fe:::foo:head(12'1 client.4111.0:1 dirty|data_digest|omap_digest s 1588 uv 1 dd a9a36536 od ffffffff alloc_hint [0 0 0])"
        },
        {
          "osd": 1,
          "errors": [],
          "size": 1588,
          "omap_digest": "0xffffffff",
          "data_digest": "0xa9a36536",
          "object_info": "0:602f83fe:::foo:head(12'1 client.4111.0:1 dirty|data_digest|omap_digest s 1588 uv 1 dd a9a36536 od ffffffff alloc_hint [0 0 0])"
        },
        {
          "osd": 2,
          "errors": [],
          "size": 1588,
          "omap_digest": "0xffffffff",
          "data_digest": "0xa9a36536",
          "object_info": "0:602f83fe:::foo:head(12'1 client.4111.0:1 dirty|data_digest|omap_digest s 0 uv 1 dd a9a36536 od ffffffff alloc_hint [0 0 0])"
        }
      ]
    }
  ]
}
Currently all we see is object_info_inconsistency and attr_value_mismatch, with no shard errors. Without snapshots there is no information from list-inconsistent-snapset, which includes some additional size checking.
In be_select_auth_object we should check each shard's disk size against its oi_size. A mismatch should be reported as a new disk_size_shard error, which would make that shard less likely to be chosen as the authoritative one.
We should ignore system xattrs when checking for attr_value_mismatch. We will ignore strange xattr keys and never report an attr_name_mismatch.
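The proposed per-shard check can be illustrated against the list-inconsistent-obj output above. This is only a rough sketch in Python, not the OSD code: the helper names are hypothetical, and it assumes the recorded size appears in the object_info string as "s <bytes> uv", as it does in the report above.

```python
import re

# Assumption: the size stored in object_info is printed as " s <bytes> uv ".
_OI_SIZE = re.compile(r" s (\d+) uv ")


def oi_size(object_info):
    """Return the size recorded in an object_info string, or None if absent."""
    match = _OI_SIZE.search(object_info)
    return int(match.group(1)) if match else None


def shards_with_disk_size_error(shards):
    """Return the OSD ids whose on-disk size differs from their own oi size."""
    bad = []
    for shard in shards:
        recorded = oi_size(shard["object_info"])
        if recorded is not None and recorded != shard["size"]:
            bad.append(shard["osd"])
    return bad


# Shards reduced to the fields used here; values copied from the report above.
OI = ("0:602f83fe:::foo:head(12'1 client.4111.0:1 dirty|data_digest|omap_digest "
      "s {} uv 1 dd a9a36536 od ffffffff alloc_hint [0 0 0])")
SHARDS = [
    {"osd": 0, "size": 1588, "object_info": OI.format(1588)},
    {"osd": 1, "size": 1588, "object_info": OI.format(1588)},
    {"osd": 2, "size": 1588, "object_info": OI.format(0)},
]

print(shards_with_disk_size_error(SHARDS))
```

With the data from this report, only osd 2 is flagged, which is the shard whose object_info records size 0 while 1588 bytes are on disk.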
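One possible reading of the xattr rule is sketched below in Python. It is illustrative only: the function names are hypothetical, and it assumes that user xattrs are stored under keys that start with "_" followed by a name, while keys such as "_" (object info) and "snapset" are system attributes. Keys that are not present on every shard are simply skipped, so no name mismatch is ever reported.

```python
def user_xattrs(attrs):
    """Keep only keys assumed to be user xattrs: '_' followed by a name."""
    return {k: v for k, v in attrs.items() if k.startswith("_") and len(k) > 1}


def has_attr_value_mismatch(shard_attrs):
    """True if a user xattr present on every shard has differing values."""
    filtered = [user_xattrs(attrs) for attrs in shard_attrs]
    if not filtered:
        return False
    # Only keys common to all shards are compared; others are ignored.
    common = set.intersection(*(set(f) for f in filtered))
    return any(len({f[key] for f in filtered}) > 1 for key in common)


# Hypothetical attribute maps: the system keys differ, the user key agrees.
good = {"_": b"oi-size-1588", "snapset": b"ss", "_user.color": b"red"}
stale = {"_": b"oi-size-0", "snapset": b"ss", "_user.color": b"red"}
other = {"_": b"oi-size-1588", "snapset": b"ss", "_user.color": b"blue"}

print(has_attr_value_mismatch([good, good, stale]))
print(has_attr_value_mismatch([good, good, other]))
```

Under these assumptions a difference that exists only in the object info attribute no longer surfaces as attr_value_mismatch, while a real difference in a user attribute still does.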
Already present in the code:
We have the object error size_mismatch when different shards don't have the same disk size (maybe rename to disk_size_mismatch too?)
We have the shard error size_mismatch_oi which, like other _oi errors, means the disk size doesn't match the authoritative size
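The difference between these two existing checks can be shown with a short Python sketch. The function names are hypothetical and this is not the scrub implementation; it only restates the two definitions above and applies them to the sizes from this report.

```python
def object_size_mismatch(disk_sizes):
    """Object-level check: do the shards disagree with each other on disk size?"""
    return len(set(disk_sizes.values())) > 1


def shards_size_mismatch_oi(disk_sizes, auth_oi_size):
    """Shard-level check: which shards differ from the authoritative oi size?"""
    return sorted(osd for osd, size in disk_sizes.items() if size != auth_oi_size)


# Disk sizes per OSD and the size in selected_object_info from the report above.
DISK_SIZES = {0: 1588, 1: 1588, 2: 1588}
AUTH_OI_SIZE = 1588

print(object_size_mismatch(DISK_SIZES))
print(shards_size_mismatch_oi(DISK_SIZES, AUTH_OI_SIZE))
```

In the reported case all disk sizes agree with each other and with the selected object info, so neither existing check fires. That is consistent with the empty shard error lists in the output, and it is why a check of each shard's disk size against its own object info is being requested.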
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to RADOS
- Category set to Scrub/Repair
- Component(RADOS) OSD added
Updated by David Zafman almost 7 years ago
- Status changed from New to Fix Under Review
Updated by David Zafman over 6 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by David Zafman over 6 years ago
- Related to Feature #18836: list-inconsistent-obj should show which osd is the primary added
Updated by Nathan Cutler over 6 years ago
- Copied to Backport #21051: luminous: Improve size scrub error handling and ignore system attrs in xattr checking added
Updated by David Zafman over 6 years ago
If we wanted to backport to Jewel it would be helpful to include this pull request first.
Updated by David Zafman over 6 years ago
- Status changed from Pending Backport to Resolved