Bug #20243
closed
Improve size scrub error handling and ignore system attrs in xattr checking
Description
Something similar to this was seen on a production system. If all the object_info_t values matched, list-inconsistent-obj would report no errors.
shard   disk size   oi size
0       1588        1588
1       1588        1588
2       1588        0

{
  "epoch": 17,
  "inconsistents": [
    {
      "object": {
        "name": "foo",
        "nspace": "",
        "locator": "",
        "snap": "head",
        "version": 1
      },
      "errors": [
        "object_info_inconsistency",
        "attr_value_mismatch"
      ],
      "union_shard_errors": [],
      "selected_object_info": "0:602f83fe:::foo:head(12'1 client.4111.0:1 dirty|data_digest|omap_digest s 1588 uv 1 dd a9a36536 od ffffffff alloc_hint [0 0 0])",
      "shards": [
        {
          "osd": 0,
          "errors": [],
          "size": 1588,
          "omap_digest": "0xffffffff",
          "data_digest": "0xa9a36536",
          "object_info": "0:602f83fe:::foo:head(12'1 client.4111.0:1 dirty|data_digest|omap_digest s 1588 uv 1 dd a9a36536 od ffffffff alloc_hint [0 0 0])"
        },
        {
          "osd": 1,
          "errors": [],
          "size": 1588,
          "omap_digest": "0xffffffff",
          "data_digest": "0xa9a36536",
          "object_info": "0:602f83fe:::foo:head(12'1 client.4111.0:1 dirty|data_digest|omap_digest s 1588 uv 1 dd a9a36536 od ffffffff alloc_hint [0 0 0])"
        },
        {
          "osd": 2,
          "errors": [],
          "size": 1588,
          "omap_digest": "0xffffffff",
          "data_digest": "0xa9a36536",
          "object_info": "0:602f83fe:::foo:head(12'1 client.4111.0:1 dirty|data_digest|omap_digest s 0 uv 1 dd a9a36536 od ffffffff alloc_hint [0 0 0])"
        }
      ]
    }
  ]
}
Currently all we see is object_info_inconsistency and attr_value_mismatch, with no shard errors. Without snapshots there is no information from list-inconsistent-snapset, which includes some additional size checking.
In be_select_auth_object we should check each shard's disk size against the size recorded in its own object info (oi_size). A mismatch should be reported as a new disk_size_shard error, which would make that shard less likely to be selected as the authoritative one.
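To illustrate the proposed check, here is a minimal sketch that post-processes list-inconsistent-obj output rather than changing the OSD itself. It assumes the size appears in the object_info string as "s <n> uv", as in the sample above. The function names oi_size and shards_with_size_error are hypothetical and not part of Ceph.

```python
import re

# Assumption: the object_info string carries the recorded size as "s <n> uv",
# matching the format seen in the sample output in this ticket.
OI_SIZE_RE = re.compile(r"\bs (\d+) uv\b")


def oi_size(object_info):
    """Return the size recorded in an object_info string, or None if absent."""
    match = OI_SIZE_RE.search(object_info)
    return int(match.group(1)) if match else None


def shards_with_size_error(inconsistent):
    """Return the OSD ids whose on-disk size differs from their own oi size."""
    flagged = []
    for shard in inconsistent["shards"]:
        recorded = oi_size(shard["object_info"])
        # Skip shards whose object_info could not be parsed.
        if recorded is not None and recorded != shard["size"]:
            flagged.append(shard["osd"])
    return flagged
```

Run against the entry shown in the description, this would flag only osd 2, whose disk size is 1588 while its object info records size 0.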
We should ignore system xattrs when checking for attr_value_mismatch. We will ignore strange xattr keys and never report an attr_name_mismatch for them.
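A rough sketch of the xattr comparison with system keys filtered out follows. The is_system_key predicate is an assumption for illustration only: it treats keys with a "_" prefix as user xattrs and everything else, including the bare "_" key, as system attrs. The real rule belongs to the OSD code and may differ.

```python
def is_system_key(key):
    # Assumption for illustration: user xattrs carry a "_" prefix, while the
    # bare "_" key and unprefixed keys are treated as system attrs.
    return key == "_" or not key.startswith("_")


def user_attrs(attrs, is_system=is_system_key):
    """Drop system xattrs so that only user-visible keys are compared."""
    return {k: v for k, v in attrs.items() if not is_system(k)}


def attr_value_mismatch(shard_attrs, is_system=is_system_key):
    """True if any user xattr present on two shards has different values."""
    filtered = [user_attrs(attrs, is_system) for attrs in shard_attrs]
    if not filtered:
        return False
    first = filtered[0]
    for other in filtered[1:]:
        # Only keys present on both shards are compared by value.
        for key in first.keys() & other.keys():
            if first[key] != other[key]:
                return True
    return False
```

With this filtering, a difference confined to the object info attr would no longer surface as attr_value_mismatch, which is the false report seen in the output above.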
Already present in the code:
We have the object error size_mismatch, raised when different shards don't have the same disk size (maybe rename to disk_size_mismatch too?)
We have the shard error size_mismatch_oi which, like other _oi errors, means the disk size doesn't match the authoritative size