Bug #20243

Updated by David Zafman almost 7 years ago

 
 Something similar to this was seen on a production system. If every shard's object_info_t matched, there would be no errors from list-inconsistent-obj.

 <pre> 
 shard    disk size       oi size 
 0            1588         1588            0 
 1            1588         1588 
 2            1588            0         1588 

 { 
     "epoch": 17, 
     "inconsistents": [ 
         { 
             "object": { 
                 "name": "foo", 
                 "nspace": "", 
                 "locator": "", 
                 "snap": "head", 
                 "version": 1 
             }, 
             "errors": [ 
                 "object_info_inconsistency", 
                 "attr_value_mismatch" 
             ], 
             "union_shard_errors": [], 
             "selected_object_info": "0:602f83fe:::foo:head(12'1 client.4111.0:1 dirty|data_digest|omap_digest s 1588 uv 1 dd a9a36536 od ffffffff alloc_hint [0 0 0])", 
             "shards": [ 
                 { 
                     "osd": 0, 
                     "errors": [], 
                     "size": 1588, 
                     "omap_digest": "0xffffffff", 
                     "data_digest": "0xa9a36536", 
                     "object_info": "0:602f83fe:::foo:head(12'1 client.4111.0:1 dirty|data_digest|omap_digest s 1588 uv 1 dd a9a36536 od ffffffff alloc_hint [0 0 0])" 
                 }, 
                 { 
                     "osd": 1, 
                     "errors": [], 
                     "size": 1588, 
                     "omap_digest": "0xffffffff", 
                     "data_digest": "0xa9a36536", 
                     "object_info": "0:602f83fe:::foo:head(12'1 client.4111.0:1 dirty|data_digest|omap_digest s 1588 uv 1 dd a9a36536 od ffffffff alloc_hint [0 0 0])" 
                 }, 
                 { 
                     "osd": 2, 
                     "errors": [], 
                     "size": 1588, 
                     "omap_digest": "0xffffffff", 
                     "data_digest": "0xa9a36536", 
                     "object_info": "0:602f83fe:::foo:head(12'1 client.4111.0:1 dirty|data_digest|omap_digest s 0 uv 1 dd a9a36536 od ffffffff alloc_hint [0 0 0])" 
                 } 
             ] 
         } 
     ] 
 } 
 </pre> 

 Currently all we see is object_info_inconsistency and attr_value_mismatch, with no shard errors. Without snapshots there is no information from list-inconsistent-snapset, which includes some additional size checking.

 In be_select_auth_object we should check each shard's disk size against its oi size. A mismatch should be reported as a new disk_size_shard error, which would make that shard less likely to be selected as the authoritative one.
 We should ignore system xattrs when checking for attr_value_mismatch. We will ignore strange xattr keys and never report an attr_name_mismatch.
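
 As a rough illustration of the proposed disk size vs oi size check (not the be_select_auth_object code itself), the Python sketch below scans list-inconsistent-obj output and flags shards whose on-disk size differs from the size recorded in their own object_info. The helper names oi_size and disk_size_vs_oi are hypothetical, and the regular expression assumes the size appears as "s <bytes> uv", as in the sample output above.

```python
import re

# Assumption: the object_info string carries the recorded size as
# "s <bytes> uv", as in the sample report. This is not a documented format.
_OI_SIZE = re.compile(r"\bs (\d+) uv\b")


def oi_size(object_info):
    """Return the size recorded in an object_info string, or None if absent."""
    match = _OI_SIZE.search(object_info)
    return int(match.group(1)) if match else None


def disk_size_vs_oi(report):
    """Yield (object name, osd, disk size, oi size) for each mismatching shard."""
    for item in report.get("inconsistents", []):
        name = item["object"]["name"]
        for shard in item.get("shards", []):
            disk = shard.get("size")
            recorded = oi_size(shard.get("object_info", ""))
            if recorded is not None and recorded != disk:
                yield name, shard.get("osd"), disk, recorded


# Trimmed copy of the report above: only the fields this check reads.
OI = ("0:602f83fe:::foo:head(12'1 client.4111.0:1 dirty|data_digest|omap_digest "
      "s {} uv 1 dd a9a36536 od ffffffff alloc_hint [0 0 0])")
sample = {
    "epoch": 17,
    "inconsistents": [
        {
            "object": {"name": "foo"},
            "shards": [
                {"osd": 0, "size": 1588, "object_info": OI.format(1588)},
                {"osd": 1, "size": 1588, "object_info": OI.format(1588)},
                {"osd": 2, "size": 1588, "object_info": OI.format(0)},
            ],
        }
    ],
}

mismatches = list(disk_size_vs_oi(sample))
for name, osd, disk, recorded in mismatches:
    print(f"{name}: osd {osd} disk size {disk} != oi size {recorded}")
```

 On the trimmed sample only osd 2 is flagged, which is the shard whose object_info records size 0 while 1588 bytes are on disk.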

 Already present in the code: 
 We have the object error size_mismatch, raised when different shards don't have the same disk size (maybe rename it to disk_size_mismatch too?).
 We have the shard error size_mismatch_oi, which, like other _oi errors, means the disk size doesn't match the authoritative size.
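
 To make the difference between the two existing errors concrete, here is a minimal Python sketch that follows only the descriptions above and is not taken from the Ceph source. The function name classify_sizes and its arguments are hypothetical.

```python
def classify_sizes(shard_disk_sizes, auth_oi_size):
    """Model the two existing size errors as described in this ticket.

    shard_disk_sizes: dict mapping osd id -> on-disk size of that shard
    auth_oi_size:     size recorded in the selected (authoritative) object_info
    """
    object_errors = set()
    # Object error: shards disagree with each other about the disk size.
    if len(set(shard_disk_sizes.values())) > 1:
        object_errors.add("size_mismatch")

    # Shard error: a shard's disk size disagrees with the authoritative size.
    shard_errors = {
        osd: (["size_mismatch_oi"] if size != auth_oi_size else [])
        for osd, size in shard_disk_sizes.items()
    }
    return object_errors, shard_errors


# Sizes from the report above: every shard has 1588 bytes on disk and the
# selected object_info also records 1588.
sample_result = classify_sizes({0: 1588, 1: 1588, 2: 1588}, 1588)

# A made-up case where one shard is short on disk.
short_result = classify_sizes({0: 1588, 1: 100}, 1588)
```

 With the sizes from the report, neither error fires, which is consistent with the empty union_shard_errors shown above: neither check compares a shard's disk size with that shard's own oi size.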

