Bug #23428

Snapset inconsistency is hard to diagnose because authoritative copy used by list-inconsistent-snapset not shown

Added by David Zafman 10 months ago. Updated 10 months ago.

Status: Verified
Priority: Normal
Assignee:
Category: Scrub/Repair
Target version: -
Start date: 03/20/2018
Due date:
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:

Description


$ sudo rados list-inconsistent-snapset 3.7f
{"epoch":79,"inconsistents":[]}

$ sudo rados list-inconsistent-obj 3.7f --format=json-pretty
{
    "epoch": 79,
    "inconsistents": [
        {
            "object": {
                "name": "obj1",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "version": 13
            },
            "errors": [
                "snapset_inconsistency" 
            ],
            "union_shard_errors": [],
            "selected_object_info": "3:ff7b1f36:::obj1:head(73'13 client.4471.0:1 dirty|data_digest|omap_digest s 1682 uv 13 dd 735b0743 od ffffffff alloc_hint [0 0 0])",
            "shards": [
                {
                    "osd": 1,
                    "primary": false,
                    "errors": [],
                    "size": 1682,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x735b0743",
                    "snapset": "0=[]:[]+stray_clone_snaps={1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}" 
                },
                {
                    "osd": 6,
                    "primary": true,
                    "errors": [],
                    "size": 1682,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x735b0743",
                    "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}" 
                },
                {
                    "osd": 8,
                    "primary": false,
                    "errors": [],
                    "size": 1682,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x735b0743",
                    "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}" 
                }
            ]
        }
    ]
}

For now, the user has to raise the debug_osd log level and examine the OSD logs to find the authoritative copy selected for a specific object. With 2 or more different snapsets, we could show the snapshot results computed from each snapset for comparison, which is the more complex option; the easier one would be to indicate which copy is authoritative. The existing code in PG::scrub_compare_maps() doesn't pass enough information to PrimaryLogPG::scrub_snapshot_metadata() for it to see both snapset variants or to know which shard it is using.
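
Until the output names the authoritative copy, one stopgap is to group the shards by their snapset string, so that the disagreeing OSDs stand out. The following is a minimal Python sketch of that idea, not part of Ceph: the function name group_shards_by_snapset is hypothetical, and it assumes only the JSON fields visible in the list-inconsistent-obj output above (inconsistents, object, errors, shards, osd, snapset). It does not tell you which copy scrub selected as authoritative; it only shows which shards agree with each other.

```python
from collections import defaultdict


def group_shards_by_snapset(report):
    """Map "name:snap" to {snapset string: [osd ids]}.

    `report` is the parsed JSON printed by
    `rados list-inconsistent-obj <pgid>`. Only fields that appear in
    the sample output of this report are assumed to exist.
    """
    result = {}
    for entry in report.get("inconsistents", []):
        # Only look at objects flagged with a snapset inconsistency.
        if "snapset_inconsistency" not in entry.get("errors", []):
            continue
        groups = defaultdict(list)
        for shard in entry.get("shards", []):
            groups[shard.get("snapset", "<missing>")].append(shard["osd"])
        obj = entry["object"]
        result[f'{obj["name"]}:{obj["snap"]}'] = dict(groups)
    return result


# Trimmed copy of the sample output from this report.
SAMPLE = {
    "epoch": 79,
    "inconsistents": [
        {
            "object": {"name": "obj1", "nspace": "", "locator": "",
                       "snap": "head", "version": 13},
            "errors": ["snapset_inconsistency"],
            "shards": [
                {"osd": 1, "primary": False,
                 "snapset": "0=[]:[]+stray_clone_snaps={1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}"},
                {"osd": 6, "primary": True,
                 "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}"},
                {"osd": 8, "primary": False,
                 "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}"},
            ],
        }
    ],
}

if __name__ == "__main__":
    for obj, groups in group_shards_by_snapset(SAMPLE).items():
        print(obj)
        for snapset, osds in groups.items():
            print(f"  osds {osds}: {snapset}")
```

On the sample data this puts osd 1 in one group and osds 6 and 8 in another, which matches what a reader has to work out by eye from the raw JSON. To run it against a live PG, the parsed output of the rados command can be passed in place of SAMPLE.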


Related issues

Related to RADOS - Feature #23364: Special scrub handling of hinfo_key errors Resolved 03/14/2018

History

#1 Updated by David Zafman 10 months ago

  • Subject changed from Snapset inconsistency is hard to diagnose because authoritative copy used by list-inconsistent-snapset to Snapset inconsistency is hard to diagnose because authoritative copy used by list-inconsistent-snapset not shown

#2 Updated by David Zafman 10 months ago

  • Related to Feature #23364: Special scrub handling of hinfo_key errors added

#3 Updated by David Zafman 10 months ago

Pull request https://github.com/ceph/ceph/pull/20947 contains a change that partially addresses this issue. Unfortunately, in the scenario shown in this tracker's description, no particular shard is in error, so list-inconsistent-snapset will still report an empty inconsistents list.

Here is an example of what is improved:

    {
      "name": "obj14",
      "nspace": "",
      "locator": "",
      "snap": "head",
      "snapset": {
        "snap_context": {
          "seq": 1,
          "snaps": [
            1
          ]
        },
        "clones": [
          {
            "snap": 1,
            "size": 1033,
            "overlap": "[]",
            "snaps": [
              1
            ]
          }
        ]
      },
      "errors": []
    }, 
    {
      "errors": [
        "size_mismatch" 
      ],
      "snap": 1,
      "locator": "",
      "nspace": "",
      "name": "obj14" 
    }
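
To pull only the flagged entries out of output like this, a small filter is enough. The sketch below is illustrative only: the function name entries_with_errors is hypothetical, and it assumes the two entries above are elements of the top-level inconsistents array seen in the list-inconsistent-snapset output in the description.

```python
def entries_with_errors(report):
    """Return (name, snap, errors) for every entry with a non-empty errors list.

    `report` is the parsed JSON printed by
    `rados list-inconsistent-snapset <pgid>`; the entry layout is
    assumed to match the fragment shown in this comment.
    """
    flagged = []
    for entry in report.get("inconsistents", []):
        errors = entry.get("errors", [])
        if errors:
            flagged.append((entry["name"], entry["snap"], errors))
    return flagged


# The fragment from this comment, wrapped in the assumed top-level key.
SNAPSET_SAMPLE = {
    "inconsistents": [
        {
            "name": "obj14", "nspace": "", "locator": "", "snap": "head",
            "snapset": {
                "snap_context": {"seq": 1, "snaps": [1]},
                "clones": [
                    {"snap": 1, "size": 1033, "overlap": "[]", "snaps": [1]}
                ],
            },
            "errors": [],
        },
        {
            "errors": ["size_mismatch"],
            "snap": 1, "locator": "", "nspace": "", "name": "obj14",
        },
    ]
}

if __name__ == "__main__":
    for name, snap, errors in entries_with_errors(SNAPSET_SAMPLE):
        print(f"{name} snap={snap}: {', '.join(errors)}")
```

The head entry carries the snapset but no errors, so only the clone with size_mismatch is returned. For the scenario in the description the result is empty, which is the gap this tracker is about.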
