Bug #23428

open

Snapset inconsistency is hard to diagnose because authoritative copy used by list-inconsistent-snapset not shown

Added by David Zafman about 6 years ago. Updated over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
David Zafman
Category:
Scrub/Repair
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description


$ sudo rados list-inconsistent-snapset 3.7f
{"epoch":79,"inconsistents":[]}

$ sudo rados list-inconsistent-obj 3.7f --format=json-pretty
{
    "epoch": 79,
    "inconsistents": [
        {
            "object": {
                "name": "obj1",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "version": 13
            },
            "errors": [
                "snapset_inconsistency" 
            ],
            "union_shard_errors": [],
            "selected_object_info": "3:ff7b1f36:::obj1:head(73'13 client.4471.0:1 dirty|data_digest|omap_digest s 1682 uv 13 dd 735b0743 od ffffffff alloc_hint [0 0 0])",
            "shards": [
                {
                    "osd": 1,
                    "primary": false,
                    "errors": [],
                    "size": 1682,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x735b0743",
                    "snapset": "0=[]:[]+stray_clone_snaps={1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}" 
                },
                {
                    "osd": 6,
                    "primary": true,
                    "errors": [],
                    "size": 1682,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x735b0743",
                    "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}" 
                },
                {
                    "osd": 8,
                    "primary": false,
                    "errors": [],
                    "size": 1682,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x735b0743",
                    "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}" 
                }
            ]
        }
    ]
}

For now, the user has to raise the debug_osd log level and examine the OSD logs to find which copy was selected as authoritative for a specific object. With two or more different snapsets, we could show the snapshot results computed with each snapset for comparison, which is more complex, or, more simply, indicate which copy is the authoritative one. The existing code in PG::scrub_compare_maps() doesn't pass enough information to PrimaryLogPG::scrub_snapshot_metadata() for it to see both snapset variants or to know which shard it is using.
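Until the authoritative copy is reported, a rough client-side workaround is to group the shards in the list-inconsistent-obj output by their snapset string, which at least shows which OSDs disagree. The following is a minimal sketch, assuming the JSON layout shown in the description above. The function name group_shards_by_snapset is hypothetical, the sample is a trimmed copy of that output, and the grouping does not tell you which copy the OSD actually selected as authoritative.

```python
import json
from collections import defaultdict


def group_shards_by_snapset(report):
    """Map (object name, snap) -> {snapset string: [shard labels]}.

    Only objects flagged with "snapset_inconsistency" are included.
    This is an illustrative helper, not part of Ceph.
    """
    result = {}
    for item in report.get("inconsistents", []):
        if "snapset_inconsistency" not in item.get("errors", []):
            continue
        obj = item["object"]
        groups = defaultdict(list)
        for shard in item.get("shards", []):
            label = "osd.%d" % shard["osd"]
            if shard.get("primary"):
                label += " (primary)"
            # Shards without a snapset field are grouped under a placeholder.
            groups[shard.get("snapset", "<missing>")].append(label)
        result[(obj["name"], obj["snap"])] = dict(groups)
    return result


# Trimmed copy of the list-inconsistent-obj output from the description.
SAMPLE = json.loads("""
{
  "epoch": 79,
  "inconsistents": [
    {
      "object": {"name": "obj1", "nspace": "", "locator": "",
                 "snap": "head", "version": 13},
      "errors": ["snapset_inconsistency"],
      "union_shard_errors": [],
      "shards": [
        {"osd": 1, "primary": false, "errors": [],
         "snapset": "0=[]:[]+stray_clone_snaps={1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}"},
        {"osd": 6, "primary": true, "errors": [],
         "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}"},
        {"osd": 8, "primary": false, "errors": [],
         "snapset": "6=[6,5,4,3,2,1]:{1=[1],2=[2],3=[3],4=[4],5=[5],6=[6]}"}
      ]
    }
  ]
}
""")

for (name, snap), groups in group_shards_by_snapset(SAMPLE).items():
    print("%s:%s" % (name, snap))
    for snapset, shards in groups.items():
        print("  %s <- %s" % (snapset, ", ".join(shards)))
```

In practice the report would come from the output of rados list-inconsistent-obj for the PG, passed through json.loads() in place of SAMPLE. For the sample above, osd.1 ends up alone in one group while osd.6 and osd.8 share the other.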


Related issues 1 (0 open, 1 closed)

Related to RADOS - Feature #23364: Special scrub handling of hinfo_key errors (Resolved, David Zafman, 03/14/2018)

Actions #1

Updated by David Zafman about 6 years ago

  • Subject changed from Snapset inconsistency is hard to diagnose because authoritative copy used by list-inconsistent-snapset to Snapset inconsistency is hard to diagnose because authoritative copy used by list-inconsistent-snapset not shown
Actions #2

Updated by David Zafman about 6 years ago

  • Related to Feature #23364: Special scrub handling of hinfo_key errors added
Actions #3

Updated by David Zafman about 6 years ago

The pull request https://github.com/ceph/ceph/pull/20947 contains a change that partially addresses this issue. Unfortunately, in the scenario shown in this tracker's description, no particular shard is in error, so list-inconsistent-snapset will still report an empty "inconsistents" list.

Here is an example of what is improved:

    {
      "name": "obj14",
      "nspace": "",
      "locator": "",
      "snap": "head",
      "snapset": {
        "snap_context": {
          "seq": 1,
          "snaps": [
            1
          ]
        },
        "clones": [
          {
            "snap": 1,
            "size": 1033,
            "overlap": "[]",
            "snaps": [
              1
            ]
          }
        ]
      },
      "errors": []
    }, 
    {
      "errors": [
        "size_mismatch" 
      ],
      "snap": 1,
      "locator": "",
      "nspace": "",
      "name": "obj14" 
    }
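To pick out only the entries that carry errors from output like the above, something along these lines could be used. This is a sketch under assumptions: the two entries are assumed to sit inside the same top-level "inconsistents" list seen in the description's list-inconsistent-snapset output, and the helper name entries_with_errors is hypothetical.

```python
import json


def entries_with_errors(report):
    """Return (name, snap, errors) for every entry with a non-empty error list."""
    return [
        (entry["name"], entry["snap"], entry["errors"])
        for entry in report.get("inconsistents", [])
        if entry.get("errors")
    ]


# The two entries from the example above, wrapped in an assumed
# top-level "inconsistents" list (the wrapper is not shown in the comment).
SAMPLE = json.loads("""
{
  "inconsistents": [
    {
      "name": "obj14", "nspace": "", "locator": "", "snap": "head",
      "snapset": {
        "snap_context": {"seq": 1, "snaps": [1]},
        "clones": [{"snap": 1, "size": 1033, "overlap": "[]", "snaps": [1]}]
      },
      "errors": []
    },
    {
      "errors": ["size_mismatch"],
      "snap": 1, "locator": "", "nspace": "", "name": "obj14"
    }
  ]
}
""")

for name, snap, errors in entries_with_errors(SAMPLE):
    print("%s snap=%s errors=%s" % (name, snap, ",".join(errors)))
```

For the sample, only the clone entry (obj14, snap 1) is reported. The head entry has an empty error list and is skipped, which mirrors the limitation described above: a snapset disagreement with no shard in error would not show up here at all.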

Actions #4

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New