Bug #24994

active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub

Added by Robert Sander over 5 years ago. Updated over 5 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Scrub/Repair
Target version: -
% Done: 0%
Source: Community (user)
Regression: No
Severity: 1 - critical

Description

Hi,

a deep scrub revealed 59 active+clean+inconsistent PGs on one customer's cluster and 50 active+clean+inconsistent PGs on another customer's cluster.

This was after upgrading to 12.2.7.

The PGs belong to pools that hold qemu+rbd images.

"ceph pd repair" does not work as data_digests seem not to match:

2018-07-19 09:47:49.945406 osd.4 [ERR] 3.1 shard 2: soid 3:804f3088:::rb.0.7dccf5.238e1f29.000000003254:head data_digest 0x39c70649 != data_digest 0xe1be12c4 from auth oi 3:804f3088:::rb.0.7dccf5.238e1f29.000000003254:head(21955'1832257 client.152883867.0:435732 dirty|data_digest|omap_digest s 4194304 uv 1832257 dd e1be12c4 od ffffffff alloc_hint [4194304 4194304 0])
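For reference, the damage can be surveyed by enumerating the inconsistent PGs of a pool and dumping the per-object errors, roughly like this (a sketch; "rbd" stands in for the real pool name, and jq is assumed to be available):

# List the inconsistent PGs of a pool, then dump the per-object errors.
# "rbd" is a stand-in for the actual pool name.
for pg in $(rados list-inconsistent-pg rbd | jq -r '.[]'); do
    rados list-inconsistent-obj "$pg" --format=json-pretty
done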

We started to "out" one OSD and are now waiting for the backfill (as suggested in https://access.redhat.com/solutions/1460213), but this is a very time-consuming procedure.
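Roughly, that workaround looks like this (a sketch; osd.4 is just an example id):

# Mark the OSD out so its PGs get backfilled to other OSDs,
# which rewrites the objects (and their digests) in the process.
ceph osd out 4
ceph -w            # watch until backfilling has finished
ceph osd in 4      # take the OSD back in afterwards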

What can we do?

History

#1 Updated by Robert Sander over 5 years ago

I have now added "osd skip data digest = true" as per the release notes and restarted all OSDs.
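For reference, this is roughly how the setting was applied (a sketch; the injectargs variant is an assumption, we changed ceph.conf and restarted):

# ceph.conf snippet, followed by an OSD restart:
[osd]
osd skip data digest = true

# alternatively, injected at runtime (assumed; untested here):
ceph tell osd.* injectargs '--osd_skip_data_digest=true'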

I still have inconsistent PGs, but 25 fewer:

root@ceph01:~# ceph health detail
HEALTH_ERR noout flag(s) set; 180 scrub errors; Possible data damage: 34 pgs inconsistent
OSDMAP_FLAGS noout flag(s) set
OSD_SCRUB_ERRORS 180 scrub errors
PG_DAMAGED Possible data damage: 34 pgs inconsistent
pg 2.53 is active+clean+inconsistent, acting [1,5,8]
pg 2.182 is active+clean+inconsistent, acting [9,3,6]
pg 2.18a is active+clean+inconsistent, acting [0,4,16]
pg 2.1dd is active+clean+inconsistent, acting [12,11,7]
pg 3.c is active+clean+inconsistent, acting [15,3,2]
pg 3.13 is active+clean+inconsistent, acting [1,17,14]
pg 3.20 is active+clean+inconsistent, acting [4,15,0]
pg 3.2e is active+clean+inconsistent, acting [14,2,7]
pg 3.3d is active+clean+inconsistent, acting [10,7,12]
pg 3.50 is active+clean+inconsistent, acting [7,5,0]
pg 3.5f is active+clean+inconsistent, acting [4,2,16]
pg 3.85 is active+clean+inconsistent, acting [4,16,1]
pg 3.89 is active+clean+inconsistent, acting [9,7,3]
pg 3.8d is active+clean+inconsistent, acting [10,12,8]
pg 3.90 is active+clean+inconsistent, acting [8,10,12]
pg 3.a8 is active+clean+inconsistent, acting [7,11,14]
pg 3.aa is active+clean+inconsistent, acting [1,3,7]
pg 3.b4 is active+clean+inconsistent, acting [9,17,13]
pg 3.ce is active+clean+inconsistent, acting [3,15,1]
pg 3.102 is active+clean+inconsistent, acting [13,2,6]
pg 3.120 is active+clean+inconsistent, acting [0,13,16]
pg 3.121 is active+clean+inconsistent, acting [1,16,4]
pg 3.12c is active+clean+inconsistent, acting [11,4,8]
pg 3.149 is active+clean+inconsistent, acting [12,0,8]
pg 3.16a is active+clean+inconsistent, acting [3,11,16]
pg 3.16e is active+clean+inconsistent, acting [14,0,8]
pg 3.170 is active+clean+inconsistent, acting [10,14,17]
pg 3.176 is active+clean+inconsistent, acting [9,7,12]
pg 3.18d is active+clean+inconsistent, acting [4,11,15]
pg 3.1a8 is active+clean+inconsistent, acting [7,2,14]
pg 3.1a9 is active+clean+inconsistent, acting [2,12,16]
pg 3.1c2 is active+clean+inconsistent, acting [7,2,5]
pg 3.1d7 is active+clean+inconsistent, acting [13,10,16]
pg 3.1df is active+clean+inconsistent, acting [12,15,1]

#2 Updated by Brad Hubbard over 5 years ago

  • Assignee set to Brad Hubbard

Can you post the output of 'rados list-inconsistent-obj 2.53 --format=json-pretty'?

#3 Updated by Robert Sander over 5 years ago

root@ceph01:~# rados list-inconsistent-obj 2.53 --format=json-pretty
No scrub information available for pg 2.53
error 2: (2) No such file or directory

but

root@ceph01:~# rados list-inconsistent-obj 2.34 --format=json-pretty
{
    "epoch": 23687,
    "inconsistents": [
        {
            "object": {
                "name": "rbd_data.4048d8238e1f29.00000000000002e6",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "version": 14379192
            },
            "errors": [],
            "union_shard_errors": [
                "data_digest_mismatch_info" 
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "rbd_data.4048d8238e1f29.00000000000002e6",
                    "key": "",
                    "snapid": -2,
                    "hash": 2610901044,
                    "max": 0,
                    "pool": 2,
                    "namespace": "" 
                },
                "version": "21894'14379192",
                "prior_version": "21894'14379189",
                "last_reqid": "client.152740563.0:21585",
                "user_version": 14379192,
                "size": 4194304,
                "mtime": "2018-07-16 02:01:20.077747",
                "local_mtime": "2018-07-16 02:01:20.091105",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest",
                    "omap_digest" 
                ],
                "legacy_snaps": [],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0x3e0e156f",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0,
                    "redirect_target": {
                        "oid": "",
                        "key": "",
                        "snapid": 0,
                        "hash": 0,
                        "max": 0,
                        "pool": -9223372036854775808,
                        "namespace": "" 
                    }
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 2,
                    "primary": false,
                    "errors": [
                        "data_digest_mismatch_info" 
                    ],
                    "size": 4194304,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x4a292cfd" 
                },
                {
                    "osd": 4,
                    "primary": false,
                    "errors": [
                        "data_digest_mismatch_info" 
                    ],
                    "size": 4194304,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x4a292cfd" 
                },
                {
                    "osd": 16,
                    "primary": true,
                    "errors": [
                        "data_digest_mismatch_info" 
                    ],
                    "size": 4194304,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x4a292cfd" 
                }
            ]
        }
    ]
}

with

root@ceph01:~# ceph health detail
HEALTH_ERR 276 scrub errors; Possible data damage: 55 pgs inconsistent
OSD_SCRUB_ERRORS 276 scrub errors
PG_DAMAGED Possible data damage: 55 pgs inconsistent
    pg 2.34 is active+clean+inconsistent, acting [16,4,2]
    pg 2.44 is active+clean+inconsistent, acting [14,8,2]
    pg 2.53 is active+clean+inconsistent, acting [1,5,8]
    pg 2.87 is active+clean+inconsistent, acting [13,7,2]
    pg 2.ec is active+clean+inconsistent, acting [7,2,14]
    pg 2.182 is active+clean+inconsistent, acting [9,3,6]
    pg 2.18a is active+clean+inconsistent, acting [0,4,16]
    pg 2.1c4 is active+clean+inconsistent, acting [13,15,9]
    pg 3.c is active+clean+inconsistent, acting [15,3,2]
    pg 3.13 is active+clean+inconsistent, acting [1,17,14]
    pg 3.1a is active+clean+inconsistent, acting [8,5,9]
    pg 3.1c is active+clean+inconsistent, acting [14,0,15]
    pg 3.20 is active+clean+inconsistent, acting [4,15,0]
    pg 3.2e is active+clean+inconsistent, acting [14,2,7]
    pg 3.32 is active+clean+inconsistent, acting [3,0,6]
    pg 3.3d is active+clean+inconsistent, acting [10,7,12]
    pg 3.50 is active+clean+inconsistent, acting [7,5,0]
    pg 3.5a is active+clean+inconsistent, acting [14,10,17]
    pg 3.5f is active+clean+inconsistent, acting [4,2,16]
    pg 3.7e is active+clean+inconsistent, acting [9,16,5]
    pg 3.85 is active+clean+inconsistent, acting [4,16,1]
    pg 3.89 is active+clean+inconsistent, acting [9,7,3]
    pg 3.8d is active+clean+inconsistent, acting [10,12,8]
    pg 3.90 is active+clean+inconsistent, acting [8,10,12]
    pg 3.a8 is active+clean+inconsistent, acting [7,11,14]
    pg 3.aa is active+clean+inconsistent, acting [1,3,7]
    pg 3.b4 is active+clean+inconsistent, acting [9,17,13]
    pg 3.ce is active+clean+inconsistent, acting [3,15,1]
    pg 3.d8 is active+clean+inconsistent, acting [12,9,8]
    pg 3.dc is active+clean+inconsistent, acting [0,5,15]
    pg 3.102 is active+clean+inconsistent, acting [13,2,6]
    pg 3.11e is active+clean+inconsistent, acting [11,13,16]
    pg 3.120 is active+clean+inconsistent, acting [0,13,16]
    pg 3.121 is active+clean+inconsistent, acting [1,16,4]
    pg 3.12c is active+clean+inconsistent, acting [11,4,8]
    pg 3.149 is active+clean+inconsistent, acting [12,0,8]
    pg 3.16a is active+clean+inconsistent, acting [3,11,16]
    pg 3.16e is active+clean+inconsistent, acting [14,0,8]
    pg 3.170 is active+clean+inconsistent, acting [10,14,17]
    pg 3.176 is active+clean+inconsistent, acting [9,7,12]
    pg 3.18d is active+clean+inconsistent, acting [4,11,15]
    pg 3.1a7 is active+clean+inconsistent, acting [11,8,4]
    pg 3.1a8 is active+clean+inconsistent, acting [7,2,14]
    pg 3.1a9 is active+clean+inconsistent, acting [2,12,16]
    pg 3.1ac is active+clean+inconsistent, acting [2,17,13]
    pg 3.1c2 is active+clean+inconsistent, acting [7,2,5]
    pg 3.1c9 is active+clean+inconsistent, acting [16,10,4]
    pg 3.1d5 is active+clean+inconsistent, acting [0,4,8]
    pg 3.1d6 is active+clean+inconsistent, acting [3,0,17]
    pg 3.1d7 is active+clean+inconsistent, acting [13,10,16]
    pg 3.1dd is active+clean+inconsistent, acting [14,2,8]

i.e. 21 new inconsistent PGs after this night's deep-scrub runs.

#4 Updated by Anton Neubauer over 5 years ago

I have the same issue

#5 Updated by Brad Hubbard over 5 years ago

In the case of pg 2.34 above, where the only error is "data_digest_mismatch_info" and all the data digests except the one in the selected_object_info are the same, you should be able to resolve it with the following procedure.

1. rados -p [name_of_pool_2] setomapval rbd_data.4048d8238e1f29.00000000000002e6 temporary-key anything
2. ceph pg deep-scrub 2.34
3. Wait for the scrub to finish
4. rados -p [name_of_pool_2] rmomapkey rbd_data.4048d8238e1f29.00000000000002e6 temporary-key

This should work on any pg that satisfies the criteria above. If you have pgs with different errors, such as "data_digest_mismatch" (not "data_digest_mismatch_info"), post the list-inconsistent-obj output here. If you are getting the "No such file or directory" error, try completing a scrub specifically on that pg before issuing the command.
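For convenience, a scripted sketch of the same procedure (pool, pg and object names are the examples from this thread; the jq path into the 'ceph pg query' output is an assumption and may differ between releases):

# Apply the temporary-key workaround to one object and wait for the
# deep scrub to complete before removing the key again.
POOL='[name_of_pool_2]'     # placeholder, substitute the real pool name
PGID=2.34
OBJ=rbd_data.4048d8238e1f29.00000000000002e6

STAMP=$(ceph pg "$PGID" query | jq -r '.info.stats.last_deep_scrub_stamp')
rados -p "$POOL" setomapval "$OBJ" temporary-key anything
ceph pg deep-scrub "$PGID"
# poll until the deep-scrub timestamp advances
until [ "$(ceph pg "$PGID" query | jq -r '.info.stats.last_deep_scrub_stamp')" != "$STAMP" ]; do
    sleep 30
done
rados -p "$POOL" rmomapkey "$OBJ" temporary-key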

#6 Updated by Robert Sander over 5 years ago

Brad Hubbard wrote:

1. rados -p [name_of_pool_2] setomapval rbd_data.4048d8238e1f29.00000000000002e6 temporary-key anything
2. ceph pg deep-scrub 2.34
3. Wait for the scrub to finish
4. rados -p [name_of_pool_2] rmomapkey rbd_data.4048d8238e1f29.00000000000002e6 temporary-key

I have applied this procedure on a test cluster with the same issue, without any luck:

root@ceph05:/var/log/ceph# ceph health detail
HEALTH_ERR 1 filesystem is degraded; 2 mds daemons damaged; noout flag(s) set; 6 scrub errors; Possible data damage: 2 pgs inconsistent
FS_DEGRADED 1 filesystem is degraded
    fs cephfs is degraded
MDS_DAMAGE 2 mds daemons damaged
    fs cephfs mds.0 is damaged
    fs cephfs mds.1 is damaged
OSDMAP_FLAGS noout flag(s) set
OSD_SCRUB_ERRORS 6 scrub errors
PG_DAMAGED Possible data damage: 2 pgs inconsistent
    pg 2.14 is active+clean+inconsistent, acting [2,4,0]
    pg 2.17 is active+clean+inconsistent, acting [6,5,2]
root@ceph05:/var/log/ceph# rados list-inconsistent-obj 2.14 --format=json-pretty
{
    "epoch": 702,
    "inconsistents": [
        {
            "object": {
                "name": "200.00000000",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "version": 83
            },
            "errors": [],
            "union_shard_errors": [
                "data_digest_mismatch_info" 
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "200.00000000",
                    "key": "",
                    "snapid": -2,
                    "hash": 2219783316,
                    "max": 0,
                    "pool": 2,
                    "namespace": "" 
                },
                "version": "704'83",
                "prior_version": "704'82",
                "last_reqid": "client.11074684.0:1",
                "user_version": 83,
                "size": 90,
                "mtime": "2018-07-23 10:06:13.458068",
                "local_mtime": "2018-07-23 10:06:13.461844",
                "lost": 0,
                "flags": [
                    "dirty",
                    "omap",
                    "data_digest" 
                ],
                "legacy_snaps": [],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0x2e078a4f",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0,
                    "redirect_target": {
                        "oid": "",
                        "key": "",
                        "snapid": 0,
                        "hash": 0,
                        "max": 0,
                        "pool": -9223372036854775808,
                        "namespace": "" 
                    }
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 0,
                    "primary": false,
                    "errors": [
                        "data_digest_mismatch_info" 
                    ],
                    "size": 90,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x073cc8d6" 
                },
                {
                    "osd": 2,
                    "primary": true,
                    "errors": [
                        "data_digest_mismatch_info" 
                    ],
                    "size": 90,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x073cc8d6" 
                },
                {
                    "osd": 4,
                    "primary": false,
                    "errors": [
                        "data_digest_mismatch_info" 
                    ],
                    "size": 90,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x073cc8d6" 
                }
            ]
        }
    ]
}
root@ceph05:/var/log/ceph# rados -p cephfs_metadata setomapval "200.00000000" temporary-key abcdef
root@ceph05:/var/log/ceph# ceph pg deep-scrub 2.14
instructing pg 2.14 on osd.2 to deep-scrub

The logfile then contains:

2018-07-23 10:05:36.169693 osd.2 [ERR] 2.14 shard 0: soid 2:292cf221:::200.00000000:head data_digest 0x73cc8d6 != data_digest 0x2e078a4f from auth oi 2:292cf221:::200.00000000:head(704'82 client.11064723.0:1 dirty|omap|data_digest s 90 uv 82 dd 2e078a4f alloc_hint [0 0 0])
2018-07-23 10:05:36.169696 osd.2 [ERR] 2.14 shard 2: soid 2:292cf221:::200.00000000:head data_digest 0x73cc8d6 != data_digest 0x2e078a4f from auth oi 2:292cf221:::200.00000000:head(704'82 client.11064723.0:1 dirty|omap|data_digest s 90 uv 82 dd 2e078a4f alloc_hint [0 0 0])
2018-07-23 10:05:36.169704 osd.2 [ERR] 2.14 shard 4: soid 2:292cf221:::200.00000000:head data_digest 0x73cc8d6 != data_digest 0x2e078a4f from auth oi 2:292cf221:::200.00000000:head(704'82 client.11064723.0:1 dirty|omap|data_digest s 90 uv 82 dd 2e078a4f alloc_hint [0 0 0])
2018-07-23 10:05:36.169706 osd.2 [ERR] 2.14 soid 2:292cf221:::200.00000000:head: failed to pick suitable auth object
2018-07-23 10:05:36.169842 osd.2 [ERR] 2.14 deep-scrub 3 errors

The test cluster has "osd distrust data digest = true" as it has a mixture of BlueStore and FileStore OSDs.
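The setting can be checked on a running OSD via its admin socket, e.g. (a sketch; osd.2 is just an example):

ceph daemon osd.2 config get osd_distrust_data_digest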

#7 Updated by Brad Hubbard over 5 years ago

Oops, my mistake, terribly sorry. I gave you the procedure for an omap_digest_mismatch_info error.

For the data_digest_mismatch_info error, with client activity stopped, read the data from the object and write it back again using rados get and then rados put. Sorry for mixing the two up.
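In other words, something like this per affected object (a sketch; the bracketed pool name is a placeholder as above):

# With client IO to the object stopped, rewrite it in place so the
# digest recorded in the object info gets recomputed:
rados -p [name_of_pool_2] get rbd_data.4048d8238e1f29.00000000000002e6 /tmp/obj
rados -p [name_of_pool_2] put rbd_data.4048d8238e1f29.00000000000002e6 /tmp/obj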

#8 Updated by Robert Sander over 5 years ago

Brad Hubbard wrote:

For the data_digest_mismatch_info error, with client activity stopped, read the data from the object and write it back again using rados get and then rados put. Sorry for mixing the two up.

This worked. I stopped all MDS services to stop client IO on this pool.

As this was the cephfs_metadata pool, I then had to tell the MDSs that they have been repaired with "ceph mds repaired <mdsid>".
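Roughly, the full sequence on the test cluster was (a sketch; the systemd unit name assumes a standard deployment):

# stop all MDS daemons to stop client IO on cephfs_metadata
systemctl stop ceph-mds.target       # on each MDS host
# rewrite the affected metadata object
rados -p cephfs_metadata get 200.00000000 /tmp/obj
rados -p cephfs_metadata put 200.00000000 /tmp/obj
# mark the damaged ranks repaired and restart the daemons
ceph mds repaired 0
ceph mds repaired 1
systemctl start ceph-mds.target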

On the production cluster the RBD pool is affected. Do I really need to stop the VMs and do the "get/put" repair, or will the issue resolve itself when the VMs do IO on the affected objects?

#9 Updated by Brad Hubbard over 5 years ago

Robert Sander wrote:

On the production cluster the RBD pool is affected. Do I really need to stop the VMs and do the "get/put" repair, or will the issue resolve itself when the VMs do IO on the affected objects?

You would only need to stop the VM using the image that the object belongs to. And yes, as is documented, and as you have already been advised: "These warnings are harmless in the sense that IO is not affected and the replicas are all still in sync. The number of affected objects is likely to drop (possibly to zero) on their own over time as those objects are modified." I was under the impression you opened this tracker to get a more immediate solution.
