Bug #24994
active+clean+inconsistent PGs after Upgrade to 12.2.7 and deep scrub
Description
Hi,
a deep scrub revealed 59 active+clean+inconsistent PGs at one customer's cluster and 50 active+clean+inconsistent PGs at another customer's cluster.
This was after upgrading to 12.2.7.
The PGs belong to pools that hold qemu+rbd images.
"ceph pg repair" does not work, as the data_digests do not match:
2018-07-19 09:47:49.945406 osd.4 [ERR] 3.1 shard 2: soid 3:804f3088:::rb.0.7dccf5.238e1f29.000000003254:head data_digest 0x39c70649 != data_digest 0xe1be12c4 from auth oi 3:804f3088:::rb.0.7dccf5.238e1f29.000000003254:head(21955'1832257 client.152883867.0:435732 dirty|data_digest|omap_digest s 4194304 uv 1832257 dd e1be12c4 od ffffffff alloc_hint [4194304 4194304 0])
We started to "out" one OSD and are waiting for the backfilling now (as suggested in https://access.redhat.com/solutions/1460213), but this is a very time consuming procedure.
What can we do?
History
#1 Updated by Robert Sander over 5 years ago
I have now added "osd skip data digest = true" as per release notes and restarted all OSDs.
I still have inconsistent PGs, but 25 fewer:
root@ceph01:~# ceph health detail
HEALTH_ERR noout flag(s) set; 180 scrub errors; Possible data damage: 34 pgs inconsistent
OSDMAP_FLAGS noout flag(s) set
OSD_SCRUB_ERRORS 180 scrub errors
PG_DAMAGED Possible data damage: 34 pgs inconsistent
pg 2.53 is active+clean+inconsistent, acting [1,5,8]
pg 2.182 is active+clean+inconsistent, acting [9,3,6]
pg 2.18a is active+clean+inconsistent, acting [0,4,16]
pg 2.1dd is active+clean+inconsistent, acting [12,11,7]
pg 3.c is active+clean+inconsistent, acting [15,3,2]
pg 3.13 is active+clean+inconsistent, acting [1,17,14]
pg 3.20 is active+clean+inconsistent, acting [4,15,0]
pg 3.2e is active+clean+inconsistent, acting [14,2,7]
pg 3.3d is active+clean+inconsistent, acting [10,7,12]
pg 3.50 is active+clean+inconsistent, acting [7,5,0]
pg 3.5f is active+clean+inconsistent, acting [4,2,16]
pg 3.85 is active+clean+inconsistent, acting [4,16,1]
pg 3.89 is active+clean+inconsistent, acting [9,7,3]
pg 3.8d is active+clean+inconsistent, acting [10,12,8]
pg 3.90 is active+clean+inconsistent, acting [8,10,12]
pg 3.a8 is active+clean+inconsistent, acting [7,11,14]
pg 3.aa is active+clean+inconsistent, acting [1,3,7]
pg 3.b4 is active+clean+inconsistent, acting [9,17,13]
pg 3.ce is active+clean+inconsistent, acting [3,15,1]
pg 3.102 is active+clean+inconsistent, acting [13,2,6]
pg 3.120 is active+clean+inconsistent, acting [0,13,16]
pg 3.121 is active+clean+inconsistent, acting [1,16,4]
pg 3.12c is active+clean+inconsistent, acting [11,4,8]
pg 3.149 is active+clean+inconsistent, acting [12,0,8]
pg 3.16a is active+clean+inconsistent, acting [3,11,16]
pg 3.16e is active+clean+inconsistent, acting [14,0,8]
pg 3.170 is active+clean+inconsistent, acting [10,14,17]
pg 3.176 is active+clean+inconsistent, acting [9,7,12]
pg 3.18d is active+clean+inconsistent, acting [4,11,15]
pg 3.1a8 is active+clean+inconsistent, acting [7,2,14]
pg 3.1a9 is active+clean+inconsistent, acting [2,12,16]
pg 3.1c2 is active+clean+inconsistent, acting [7,2,5]
pg 3.1d7 is active+clean+inconsistent, acting [13,10,16]
pg 3.1df is active+clean+inconsistent, acting [12,15,1]
#2 Updated by Brad Hubbard over 5 years ago
- Assignee set to Brad Hubbard
Can you post the output of 'rados list-inconsistent-obj 2.53 --format=json-pretty' ?
#3 Updated by Robert Sander over 5 years ago
root@ceph01:~# rados list-inconsistent-obj 2.53 --format=json-pretty
No scrub information available for pg 2.53
error 2: (2) No such file or directory
but
root@ceph01:~# rados list-inconsistent-obj 2.34 --format=json-pretty
{
  "epoch": 23687,
  "inconsistents": [
    {
      "object": {
        "name": "rbd_data.4048d8238e1f29.00000000000002e6",
        "nspace": "",
        "locator": "",
        "snap": "head",
        "version": 14379192
      },
      "errors": [],
      "union_shard_errors": [
        "data_digest_mismatch_info"
      ],
      "selected_object_info": {
        "oid": {
          "oid": "rbd_data.4048d8238e1f29.00000000000002e6",
          "key": "",
          "snapid": -2,
          "hash": 2610901044,
          "max": 0,
          "pool": 2,
          "namespace": ""
        },
        "version": "21894'14379192",
        "prior_version": "21894'14379189",
        "last_reqid": "client.152740563.0:21585",
        "user_version": 14379192,
        "size": 4194304,
        "mtime": "2018-07-16 02:01:20.077747",
        "local_mtime": "2018-07-16 02:01:20.091105",
        "lost": 0,
        "flags": [ "dirty", "data_digest", "omap_digest" ],
        "legacy_snaps": [],
        "truncate_seq": 0,
        "truncate_size": 0,
        "data_digest": "0x3e0e156f",
        "omap_digest": "0xffffffff",
        "expected_object_size": 0,
        "expected_write_size": 0,
        "alloc_hint_flags": 0,
        "manifest": {
          "type": 0,
          "redirect_target": {
            "oid": "",
            "key": "",
            "snapid": 0,
            "hash": 0,
            "max": 0,
            "pool": -9223372036854775808,
            "namespace": ""
          }
        },
        "watchers": {}
      },
      "shards": [
        {
          "osd": 2,
          "primary": false,
          "errors": [ "data_digest_mismatch_info" ],
          "size": 4194304,
          "omap_digest": "0xffffffff",
          "data_digest": "0x4a292cfd"
        },
        {
          "osd": 4,
          "primary": false,
          "errors": [ "data_digest_mismatch_info" ],
          "size": 4194304,
          "omap_digest": "0xffffffff",
          "data_digest": "0x4a292cfd"
        },
        {
          "osd": 16,
          "primary": true,
          "errors": [ "data_digest_mismatch_info" ],
          "size": 4194304,
          "omap_digest": "0xffffffff",
          "data_digest": "0x4a292cfd"
        }
      ]
    }
  ]
}
with
root@ceph01:~# ceph health detail
HEALTH_ERR 276 scrub errors; Possible data damage: 55 pgs inconsistent
OSD_SCRUB_ERRORS 276 scrub errors
PG_DAMAGED Possible data damage: 55 pgs inconsistent
pg 2.34 is active+clean+inconsistent, acting [16,4,2]
pg 2.44 is active+clean+inconsistent, acting [14,8,2]
pg 2.53 is active+clean+inconsistent, acting [1,5,8]
pg 2.87 is active+clean+inconsistent, acting [13,7,2]
pg 2.ec is active+clean+inconsistent, acting [7,2,14]
pg 2.182 is active+clean+inconsistent, acting [9,3,6]
pg 2.18a is active+clean+inconsistent, acting [0,4,16]
pg 2.1c4 is active+clean+inconsistent, acting [13,15,9]
pg 3.c is active+clean+inconsistent, acting [15,3,2]
pg 3.13 is active+clean+inconsistent, acting [1,17,14]
pg 3.1a is active+clean+inconsistent, acting [8,5,9]
pg 3.1c is active+clean+inconsistent, acting [14,0,15]
pg 3.20 is active+clean+inconsistent, acting [4,15,0]
pg 3.2e is active+clean+inconsistent, acting [14,2,7]
pg 3.32 is active+clean+inconsistent, acting [3,0,6]
pg 3.3d is active+clean+inconsistent, acting [10,7,12]
pg 3.50 is active+clean+inconsistent, acting [7,5,0]
pg 3.5a is active+clean+inconsistent, acting [14,10,17]
pg 3.5f is active+clean+inconsistent, acting [4,2,16]
pg 3.7e is active+clean+inconsistent, acting [9,16,5]
pg 3.85 is active+clean+inconsistent, acting [4,16,1]
pg 3.89 is active+clean+inconsistent, acting [9,7,3]
pg 3.8d is active+clean+inconsistent, acting [10,12,8]
pg 3.90 is active+clean+inconsistent, acting [8,10,12]
pg 3.a8 is active+clean+inconsistent, acting [7,11,14]
pg 3.aa is active+clean+inconsistent, acting [1,3,7]
pg 3.b4 is active+clean+inconsistent, acting [9,17,13]
pg 3.ce is active+clean+inconsistent, acting [3,15,1]
pg 3.d8 is active+clean+inconsistent, acting [12,9,8]
pg 3.dc is active+clean+inconsistent, acting [0,5,15]
pg 3.102 is active+clean+inconsistent, acting [13,2,6]
pg 3.11e is active+clean+inconsistent, acting [11,13,16]
pg 3.120 is active+clean+inconsistent, acting [0,13,16]
pg 3.121 is active+clean+inconsistent, acting [1,16,4]
pg 3.12c is active+clean+inconsistent, acting [11,4,8]
pg 3.149 is active+clean+inconsistent, acting [12,0,8]
pg 3.16a is active+clean+inconsistent, acting [3,11,16]
pg 3.16e is active+clean+inconsistent, acting [14,0,8]
pg 3.170 is active+clean+inconsistent, acting [10,14,17]
pg 3.176 is active+clean+inconsistent, acting [9,7,12]
pg 3.18d is active+clean+inconsistent, acting [4,11,15]
pg 3.1a7 is active+clean+inconsistent, acting [11,8,4]
pg 3.1a8 is active+clean+inconsistent, acting [7,2,14]
pg 3.1a9 is active+clean+inconsistent, acting [2,12,16]
pg 3.1ac is active+clean+inconsistent, acting [2,17,13]
pg 3.1c2 is active+clean+inconsistent, acting [7,2,5]
pg 3.1c9 is active+clean+inconsistent, acting [16,10,4]
pg 3.1d5 is active+clean+inconsistent, acting [0,4,8]
pg 3.1d6 is active+clean+inconsistent, acting [3,0,17]
pg 3.1d7 is active+clean+inconsistent, acting [13,10,16]
pg 3.1dd is active+clean+inconsistent, acting [14,2,8]
i.e. 21 new inconsistent PGs after this night's deep scrub runs.
#4 Updated by Anton Neubauer over 5 years ago
I have the same issue.
#5 Updated by Brad Hubbard over 5 years ago
In the case of pg 2.34 above, where the only error is "data_digest_mismatch_info" and all of the shard data digests agree (only the digest recorded in the selected_object_info differs), you should be able to resolve it with the following procedure.
1. rados -p [name_of_pool_2] setomapval rbd_data.4048d8238e1f29.00000000000002e6 temporary-key anything
2. ceph pg deep-scrub 2.34
3. Wait for the scrub to finish
4. rados -p [name_of_pool_2] rmomapkey rbd_data.4048d8238e1f29.00000000000002e6 temporary-key
This should work on any PG that satisfies the criteria above. If you have PGs with different errors, such as "data_digest_mismatch" (not "data_digest_mismatch_info"), post the list-inconsistent-obj output here. If you are getting the "No such file or directory" error, try completing a scrub specifically on that PG before issuing the command.
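To save checking each report by eye, the criteria above can be sketched in a few lines of Python that filter the parsed `rados list-inconsistent-obj ... --format=json-pretty` output. This is only an illustrative helper (the function name `repairable_by_rewrite` is mine, not part of any Ceph tooling), and it assumes the report shape shown for pg 2.34 above:

```python
def repairable_by_rewrite(report):
    """Return names of objects whose only error is
    data_digest_mismatch_info and whose shard data digests all agree,
    i.e. candidates for the workaround described in this thread."""
    candidates = []
    for item in report.get("inconsistents", []):
        errors = set(item.get("errors", [])) | set(item.get("union_shard_errors", []))
        if errors != {"data_digest_mismatch_info"}:
            continue  # other error types need a different procedure
        digests = {s.get("data_digest") for s in item.get("shards", [])}
        if len(digests) == 1:  # every replica holds identical data
            candidates.append(item["object"]["name"])
    return candidates

# Trimmed-down sample shaped like the pg 2.34 report above:
sample = {
    "epoch": 23687,
    "inconsistents": [{
        "object": {"name": "rbd_data.4048d8238e1f29.00000000000002e6"},
        "errors": [],
        "union_shard_errors": ["data_digest_mismatch_info"],
        "shards": [
            {"osd": 2, "errors": ["data_digest_mismatch_info"], "data_digest": "0x4a292cfd"},
            {"osd": 4, "errors": ["data_digest_mismatch_info"], "data_digest": "0x4a292cfd"},
            {"osd": 16, "errors": ["data_digest_mismatch_info"], "data_digest": "0x4a292cfd"},
        ],
    }],
}
print(repairable_by_rewrite(sample))
# → ['rbd_data.4048d8238e1f29.00000000000002e6']
```

Objects it rejects (mixed shard digests, or any other error type) should be posted here instead of repaired this way.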
#6 Updated by Robert Sander over 5 years ago
Brad Hubbard wrote:
1. rados -p [name_of_pool_2] setomapval rbd_data.4048d8238e1f29.00000000000002e6 temporary-key anything
2. ceph pg deep-scrub 2.34
3. Wait for the scrub to finish
4. rados -p [name_of_pool_2] rmomapkey rbd_data.4048d8238e1f29.00000000000002e6 temporary-key
I have applied this procedure on a test cluster with the same issue without any luck:
root@ceph05:/var/log/ceph# ceph health detail
HEALTH_ERR 1 filesystem is degraded; 2 mds daemons damaged; noout flag(s) set; 6 scrub errors; Possible data damage: 2 pgs inconsistent
FS_DEGRADED 1 filesystem is degraded
fs cephfs is degraded
MDS_DAMAGE 2 mds daemons damaged
fs cephfs mds.0 is damaged
fs cephfs mds.1 is damaged
OSDMAP_FLAGS noout flag(s) set
OSD_SCRUB_ERRORS 6 scrub errors
PG_DAMAGED Possible data damage: 2 pgs inconsistent
pg 2.14 is active+clean+inconsistent, acting [2,4,0]
pg 2.17 is active+clean+inconsistent, acting [6,5,2]
root@ceph05:/var/log/ceph# rados list-inconsistent-obj 2.14 --format=json-pretty
{
  "epoch": 702,
  "inconsistents": [
    {
      "object": {
        "name": "200.00000000",
        "nspace": "",
        "locator": "",
        "snap": "head",
        "version": 83
      },
      "errors": [],
      "union_shard_errors": [
        "data_digest_mismatch_info"
      ],
      "selected_object_info": {
        "oid": {
          "oid": "200.00000000",
          "key": "",
          "snapid": -2,
          "hash": 2219783316,
          "max": 0,
          "pool": 2,
          "namespace": ""
        },
        "version": "704'83",
        "prior_version": "704'82",
        "last_reqid": "client.11074684.0:1",
        "user_version": 83,
        "size": 90,
        "mtime": "2018-07-23 10:06:13.458068",
        "local_mtime": "2018-07-23 10:06:13.461844",
        "lost": 0,
        "flags": [ "dirty", "omap", "data_digest" ],
        "legacy_snaps": [],
        "truncate_seq": 0,
        "truncate_size": 0,
        "data_digest": "0x2e078a4f",
        "omap_digest": "0xffffffff",
        "expected_object_size": 0,
        "expected_write_size": 0,
        "alloc_hint_flags": 0,
        "manifest": {
          "type": 0,
          "redirect_target": {
            "oid": "",
            "key": "",
            "snapid": 0,
            "hash": 0,
            "max": 0,
            "pool": -9223372036854775808,
            "namespace": ""
          }
        },
        "watchers": {}
      },
      "shards": [
        {
          "osd": 0,
          "primary": false,
          "errors": [ "data_digest_mismatch_info" ],
          "size": 90,
          "omap_digest": "0xffffffff",
          "data_digest": "0x073cc8d6"
        },
        {
          "osd": 2,
          "primary": true,
          "errors": [ "data_digest_mismatch_info" ],
          "size": 90,
          "omap_digest": "0xffffffff",
          "data_digest": "0x073cc8d6"
        },
        {
          "osd": 4,
          "primary": false,
          "errors": [ "data_digest_mismatch_info" ],
          "size": 90,
          "omap_digest": "0xffffffff",
          "data_digest": "0x073cc8d6"
        }
      ]
    }
  ]
}
root@ceph05:/var/log/ceph# rados -p cephfs_metadata setomapval "200.00000000" temporary-key abcdef
root@ceph05:/var/log/ceph# ceph pg deep-scrub 2.14
instructing pg 2.14 on osd.2 to deep-scrub
The logfile then contains:
2018-07-23 10:05:36.169693 osd.2 [ERR] 2.14 shard 0: soid 2:292cf221:::200.00000000:head data_digest 0x73cc8d6 != data_digest 0x2e078a4f from auth oi 2:292cf221:::200.00000000:head(704'82 client.11064723.0:1 dirty|omap|data_digest s 90 uv 82 dd 2e078a4f alloc_hint [0 0 0])
2018-07-23 10:05:36.169696 osd.2 [ERR] 2.14 shard 2: soid 2:292cf221:::200.00000000:head data_digest 0x73cc8d6 != data_digest 0x2e078a4f from auth oi 2:292cf221:::200.00000000:head(704'82 client.11064723.0:1 dirty|omap|data_digest s 90 uv 82 dd 2e078a4f alloc_hint [0 0 0])
2018-07-23 10:05:36.169704 osd.2 [ERR] 2.14 shard 4: soid 2:292cf221:::200.00000000:head data_digest 0x73cc8d6 != data_digest 0x2e078a4f from auth oi 2:292cf221:::200.00000000:head(704'82 client.11064723.0:1 dirty|omap|data_digest s 90 uv 82 dd 2e078a4f alloc_hint [0 0 0])
2018-07-23 10:05:36.169706 osd.2 [ERR] 2.14 soid 2:292cf221:::200.00000000:head: failed to pick suitable auth object
2018-07-23 10:05:36.169842 osd.2 [ERR] 2.14 deep-scrub 3 errors
The test cluster has "osd distrust data digest = true" as it has a mixture of BlueStore and FileStore OSDs.
#7 Updated by Brad Hubbard over 5 years ago
Oops, my mistake, terribly sorry. I gave you the procedure for an omap_digest_mismatch_info error.
For the data_digest_mismatch_info error, with client activity stopped, read the data from this object and write it back again using rados get and then rados put. Sorry about mixing these two up.
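The get/put rewrite could be scripted roughly as follows. This is only a sketch: `rewrite_object` is my own helper name, the pool name "rbd" and the temp-file path are placeholders, and client IO to the object must be stopped first, as noted above. The `run` parameter is injectable so the commands can be inspected without a live cluster:

```python
import subprocess

def rewrite_object(pool, obj, tmp="/tmp/rados-obj.bin", run=subprocess.check_call):
    """Read an object and write it straight back, so the OSD recomputes
    and stores a fresh data_digest for it. `run` defaults to executing
    the commands; pass a recorder (e.g. list.append) for a dry run."""
    run(["rados", "-p", pool, "get", obj, tmp])
    run(["rados", "-p", pool, "put", obj, tmp])

# Dry run: collect the commands instead of executing them.
issued = []
rewrite_object("rbd", "rb.0.7dccf5.238e1f29.000000003254", run=issued.append)
print(issued)
```

With a live cluster you would call it without the `run` override and then deep-scrub the affected PG to confirm the error has cleared.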
#8 Updated by Robert Sander over 5 years ago
Brad Hubbard wrote:
For the data_digest_mismatch_info error, with client activity stopped, read the data from this object and write it back again using rados get and then rados put. Sorry about mixing these two up.
This worked. I stopped all MDS services to stop client IO on this pool.
As this was the cephfs_metadata pool, I afterwards had to tell the MDSs that they have been repaired with "ceph mds repaired <mdsid>".
On the production cluster the RBD pool is affected. Do I really need to stop the VMs and do the "get/put" repair or will the issue resolve itself when the VM does IO on the affected objects?
#9 Updated by Brad Hubbard over 5 years ago
Robert Sander wrote:
On the production cluster the RBD pool is affected. Do I really need to stop the VMs and do the "get/put" repair or will the issue resolve itself when the VM does IO on the affected objects?
You would only need to stop the VM using the image that the object belongs to. And yes, as is documented, and as you have already been advised: "These warnings are harmless in the sense that IO is not affected and the replicas are all still in sync. The number of affected objects is likely to drop (possibly to zero) on their own over time as those objects are modified." I was under the impression you opened this tracker to get a more immediate solution.