Bug #53663

closed

Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools

Added by Christian Rohmann over 2 years ago. Updated about 2 years ago.

Status:
Duplicate
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On a 4 node Octopus cluster I am randomly seeing batches of scrub errors, as in:

# ceph health detail

HEALTH_ERR 7 scrub errors; Possible data damage: 6 pgs inconsistent
[ERR] OSD_SCRUB_ERRORS: 7 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 6 pgs inconsistent
    pg 5.3 is active+clean+inconsistent, acting [9,12,6]
    pg 5.4 is active+clean+inconsistent, acting [15,17,18]
    pg 7.2 is active+clean+inconsistent, acting [13,15,10]
    pg 7.9 is active+clean+inconsistent, acting [5,19,4]
    pg 7.e is active+clean+inconsistent, acting [1,15,20]
    pg 7.18 is active+clean+inconsistent, acting [5,10,0]

The cluster was set up straight on Octopus, no upgrades from a previous release.
Also, it is only serving traffic via RADOSGW, and it is a multisite setup with this cluster being the zone master.

The scrub errors seem to occur in two distinct pools only:

# rados list-inconsistent-pg $pool

Pool: zone.rgw.log
["5.3","5.4"]

Pool: zone.rgw.buckets.index
["7.2","7.9","7.e","7.18"]

but they are spread across different OSDs and hosts:

# ceph osd tree
ID  CLASS  WEIGHT     TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
-1         363.23254  root default
-3          90.80814      host host-01
 1    hdd   15.13469          osd.1                 up   1.00000  1.00000
 5    hdd   15.13469          osd.5                 up   1.00000  1.00000
 9    hdd   15.13469          osd.9                 up   1.00000  1.00000
13    hdd   15.13469          osd.13                up   1.00000  1.00000
17    hdd   15.13469          osd.17                up   1.00000  1.00000
21    hdd   15.13469          osd.21                up   1.00000  1.00000
-5          90.80814      host host-02
 0    hdd   15.13469          osd.0                 up   1.00000  1.00000
 4    hdd   15.13469          osd.4                 up   1.00000  1.00000
 8    hdd   15.13469          osd.8                 up   1.00000  1.00000
12    hdd   15.13469          osd.12                up   1.00000  1.00000
16    hdd   15.13469          osd.16                up   1.00000  1.00000
20    hdd   15.13469          osd.20                up   1.00000  1.00000
-9          90.80814      host host-03
 2    hdd   15.13469          osd.2                 up   1.00000  1.00000
 6    hdd   15.13469          osd.6                 up   1.00000  1.00000
10    hdd   15.13469          osd.10                up   1.00000  1.00000
14    hdd   15.13469          osd.14                up   1.00000  1.00000
18    hdd   15.13469          osd.18                up   1.00000  1.00000
23    hdd   15.13469          osd.23                up   1.00000  1.00000
-7          90.80814      host host-04
 3    hdd   15.13469          osd.3                 up   1.00000  1.00000
 7    hdd   15.13469          osd.7                 up   1.00000  1.00000
11    hdd   15.13469          osd.11                up   1.00000  1.00000
15    hdd   15.13469          osd.15                up   1.00000  1.00000
19    hdd   15.13469          osd.19                up   1.00000  1.00000
22    hdd   15.13469          osd.22                up   1.00000  1.00000

(Just as a side note: each host has a single NVMe journal disk shared via LVM across all 6 spinning-rust OSDs.)

Even though it might look like one host has more errors on its OSDs, suggesting something like bad hardware, it was simply a different host each time that had multiple OSDs with inconsistent PGs. (By the way, those were fixed via pg repair.)

I attached the list-inconsistencies output for all the affected PGs; they all report omap_digest_mismatch on either a bucket index or a data_log object. I cannot recall past errors ever being about bucket data itself.

A few days ago I triggered a deep scrub of all OSDs on one host, which came back clean, and a day later that host (host-02) reported multiple errors again.

The cluster also went through an upgrade from Ubuntu Bionic to Ubuntu Focal (keeping Ceph at 15.2.15 and all of the OSDs), but the randomly occurring scrub errors remained. So an issue with a particular OS / kernel version can tentatively be ruled out as well.

Please excuse my initial selection of severity "critical" for this bug report. But this seems rather unlikely to be a simple hardware issue such as a broken disk, and more like silent corruption of RADOSGW metadata. There is also no obvious config mistake I could have made that would cause this.


Files

list-inconsistences.tar (30 KB) - Details on the found inconsistencies - Christian Rohmann, 12/19/2021 12:42 AM

Related issues 1 (0 open, 1 closed)

Is duplicate of RADOS - Bug #54592: partial recovery: CEPH_OSD_OP_OMAPRMKEYRANGE should mark omap dirty (Resolved, Neha Ojha)

Actions #2

Updated by Christian Rohmann over 2 years ago

The only "special" settings I can think of are

bluestore min alloc size = 4096
bluestore min alloc size hdd = 4096
bluestore min alloc size ssd = 4096

to reduce the space overhead of the many small objects being uploaded to the object storage.
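
As a side note: bluestore_min_alloc_size only takes effect when an OSD is created (mkfs), so the value a running OSD is configured with can be checked via its admin socket on the OSD host. A sketch, with osd.1 as a placeholder:

# shows the configured value; the on-disk allocation unit itself is fixed at OSD creation time
$ ceph daemon osd.1 config get bluestore_min_alloc_size_hdd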

Actions #4

Updated by Neha Ojha about 2 years ago

  • Project changed from Ceph to RADOS

Are you using filestore or bluestore?

Actions #5

Updated by Christian Rohmann about 2 years ago

Neha Ojha wrote:

Are you using filestore or bluestore?

On bluestore

Actions #6

Updated by Christian Rohmann about 2 years ago

The issue is still happening:

1) Find all pools with scrub errors via

$ for pool in $(rados lspools); do echo "${pool} $(rados list-inconsistent-pg ${pool})"; done

 device_health_metrics []
 .rgw.root []
 zone.rgw.control []
 zone.rgw.meta []
 zone.rgw.log ["5.3","5.5","5.a","5.b","5.10","5.11","5.19","5.1a","5.1d","5.1e"]
 zone.rgw.otp []
 zone.rgw.buckets.index ["7.4","7.5","7.6","7.9","7.b","7.11","7.13","7.14","7.18","7.1e"]
 zone.rgw.buckets.data []
 zone.rgw.buckets.non-ec []


(This output is from just now.) You can see that only the metadata pools are actually affected.

2) I then simply looped over those PGs with "rados list-inconsistent-obj $pg"; these are the object.name, errors and last_reqid of each inconsistent object (a sketch of such a loop follows the list below):

 "data_log.14","omap_digest_mismatch","client.4349063.0:12045734" 
 "data_log.59","omap_digest_mismatch","client.4364800.0:11773451" 
 "data_log.30","omap_digest_mismatch","client.4349063.0:10935030" 
 "data_log.42","omap_digest_mismatch","client.4348139.0:112695680" 
 "data_log.63","omap_digest_mismatch","client.4348139.0:116876563" 
 "data_log.44","omap_digest_mismatch","client.4349063.0:11358410" 
 "data_log.11","omap_digest_mismatch","client.4349063.0:10259566" 
 "data_log.61","omap_digest_mismatch","client.4349063.0:10259594" 
 "data_log.28","omap_digest_mismatch","client.4349063.0:11358396" 
 "data_log.39","omap_digest_mismatch","client.4349063.0:11364174" 
 "data_log.55","omap_digest_mismatch","client.4349063.0:11358415" 
 "data_log.15","omap_digest_mismatch","client.4364800.0:9518143" 
 "data_log.27","omap_digest_mismatch","client.4349063.0:11473205" 
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.1163207.114.6","omap_digest_mismatch","client.4349063.0:11274164" 
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2217176.214.1","omap_digest_mismatch","client.4349063.0:12168097" 
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2217176.214.10","omap_digest_mismatch","client.4348139.0:112993744" 
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2202949.678.0","omap_digest_mismatch","client.4349063.0:10289913" 
 ".dir.9cba42a3-dd1c-46d4-bdd2-ef634d12c0a5.56337947.1562","omap_digest_mismatch","client.4364800.0:10934595" 
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.1163207.114.9","omap_digest_mismatch","client.4349063.0:10431941" 
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.1163207.114.0","omap_digest_mismatch","client.4349063.0:10431932" 
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2202949.678.10","omap_digest_mismatch","client.4349063.0:10460106" 
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.1163207.114.8","omap_digest_mismatch","client.4349063.0:11696943" 
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2217176.214.0","omap_digest_mismatch","client.4349063.0:9845513" 
 ".dir.9cba42a3-dd1c-46d4-bdd2-ef634d12c0a5.61963196.333.1","omap_digest_mismatch","client.4364800.0:9593089" 

As you can see, it's always some omap data that suffers from inconsistencies.
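
For reference, a loop along these lines can produce such a table. This is just a sketch; the JSON paths (.object.name, .errors, .selected_object_info.last_reqid) are assumed from the usual rados list-inconsistent-obj output and may need adjusting:

$ for pool in zone.rgw.log zone.rgw.buckets.index; do
    for pg in $(rados list-inconsistent-pg ${pool} | jq -r '.[]'); do
      # one line per inconsistent object: name, error list, last_reqid
      rados list-inconsistent-obj ${pg} | jq -r '.inconsistents[] | [.object.name, (.errors | join(",")), .selected_object_info.last_reqid] | @csv'
    done
  done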

Actions #7

Updated by Neha Ojha about 2 years ago

  • Status changed from New to Need More Info

Is it possible for you to trigger a deep-scrub on one PG (with debug_osd=20, debug_ms=1), let it go inconsistent, and share the OSD logs with us? This may take a bit of work, but it will hopefully get us closer to the root cause.
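
For example, something along these lines (osd 13, 15, 10 and pg 7.2 are just placeholders for the acting set of whichever PG gets picked, and the reset values assume the defaults of 1/5 and 0/5):

# raise the log levels on all OSDs in the PG's acting set
$ for osd in 13 15 10; do ceph tell osd.${osd} config set debug_osd 20; ceph tell osd.${osd} config set debug_ms 1; done

# trigger the deep scrub and wait for the PG to be flagged inconsistent
$ ceph pg deep-scrub 7.2

# afterwards, turn the log levels back down
$ for osd in 13 15 10; do ceph tell osd.${osd} config set debug_osd 1/5; ceph tell osd.${osd} config set debug_ms 0/5; done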

Actions #8

Updated by Christian Rohmann about 2 years ago

Massive thanks for your reply Neha, I greatly appreciate it!

Neha Ojha wrote:

Is it possible for you to trigger a deep-scrub on one PG (with debug_osd=20, debug_ms=1), let it go inconsistent, and share the OSD logs with us? This may take a bit of work, but it will hopefully get us closer to the root cause.

Certainly, anything you require to dig into this!

Should I deep-scrub an OSD that already has a reported inconsistency, or keep deep-scrubbing ones that do not until I find a new inconsistency?

Actions #9

Updated by Christian Rohmann about 2 years ago

Neha - I did upload the logs of a deep-scrub via ceph-post-file: 1e5ff0f8-9b76-4489-8529-ee5e6f246093
There is a little text file with the exact steps I took.

Please let me know if you need any more info or if there is any other debug data I could gather.
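
For anyone following along, the upload itself is just a ceph-post-file call, roughly like this (file names are placeholders):

# bundle the OSD logs plus a note with the exact steps, then upload
$ tar czf pg-deep-scrub-logs.tar.gz ceph-osd.*.log steps.txt
$ ceph-post-file -d "tracker 53663: deep-scrub debug logs" pg-deep-scrub-logs.tar.gz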

Actions #10

Updated by Christian Rohmann about 2 years ago

I did run a manual deep-scrub on another inconsistent PG as well; you'll find the logs of all OSDs handling this PG in:

ceph-post-file: 31d72f03-197c-48a7-a94c-f3575ae865f1

Actions #11

Updated by yite gu about 2 years ago

Can you show me what the primary OSD logged when the deep-scrub error happened?
I would like to know which OSD shard had the error.

Actions #12

Updated by Christian Rohmann about 2 years ago

yite gu wrote:

Can you show me what the primary OSD logged when the deep-scrub error happened?
I would like to know which OSD shard had the error.

I put the logs of all the OSDs handling each PG, primary and replicas, in the tar files I uploaded.
If you let me know which OSDs you want the logs of, I can surely get more.

I also did not repair the scrub errors yet - in case you want another deep-scrub of those.

Actions #13

Updated by Dieter Roels about 2 years ago

Not sure if this helps or not, but we are experiencing very similar issues in our clusters the last few days.

We are on RHCS5 (16.2.0-146.el8cp) with multisite and ran into a bug with copyObject crashing our rgws. We worked around this by reverting to an older, unaffected rgw version (16.2.0-117.el8cp). A few days later, deep-scrubs found a lot of inconsistent PGs. It looked exactly like this issue: all inconsistent PGs had omap_digest_mismatch errors in the rgw.log and rgw.buckets.index pools (data_log and .dir objects).

All inconsistencies were on non-primary shards, so we repaired them with pg repair.

After the repair the inconsistencies do not re-appear. However, we can reproduce the issue in our test environment by running the "bad" rgws and letting them crash a few times. After the rgw crashes, the inconsistent PGs start to appear again. So in our case the cause of the inconsistent PGs was the rgws segfaulting.

Actions #14

Updated by yite gu about 2 years ago

"shards": [ {
"osd": 10,
"primary": false,
"errors": [

],
"size": 0,
"omap_digest": "0xb741106f",
"data_digest": "0xffffffff"
}, {
"osd": 13,
"primary": true,
"errors": [ {
"osd": 15,
"primary": false,
"errors": [

],
"size": 0,
"omap_digest": "0x432dde9e",
"data_digest": "0xffffffff"
}
]
This is the inconsistent pg 7.2 from the files you uploaded. It looks like the mismatched OSD is osd.10, so you could check whether osd.10 logs any error when deep-scrubbing pg 7.2.

Actions #15

Updated by yite gu about 2 years ago

yite gu wrote:

"shards": [ {
"osd": 10,
"primary": false,
"errors": [

],
"size": 0,
"omap_digest": "0xb741106f",
"data_digest": "0xffffffff"
}, {
"osd": 13,
"primary": true,
"errors": [

],
"size": 0,
"omap_digest": "0x432dde9e",
"data_digest": "0xffffffff"
}, {
"osd": 15,
"primary": false,
"errors": [

],
"size": 0,
"omap_digest": "0x432dde9e",
"data_digest": "0xffffffff"
}
]
This is the inconsistent pg 7.2 from the files you uploaded. It looks like the mismatched OSD is osd.10, so you could check whether osd.10 logs any error when deep-scrubbing pg 7.2.

Sorry, the JSON in my previous comment was garbled by a formatting/display error; you should still get what I mean.

Actions #16

Updated by Christian Rohmann about 2 years ago

yite gu wrote:

This is the inconsistent pg 7.2 from the files you uploaded. It looks like the mismatched OSD is osd.10, so you could check whether osd.10 logs any error when deep-scrubbing pg 7.2.

In my previous uploads there should be PGs 5.14 and 7.3, which both have errors and which I manually deep-scrubbed with the debug log levels raised.

Certainly there could be other (automatic) scrubs in there. But currently there is no inconsistency on pg 7.2 - sorry if there is any confusion.

I now ran a deep-scrub on 7.2 and 7.3 again and uploaded all the OSD debug logs to:
ceph-post-file: 610897bb-d92c-42a1-98f4-f90e90696ed1

If you require me to run anything else, please let me know.

Actions #17

Updated by yite gu about 2 years ago

Christian Rohmann wrote:

yite gu wrote:

This is the inconsistent pg 7.2 from the files you uploaded. It looks like the mismatched OSD is osd.10, so you could check whether osd.10 logs any error when deep-scrubbing pg 7.2.

In my previous uploads there should be PGs 5.14 and 7.3, which both have errors and which I manually deep-scrubbed with the debug log levels raised.

Certainly there could be other (automatic) scrubs in there. But currently there is no inconsistency on pg 7.2 - sorry if there is any confusion.

I now ran a deep-scrub on 7.2 and 7.3 again and uploaded all the OSD debug logs to:
ceph-post-file: 610897bb-d92c-42a1-98f4-f90e90696ed1

If you require me to run anything else, please let me know.

Sorry, I don't know how to access ceph-post-file: 610897bb-d92c-42a1-98f4-f90e90696ed1
:(

Actions #18

Updated by Christian Rohmann about 2 years ago

Dieter Roels wrote:

Not sure if this helps or not, but we are experiencing very similar issues in our clusters the last few days.

We are on RHCS5 (16.2.0-146.el8cp) with multisite and ran into a bug with copyObject crashing our rgws. We worked around this by reverting to an older, unaffected rgw version (16.2.0-117.el8cp). A few days later, deep-scrubs found a lot of inconsistent PGs. It looked exactly like this issue: all inconsistent PGs had omap_digest_mismatch errors in the rgw.log and rgw.buckets.index pools (data_log and .dir objects).

All inconsistencies were on non-primary shards, so we repaired them with pg repair.

Good observation, I shall be looking closer at the inconsistencies in my clusters to see if this is the case for us as well.

After the repair the inconsistencies do not re-appear. However, we can reproduce the issue in our test environment by running the "bad" rgws and letting them crash a few times. After the rgw crashes, the inconsistent PGs start to appear again. So in our case the cause of the inconsistent PGs was the rgws segfaulting.

This sounds similar to what we suspect the issue to be. We did trigger a deep-scrub on ALL OSDs on both clusters, with zero errors or inconsistencies. Then, a week or two later, the inconsistencies started to appear again. I am quite certain there were a few restarts of RADOSGWs happening in that time frame as well.

Dieter, could you maybe describe your test setup a little more? How many instances of RADOSGW?
How is the multi-site sync set up? ...

Actions #19

Updated by Christian Rohmann about 2 years ago

Christian Rohmann wrote:

Dieter Roels wrote:

All inconsistencies were on non-primary shards, so we repaired them with pg repair.

Good observation, I shall be looking closer at the inconsistencies in my clusters to see if this is the case for us as well.

I went through all the inconsistencies and YES, they all "only" affect a single non-primary shard, never the primary.
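
For anyone who wants to verify the same thing on their cluster, a small sketch that prints the per-shard omap digests of the inconsistent objects in one PG (7.2 as a placeholder; field names are assumed from the usual list-inconsistent-obj JSON):

$ rados list-inconsistent-obj 7.2 | jq -r '.inconsistents[] | .object.name as $o | .shards[] | "\($o) osd.\(.osd) primary=\(.primary) omap_digest=\(.omap_digest)"'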

Actions #20

Updated by Dieter Roels about 2 years ago

Christian Rohmann wrote:

Dieter, could you maybe describe your test setup a little more? How many instances of RADOSGW?
How is the multi-site sync set up? ...

Sure. It's a multisite setup between 2 datacenters. Each cluster has 4 rgws. Both zones are read/write. The metadata master had by far the most inconsistent PGs, but it is also used more, so I'm pretty sure it experienced more rgw segfaults.

Actions #21

Updated by Dieter Roels about 2 years ago

Dieter Roels wrote:

After the repair the inconsistencies do not re-appear. However, we can reproduce the issue in our test environment by running the "bad" rgws and letting them crash a few times. After the rgw crashes, the inconsistent PGs start to appear again. So in our case the cause of the inconsistent PGs was the rgws segfaulting.

Update: this was not correct. We now have new inconsistencies in these clusters, even with stable rgws. So the segfaults were not the cause; it also happens in normal operation. We are trying to pin down exactly when it happens, but it is difficult because deep-scrubs do not always seem to start immediately when the command is given. Sometimes the deep-scrub only seems to start after restarting the OSD. We are currently thinking it has to do with restarting the rgws, but we are not sure yet.

Actions #22

Updated by Neha Ojha about 2 years ago

  • Status changed from Need More Info to New
  • Assignee set to Ronen Friedman
Actions #23

Updated by Neha Ojha about 2 years ago

yite gu wrote:

Sorry, I don't know how to access ceph-post-file: 610897bb-d92c-42a1-98f4-f90e90696ed1
:(

You need to have sepia lab access to view these.

Actions #24

Updated by yite gu about 2 years ago

Neha Ojha wrote:

You need to have sepia lab access to view these.

Could this access be opened for me?

Actions #25

Updated by Christian Rohmann about 2 years ago

We just observed 12 more scrub errors spread across 7 PGs, all in our primary zone (used for user access, read/write), but none on the replication site this time:

"data_log.29","omap_digest_mismatch","client.4904338.0:25919889" 
"data_log.62","omap_digest_mismatch","client.4904338.0:26433643" 
"data_log.11","omap_digest_mismatch","client.4904338.0:21002771" 
"data_log.61","omap_digest_mismatch","client.4918247.0:3799452" 
"data_log.7","omap_digest_mismatch","client.4904338.0:9642523" 
"data_log.9","omap_digest_mismatch","client.4904338.0:9642525" 
"data_log.36","omap_digest_mismatch","client.4918247.0:1949287" 
"data_log.52","omap_digest_mismatch","client.4904338.0:8553528" 
".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.1163207.114.4","omap_digest_mismatch","client.4904338.0:23913511" 
".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2217176.214.2","omap_digest_mismatch","client.4918247.0:3170210" 
".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.4349087.1038.1","omap_digest_mismatch","client.4904338.0:14556155" 
".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2202949.678.7","omap_digest_mismatch","client.4934830.0:2559601" 

But again, there seems to be no strict correlation between restarts, crashes or any other cluster events and the inconsistencies.
I suppose that, with more ideas about the cause, any new occurrence is valuable for analysis.
Please let me know if there is any more debug info I should gather before letting those inconsistent PGs be repaired.

Actions #26

Updated by Dieter Roels about 2 years ago

Hi Christian. Are your rgws collocated with the OSDs of the metadata pools?

We now notice in our clusters that the inconsistent shard is always on an osd on a node that was recently restarted. I'm currently thinking that it is caused by some kind of race condition between the rgw and the osd that are being restarted at the same time.

Actions #27

Updated by Christian Rohmann about 2 years ago

Dieter Roels wrote:

Hi Christian. Are your rgws collocated with the OSDs of the metadata pools?
We now notice in our clusters that the inconsistent shard is always on an osd on a node that was recently restarted. I'm currently thinking that it is caused by some kind of race condition between the rgw and the osd that are being restarted at the same time.

Yes, we run our RGW instances colocated on machines with OSDs as well.

Thanks for keeping at this issue with us. It's certainly not something that affects every RGW (multi-site) installation, but still scary to have random inconsistencies.

Actions #28

Updated by Dieter Roels about 2 years ago

We were able to narrow it down further. We can trigger the problem reliably by doing this:

- 2 clusters, multisite, read/write with metadata activity (active users and applications, no idle lab environments)
- deep-scrub all pgs of rgw.log and buckets.index pools to make sure everything is clean
- stop one of the osds that has pgs from the rgw.log and buckets.index pools
- wait 10 minutes
- start osd again
- deep-scrub all pgs from rgw.log and buckets.index pools -> lots of inconsistencies found

Just restarting an OSD does not trigger the issue; it has to be offline for a while. Sometimes we find one or two inconsistent PGs, sometimes almost all of the PGs of the metadata pools are inconsistent. I guess this depends on the amount of activity on the cluster.
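
Roughly as commands, as a sketch only (pool names match the ones from this tracker, osd.12 is a placeholder, and the .pg_stats[].pgid path of ceph pg ls-by-pool as well as a plain systemd/package deployment are assumptions):

# 1) deep-scrub all PGs of the two metadata pools until everything is clean
$ for pool in zone.rgw.log zone.rgw.buckets.index; do
    for pg in $(ceph pg ls-by-pool ${pool} -f json | jq -r '.pg_stats[].pgid'); do ceph pg deep-scrub ${pg}; done
  done

# 2) take one OSD holding such PGs offline for a while (noout avoids rebalancing)
$ ceph osd set noout
$ systemctl stop ceph-osd@12
$ sleep 600
$ systemctl start ceph-osd@12
$ ceph osd unset noout

# 3) deep-scrub the same PGs again (same loop as above) and check for inconsistencies
$ rados list-inconsistent-pg zone.rgw.log
$ rados list-inconsistent-pg zone.rgw.buckets.index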

Actions #29

Updated by Neha Ojha about 2 years ago

  • Priority changed from Normal to High
Actions #30

Updated by Christian Rohmann about 2 years ago

This issue is still present, and it also occurs with 15.2.16.

I just observed that, after a series of machine reboots due to regular kernel / package updates, a spike of more than 37 new inconsistencies was found all of a sudden.

@Neha Ojha: Is there anything further one can do to help find the root cause of this issue?

Actions #31

Updated by Vikhyat Umrao about 2 years ago

Christian Rohmann wrote:

This issue is still present, and it also occurs with 15.2.16.

I just observed that, after a series of machine reboots due to regular kernel / package updates, a spike of more than 37 new inconsistencies was found all of a sudden.

@Neha Ojha: Is there anything further one can do to help find the root cause of this issue?

This looks to be related to https://tracker.ceph.com/issues/54592. @Neha, is my understanding correct?

Actions #32

Updated by Vikhyat Umrao about 2 years ago

  • Related to Bug #54592: partial recovery: CEPH_OSD_OP_OMAPRMKEYRANGE should mark omap dirty added
Actions #33

Updated by Neha Ojha about 2 years ago

Vikhyat Umrao wrote:

This looks to be related to https://tracker.ceph.com/issues/54592. @Neha, is my understanding correct?

Yes, this looks very similar! I will take a look at the logs to confirm.

Actions #34

Updated by Neha Ojha about 2 years ago

Christian Rohmann wrote:

This issue is still present, and it also occurs with 15.2.16.

I just observed that, after a series of machine reboots due to regular kernel / package updates, a spike of more than 37 new inconsistencies was found all of a sudden.

@Neha Ojha: Is there anything further one can do to help find the root cause of this issue?

The fact that the issue only happens in replicated pools makes me think you are running into https://tracker.ceph.com/issues/54592, because partial recovery is only applicable to replicated pools in Octopus and later versions. Do you happen to know if there was recovery on these PGs before they became inconsistent? In https://tracker.ceph.com/issues/54592, the sequence of events that led to inconsistent PGs was an rgw.bi_log_trim operation, which translated into omap-rm-key-range, while an OSD was down, followed by recovery. From the logs you have provided, I cannot track down the recovery bit, but the symptoms look very similar. https://github.com/ceph/ceph/pull/45466 is the fix; we also have a pacific backport open for it.
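
To check for the recovery part, something along these lines could show whether one of the affected PGs recently went through peering/recovery; this is only a sketch, and the exact JSON layout of the pg query output is an assumption that varies a bit between releases:

# recent peering/recovery state transitions of one of the affected PGs
$ ceph pg 7.2 query | jq '.recovery_state'

# the debug OSD logs can also be grepped for recovery of a specific object
$ grep -i recover /var/log/ceph/ceph-osd.13.log | grep data_log.14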

Actions #35

Updated by Vikhyat Umrao about 2 years ago

Neha Ojha wrote:

The fact that the issue only happens in replicated pools makes me think you are running into https://tracker.ceph.com/issues/54592, because partial recovery is only applicable to replicated pools in Octopus and later versions. Do you happen to know if there was recovery on these PGs before they became inconsistent? In https://tracker.ceph.com/issues/54592, the sequence of events that led to inconsistent PGs was an rgw.bi_log_trim operation, which translated into omap-rm-key-range, while an OSD was down, followed by recovery. From the logs you have provided, I cannot track down the recovery bit, but the symptoms look very similar. https://github.com/ceph/ceph/pull/45466 is the fix; we also have a pacific backport open for it.

Neha, I think it is the exact same issue, because the description of this tracker mentions that it is an RGW multisite setup. I am quoting it below:

Also, it is only serving traffic via RADOSGW, and it is a multisite setup with this cluster being the zone master.

Actions #36

Updated by Vikhyat Umrao about 2 years ago

I think this can be marked as a duplicate of 54592.

Actions #37

Updated by Neha Ojha about 2 years ago

  • Status changed from New to Duplicate
  • Assignee deleted (Ronen Friedman)

Marking this as a duplicate based on the above comments.

Actions #38

Updated by Neha Ojha about 2 years ago

  • Related to deleted (Bug #54592: partial recovery: CEPH_OSD_OP_OMAPRMKEYRANGE should mark omap dirty)
Actions #39

Updated by Neha Ojha about 2 years ago

  • Is duplicate of Bug #54592: partial recovery: CEPH_OSD_OP_OMAPRMKEYRANGE should mark omap dirty added