Bug #62256 (closed)

rgw: Bucket listing hangs against object with 11111 versions

Added by Joshua Baergen 10 months ago. Updated 9 months ago.

Status: Can't reproduce
Priority: Normal
% Done: 0%
Regression: Yes
Severity: 2 - major

Description

Since upgrading from 14.2.18 to 16.2.13 (also seen in 16.2.11), one bucket can no longer be listed successfully: listing loops infinitely over a subset of versions of a single object that has 11111 versions according to "bi list".

Issuing "bucket list" against the bucket yields:
2023-07-31T22:53:53.443+0000 7f01269eb4c0 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=<redacted>/MJhXD9tjk7ZNmGbF/0022026979, cur_marker=<redacted>/MJhXD9tjk7ZNmGbF/0022026979

Then, as mentioned above, it loops over those keys. On the OSD I see:
2023-07-31T22:34:08.542+0000 7fb1efa19700 20 <cls> <redacted>/src/cls/rgw/cls_rgw.cc:555: rgw_bucket_list: on attempt 0 get_obj_vls returned 16 entries, more=1
2023-07-31T22:34:08.542+0000 7fb1efa19700 20 <cls> <redacted>/src/cls/rgw/cls_rgw.cc:574: rgw_bucket_list: working on key=<redacted>/MJhXD9tjk7ZNmGbF/0022026979 len=53
2023-07-31T22:34:08.542+0000 7fb1efa19700 20 <cls> <redacted>/src/cls/rgw/cls_rgw.cc:595: rgw_bucket_list: entry <redacted>/MJhXD9tjk7ZNmGbF/0022026979[] is not visible
2023-07-31T22:34:08.542+0000 7fb1efa19700 20 <cls> <redacted>/src/cls/rgw/cls_rgw.cc:574: rgw_bucket_list: working on key=<redacted>/MJhXD9tjk7ZNmGbF/0022026979 len=53
2023-07-31T22:34:08.542+0000 7fb1efa19700 20 <cls> <redacted>/src/cls/rgw/cls_rgw.cc:595: rgw_bucket_list: entry <redacted>/MJhXD9tjk7ZNmGbF/0022026979[] is not visible
...
2023-07-31T22:34:08.542+0000 7fb1efa19700 20 <cls> <redacted>/src/cls/rgw/cls_rgw.cc:555: rgw_bucket_list: on attempt 7 get_obj_vls returned 16 entries, more=1
...
2023-07-31T22:34:08.542+0000 7fb1efa19700 20 <cls> <redacted>/src/cls/rgw/cls_rgw.cc:574: rgw_bucket_list: working on key=<redacted>/MJhXD9tjk7ZNmGbF/0022026979 len=53
2023-07-31T22:34:08.542+0000 7fb1efa19700 20 <cls> <redacted>/src/cls/rgw/cls_rgw.cc:595: rgw_bucket_list: entry <redacted>/MJhXD9tjk7ZNmGbF/0022026979[] is not visible
2023-07-31T22:34:08.542+0000 7fb1efa19700 20 <cls> <redacted>/src/cls/rgw/cls_rgw.cc:659: rgw_bucket_list: normal exit returning 0 entries, is_truncated=1
2023-07-31T22:34:08.542+0000 7fb1efa19700 5 <cls> <redacted>/src/cls/rgw/cls_rgw.cc:663: rgw_bucket_list: returning value RGWBIAdvanceAndRetryError

Because this is a live production system with customer data, I may be limited in what further information I can provide, but let me know if you need anything more.
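For reference, the listings mentioned above were produced with radosgw-admin, roughly along these lines (bucket name redacted; exact flags may have differed):

    $ radosgw-admin bucket list --bucket=<bucket>
    $ radosgw-admin bi list --bucket=<bucket>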


Files

0022026979.objects.gz (309 KB) - Filtered "bi list" - Joshua Baergen, 08/05/2023 12:38 PM

Related issues (1 open, 0 closed)

Related to rgw - Bug #64203: RGW S3: list bucket results in a 500 Error when object-lock is enabled (New, J. Eric Ivancich)

Actions #1

Updated by J. Eric Ivancich 10 months ago

  • Assignee set to J. Eric Ivancich
Actions #2

Updated by Joshua Baergen 10 months ago

Here's a sample of keys from "bi list", in case that helps:
"type": "plain",
"idx": "<redacted>/MJhXD9tjk7ZNmGbF/0022026979\u0000v54294895183\u0000i",
"entry": {
"name": "<redacted>/MJhXD9tjk7ZNmGbF/0022026979",
"instance": "",

"type": "plain",
"idx": "&lt;redacted&gt;/MJhXD9tjk7ZNmGbF/0022026979\u0000v54294895184\u0000i",
"entry": {
"name": "&lt;redacted&gt;/MJhXD9tjk7ZNmGbF/0022026979",
"instance": "",
"ver": {
"pool": 2,
"epoch": 4390758
},
Actions #3

Updated by Joshua Baergen 10 months ago

Sorry, trying that again with better formatting:

        "type": "plain",
        "idx": "<redacted>/MJhXD9tjk7ZNmGbF/0022026979\u0000v54294895183\u0000i",
        "entry": {
            "name": "<redacted>/MJhXD9tjk7ZNmGbF/0022026979",
            "instance": "",
            "ver": {
                "pool": 2,
                "epoch": 4390765
            },

        "type": "plain",
        "idx": "<redacted>/MJhXD9tjk7ZNmGbF/0022026979\u0000v54294895184\u0000i",
        "entry": {
            "name": "<redacted>/MJhXD9tjk7ZNmGbF/0022026979",
            "instance": "",
            "ver": {
                "pool": 2,
                "epoch": 4390758
            },

Actions #4

Updated by Joshua Baergen 10 months ago

After digging through the code yesterday and some experimentation, I'm thinking that it's actually odd to have multiple of these versions without instance IDs. Is it the case that there should only be one version without an instance ID, which gets replaced on PUT?

The continuation of the list code depends on unique instance IDs, so the massive proliferation of object versions without instance IDs is what's causing listing to loop/hang. If the state above is indeed erroneous, then one way to clean up this bucket would be to rmomapkey all of the versions except the newest.
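As a rough illustration of that cleanup idea (pool, shard object, and key file are placeholders, not taken from this bucket; the index keys embed NUL bytes, so the key would likely have to be passed via a file, e.g. with rados' --omap-key-file, rather than on the command line):

    $ rados -p <index-pool> listomapkeys .dir.<bucket-marker>.<shard>
    $ rados -p <index-pool> rmomapkey .dir.<bucket-marker>.<shard> --omap-key-file ./stale-key.bin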

Actions #5

Updated by J. Eric Ivancich 10 months ago

  • Status changed from New to In Progress
Actions #6

Updated by J. Eric Ivancich 10 months ago

So far I'm unable to reproduce this.

I have a versioned bucket that has the initial 11 shards and that has 25 objects (robj1 ... robj25) each with one version, then 11,111 versions of vobj1, then 20 objects (zobj1 ... zobj20). And I can list it just fine.

Here's how the omap keys fall onto the various shards:

.dir.ceaab072-92e1-40cb-8036-64da4776cbf1.4137.1.0 : 4
.dir.ceaab072-92e1-40cb-8036-64da4776cbf1.4137.1.10 : 20
.dir.ceaab072-92e1-40cb-8036-64da4776cbf1.4137.1.1 : 24
.dir.ceaab072-92e1-40cb-8036-64da4776cbf1.4137.1.2 : 16
.dir.ceaab072-92e1-40cb-8036-64da4776cbf1.4137.1.3 : 8
.dir.ceaab072-92e1-40cb-8036-64da4776cbf1.4137.1.4 : 28
.dir.ceaab072-92e1-40cb-8036-64da4776cbf1.4137.1.5 : 24
.dir.ceaab072-92e1-40cb-8036-64da4776cbf1.4137.1.6 : 8
.dir.ceaab072-92e1-40cb-8036-64da4776cbf1.4137.1.7 : 16
.dir.ceaab072-92e1-40cb-8036-64da4776cbf1.4137.1.8 : 24
.dir.ceaab072-92e1-40cb-8036-64da4776cbf1.4137.1.9 : 22232
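A minimal sketch of how a bucket like this could be populated, assuming the aws CLI pointed at an RGW endpoint (names follow the description above; this is illustrative, not the actual test script):

    $ aws s3api create-bucket --bucket bkt1
    $ aws s3api put-bucket-versioning --bucket bkt1 --versioning-configuration Status=Enabled
    $ for i in $(seq 1 25); do aws s3api put-object --bucket bkt1 --key robj$i --body ./small; done
    $ for i in $(seq 1 11111); do aws s3api put-object --bucket bkt1 --key vobj1 --body ./small; done
    $ for i in $(seq 1 20); do aws s3api put-object --bucket bkt1 --key zobj$i --body ./large; done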

Joshua, can you tell me how to modify my test case to trigger the issue? Or can you come up with a minimal reproducer?

Thanks!

Actions #7

Updated by J. Eric Ivancich 10 months ago

  • Status changed from In Progress to Need More Info
Actions #8

Updated by J. Eric Ivancich 10 months ago

This is on a build of v16.2.13; here's the result of:

    $ s3cmd ls s3://bkt1

2023-08-04 15:59 4 s3://bkt1/robj1
2023-08-04 15:59 4 s3://bkt1/robj10
2023-08-04 15:59 4 s3://bkt1/robj11
2023-08-04 15:59 4 s3://bkt1/robj12
2023-08-04 15:59 4 s3://bkt1/robj13
2023-08-04 15:59 4 s3://bkt1/robj14
2023-08-04 15:59 4 s3://bkt1/robj15
2023-08-04 15:59 4 s3://bkt1/robj16
2023-08-04 15:59 4 s3://bkt1/robj17
2023-08-04 15:59 4 s3://bkt1/robj18
2023-08-04 15:59 4 s3://bkt1/robj19
2023-08-04 15:59 4 s3://bkt1/robj2
2023-08-04 15:59 4 s3://bkt1/robj20
2023-08-04 15:59 4 s3://bkt1/robj21
2023-08-04 15:59 4 s3://bkt1/robj22
2023-08-04 15:59 4 s3://bkt1/robj23
2023-08-04 15:59 4 s3://bkt1/robj24
2023-08-04 15:59 4 s3://bkt1/robj25
2023-08-04 15:59 4 s3://bkt1/robj3
2023-08-04 15:59 4 s3://bkt1/robj4
2023-08-04 15:59 4 s3://bkt1/robj5
2023-08-04 15:59 4 s3://bkt1/robj6
2023-08-04 15:59 4 s3://bkt1/robj7
2023-08-04 15:59 4 s3://bkt1/robj8
2023-08-04 15:59 4 s3://bkt1/robj9
2023-08-04 15:59 4 s3://bkt1/vobj1
2023-08-04 20:15 492 s3://bkt1/zobj1
2023-08-04 20:15 492 s3://bkt1/zobj10
2023-08-04 20:15 492 s3://bkt1/zobj11
2023-08-04 20:15 492 s3://bkt1/zobj12
2023-08-04 20:15 492 s3://bkt1/zobj13
2023-08-04 20:15 492 s3://bkt1/zobj14
2023-08-04 20:15 492 s3://bkt1/zobj15
2023-08-04 20:15 492 s3://bkt1/zobj16
2023-08-04 20:15 492 s3://bkt1/zobj17
2023-08-04 20:15 492 s3://bkt1/zobj18
2023-08-04 20:15 492 s3://bkt1/zobj19
2023-08-04 20:15 492 s3://bkt1/zobj2
2023-08-04 20:15 492 s3://bkt1/zobj20
2023-08-04 20:15 492 s3://bkt1/zobj3
2023-08-04 20:15 492 s3://bkt1/zobj4
2023-08-04 20:15 492 s3://bkt1/zobj5
2023-08-04 20:15 492 s3://bkt1/zobj6
2023-08-04 20:15 492 s3://bkt1/zobj7
2023-08-04 20:15 492 s3://bkt1/zobj8
2023-08-04 20:15 492 s3://bkt1/zobj9

Actions #9

Updated by Joshua Baergen 10 months ago

Hi Eric,

As I alluded to in my previous comment, I think there's something deeper going on with the bucket state that's violating an assumption in the bucket listing code, but I'm not confident enough in my knowledge of versioned buckets to know for sure.

I've discovered that the bucket in question is a versioning-suspended bucket. The object in question has one version from when the bucket was versioned (and thus has an instance ID), and all remaining 10k+ copies of the object have no instance ID and thus were probably PUT after versioning had been suspended. Per my previous post, I think this disrupts continuation between the RGW and CLS sides of bucket listing, since the continuation token depends in part on the instance ID. I've attached a lightly redacted listing of the object in question from "bi list" for reference.
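That write pattern would look roughly like this (aws CLI with placeholder names; under suspended versioning each PUT is expected to replace the single null-instance version rather than accumulate new index entries):

    $ aws s3api put-bucket-versioning --bucket <bucket> --versioning-configuration Status=Enabled
    $ aws s3api put-object --bucket <bucket> --key <object> --body ./payload    # gets an instance ID
    $ aws s3api put-bucket-versioning --bucket <bucket> --versioning-configuration Status=Suspended
    $ for i in $(seq 1 10000); do aws s3api put-object --bucket <bucket> --key <object> --body ./payload; done    # null-instance overwrites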

Now, if I correctly understand how this is supposed to work, the above situation should not arise, and there should only ever be one copy of the object in a version-suspended bucket that lacks the instance ID. If that's the case, then the issue isn't in the listing code per se, but rather that some bug exists that causes more than one copy of this object to exist in version-suspended mode.

Hopefully the additional information helps, and let me know if I correctly understand how version-suspended buckets are supposed to work!

Actions #10

Updated by J. Eric Ivancich 9 months ago

Yeah, that could be the issue. My colleague Casey Bodley said he just fixed a bug that could create that situation. See:

https://tracker.ceph.com/issues/62013
https://github.com/ceph/ceph/pull/52684

You may want to check with your customer whether they think they could've hit that.

If your theory is correct, that may require a manual clean-up on your end. And you'd want to change the status of this tracker to "Rejected".

Actions #11

Updated by Joshua Baergen 9 months ago

Thanks, I'll need to spend some time understanding that tracker/fix to see if it addresses the condition that led to this bucket state, and also to understand what should be done to clean things up.

Actions #12

Updated by Joshua Baergen 9 months ago

Hi Eric,

After looking through that tracker, I don't think the customer followed that sequence, but it's of course hard to know for certain. Based on the attached "bi list" contents, what appears to have happened is that the customer wrote a version of the object in question on 2021-11-29, when the bucket had versioning enabled, and then later overwrote the object a number of times after versioning had been suspended on the bucket but while that original copy was still there. Also, the code path modified in the linked PR appears to handle the case where the olh key is not found, whereas in this case there appears to be an olh key with an epoch that matches the most recently written index entry, and presumably that would have been the case for the other instances too.

So in short it sounds related but I don't think it was the cause of what happened here; I'm still learning how this path works, of course, so I'm happy to be overruled if the information I've provided actually does match the changed code path.

Actions #13

Updated by Casey Bodley 9 months ago

  • Status changed from Need More Info to Can't reproduce

Moving status to Can't Reproduce for now. Please update if a supported Ceph version can create these duplicate keys. Once the keys exist, I don't think rgw_bucket_list can avoid getting stuck this way.

Actions #14

Updated by Joshua Baergen 9 months ago

I built a script this week that inspects a bucket's shards, finds excess null-instance entries, and removes them. After applying this to the bucket in question here, ListObjects returned to normal operation on the bucket without further intervention.
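A minimal sketch of that kind of inspection (placeholder names, not the actual script; assumes jq, and note that the "bi list" idx values embed NUL bytes, shown as \u0000 in the JSON, so each key has to be written to a file with the real NUL bytes before removal):

    # find plain index entries for the affected object that have an empty instance
    $ radosgw-admin bi list --bucket=<bucket> --object=<object> \
        | jq -r '.[] | select(.type == "plain" and .entry.instance == "") | .idx'
    # for every such key except the newest, remove it from the owning index shard object
    $ rados -p <index-pool> rmomapkey .dir.<bucket-marker>.<shard> --omap-key-file ./stale-key.bin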

Per conversation in Slack, we're going to leave this tracker alone now unless we can establish that the same condition is encountered on Pacific.

Actions #15

Updated by Casey Bodley 4 months ago

  • Related to Bug #64203: RGW S3: list bucket results in a 500 Error when object-lock is enabled added
