Bug #51767

closed

missing CommonPrefixes with some shard count

Added by JS Landry almost 3 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Target version: -
% Done: 0%

Source:
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
ceph-ansible
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi, I have had this problem for several days and I can't find a solution.

I had a bucket with 11 shards and everything was OK. I resharded it to 23 and the customer quickly replied that "two directories are now empty".
Trying to fix the situation, I resharded this bucket many times, and the highest shard count I can use now is 10; otherwise two prefixes return an empty listing.

I tested the bucket with Postman, and what I found is that when adding a delimiter to the URL: https://s3.example.com/bucketname/?prefix=carto/DATA/site/Works/vq/&delimiter=/
I don't receive the same XML when the shard count is 10 as when it is 23. (XML files attached)

When the bucket has 10 shards, the GET returns XML that includes a "<CommonPrefixes>" tag for every "sub-directory" in /vq/,
but when the bucket has 23 shards, the same GET returns XML without those "<CommonPrefixes>" tags.

That's the reason why the directory looks empty, but how can the xml be different? Nothing has changed except the sharding.
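For reference, here is a minimal sketch of how a listing with a prefix and a delimiter is expected to group keys into CommonPrefixes, following general S3 ListObjects semantics. It is not RGW's implementation; the function name `list_with_delimiter` and the sample keys are hypothetical.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Group keys the way an S3-style listing does (illustrative only).

    Keys under `prefix` that contain `delimiter` after the prefix are
    rolled up into one common prefix; the rest are returned as contents.
    """
    contents, common = [], []
    seen = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter) if delimiter else -1
        if cut == -1:
            contents.append(key)
        else:
            rolled_up = prefix + rest[:cut + len(delimiter)]
            if rolled_up not in seen:
                seen.add(rolled_up)
                common.append(rolled_up)
    return contents, common


# Hypothetical keys, not taken from the real bucket.
keys = [
    "carto/DATA/site/Works/vq/a/tile1.tif",
    "carto/DATA/site/Works/vq/a/tile2.tif",
    "carto/DATA/site/Works/vq/b/tile1.tif",
    "carto/DATA/site/Works/vq/readme.txt",
    "carto/DATA/site/Other/x.tif",
]
contents, common = list_with_delimiter(keys, "carto/DATA/site/Works/vq/", "/")
print(contents)
print(common)
```

A client that renders "directories" from CommonPrefixes shows an empty folder when that list comes back empty, even though the objects themselves are still retrievable by key.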

When using a prefix-only URL: https://s3.example.com/bucketname/?prefix=carto/DATA/site/Works/vq/
the XML returned is identical with 10 or 23 shards.
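One way to compare two saved responses is to extract the CommonPrefixes from each and diff them. The sketch below uses Python's standard `xml.etree.ElementTree`; the two XML strings are small hypothetical stand-ins for the attached files, and `common_prefixes` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def common_prefixes(xml_text):
    """Return every <CommonPrefixes><Prefix> value in a ListBucketResult."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        if local_name(element.tag) != "CommonPrefixes":
            continue
        for child in element:
            if local_name(child.tag) == "Prefix" and child.text:
                found.append(child.text)
    return found


# Hypothetical responses; the real ones are the attached XML files.
NS = 'xmlns="http://s3.amazonaws.com/doc/2006-03-01/"'
xml_10 = (
    f"<ListBucketResult {NS}>"
    "<Prefix>carto/DATA/site/Works/vq/</Prefix>"
    "<CommonPrefixes><Prefix>carto/DATA/site/Works/vq/a/</Prefix></CommonPrefixes>"
    "<CommonPrefixes><Prefix>carto/DATA/site/Works/vq/b/</Prefix></CommonPrefixes>"
    "</ListBucketResult>"
)
xml_23 = (
    f"<ListBucketResult {NS}>"
    "<Prefix>carto/DATA/site/Works/vq/</Prefix>"
    "</ListBucketResult>"
)

missing = sorted(set(common_prefixes(xml_10)) - set(common_prefixes(xml_23)))
print(missing)
```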

We are running Octopus 15.2.13. The default shard count is 11, and some buckets have 23, 29, 41, or 60 shards.
Users have reported no problems except for this one case. To my knowledge, it's the only bucket with this problem.

Everything looks fine when testing with radosgw-admin.
With bi list, bucket list, bucket radoslist, or even rados listomapkeys, the lists are identical with 10 or 23 shards.

For this bucket:
10 shards (or fewer): everything is OK.
11, 12, or 13 shards: only "carto/DATA/site/Works/vq/" returns an empty listing.
17 shards (or more): both "carto/DATA/site/Works/vq/" AND "carto/DATA/site/Works/" return empty listings.
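The expected behaviour is that a delimited listing does not depend on the shard count. The sketch below illustrates that invariant with a toy model: keys are spread over N sorted shards and merged back in order before grouping. It assumes `zlib.crc32` as a stand-in hash and is not how RGW actually assigns keys to shards or lists them; all names and keys are hypothetical.

```python
import heapq
import zlib


def shard_keys(keys, num_shards):
    """Spread keys over shards with a stand-in hash (NOT RGW's real hash)."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[zlib.crc32(key.encode()) % num_shards].append(key)
    return [sorted(shard) for shard in shards]


def ordered_listing(shards, prefix, delimiter="/"):
    """Merge the sorted shards and group keys under `prefix` by `delimiter`."""
    contents, common = [], []
    for key in heapq.merge(*shards):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            contents.append(key)
            continue
        rolled_up = prefix + rest[:cut + len(delimiter)]
        # Keys sharing a prefix are adjacent in the merged order.
        if not common or common[-1] != rolled_up:
            common.append(rolled_up)
    return contents, common


# Hypothetical keys: 5 "sub-directories" with 20 objects each.
keys = [
    f"carto/DATA/site/Works/vq/dir{d:02d}/obj{i}"
    for d in range(5)
    for i in range(20)
] + ["carto/DATA/site/Works/top.txt"]

results = {
    n: ordered_listing(shard_keys(keys, n), "carto/DATA/site/Works/vq/")
    for n in (1, 10, 11, 13, 17, 23)
}
print(all(result == results[1] for result in results.values()))
```

A check of this shape, run against real listings at different shard counts, is one way to confirm whether a given release shows the problem.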

The listing is empty, but when you know the object path, you can get the object without any errors.
I set debug_rgw to 5/5, but I can't find anything there. (partial log files attached)
I would greatly appreciate some help with this.
Cheers!


Files

rgw-shard0.log (6.66 KB): rgw/beast log when testing with postman, 0 shard. JS Landry, 07/21/2021 03:04 PM
rgw-shard23.log (9.28 KB): rgw/beast log when testing with postman, 23 shards. JS Landry, 07/21/2021 03:04 PM
postman-get-prefix-delimiter-shard23-pp.xml (651 Bytes): postman xml output for a get using prefix and delimiter on the 23 shards bucket. JS Landry, 07/21/2021 03:04 PM
postman-get-prefix-delimiter-shard10-pp.xml (1.97 KB): postman xml output for a get using prefix and delimiter on the 10 shards bucket. JS Landry, 07/21/2021 03:04 PM
ceph-rgw-ul-stk-pr-ccr01.rgw0.log.level20.13shards.anon.gz (8.71 KB): no CommonPrefixes for carto/DATA/ulaval/Rasters/vq/ only. JS Landry, 08/02/2021 03:45 PM
ceph-rgw-ul-stk-pr-ccr01.rgw0.log.level20.17shards.anon.gz (11.4 KB): no CommonPrefixes for carto/DATA/ulaval/Rasters/ and carto/DATA/ulaval/Rasters/vq/. JS Landry, 08/02/2021 03:45 PM
ceph-rgw-ul-stk-pr-ccr01.rgw0.log.level20.10shards.anon.gz (8.83 KB): everything is ok. JS Landry, 08/02/2021 03:45 PM
radosgw-list_object-and-cls_bucket_list-logfile.log.anon.gz (83.5 KB). JS Landry, 08/24/2021 01:52 PM
Actions #1

Updated by Matt Benjamin almost 3 years ago

  • Assignee set to J. Eric Ivancich
Actions #2

Updated by J. Eric Ivancich over 2 years ago

  • Status changed from New to In Progress
Actions #3

Updated by J. Eric Ivancich over 2 years ago

  • Pull request ID set to 42552
Actions #4

Updated by J. Eric Ivancich over 2 years ago

Hello, JS Landry, I'd really like to help you resolve this for your customers. I'd really like to be able to reproduce this so I can work on it, and what would really help is a `radosgw-admin bi list --bucket=XYZ --max-entries=9999999`. We need max-entries to be greater than the number of actual entries in order to capture every one.

It's not lost on me that this is a BIG ask. I realize there's a possibility (probability?) that the object names may leak proprietary information. But I want to ask as it would really help speed up the process.

Additionally you provided some rgw logs. Would it be possible to get them at level 20? Please let me know when you see this, so I might know what to expect and when.

Thanks!!

Actions #5

Updated by JS Landry over 2 years ago

Hi, sure, I will provide you that in a few hours tonight. Thanks!

Actions #6

Updated by JS Landry over 2 years ago

Hi, I have the files you asked for and I'm waiting for the owner's agreement. It shouldn't take long; I expect it next week.
Thanks!

Actions #7

Updated by J. Eric Ivancich over 2 years ago

JS Landry wrote:

Hi, I have the files you asked for and I'm waiting for the owner's agreement. It shouldn't take long; I expect it next week.
Thanks!

Thanks!

Actions #8

Updated by JS Landry over 2 years ago

Hi Eric, here are the files. I anonymized some data (the bucket name, owner, server name, and IP address); otherwise everything is "as-is".

The bi-list file is about 2GB (~76MB bzipped), so I can't attach it here: the attachment size limit is 1MB. Where should I send it?

Thanks!

Actions #9

Updated by J. Eric Ivancich over 2 years ago

JS Landry wrote:

Hi Eric, here are the files. I anonymized some data (the bucket name, owner, server name, and IP address); otherwise everything is "as-is".

The bi-list file is about 2GB (~76MB bzipped), so I can't attach it here: the attachment size limit is 1MB. Where should I send it?

Thanks!

Thank you!! I'll take a look.

Actions #10

Updated by Casey Bodley over 2 years ago

  • Pull request ID deleted (42552)
Actions #11

Updated by JS Landry over 2 years ago

Eric, please note that I will be out of the office starting Monday, August 9, through August 23. Tell me if you need the bi-list file and where I can send it.
Thanks!

Actions #12

Updated by J. Eric Ivancich over 2 years ago

JS Landry,

Sorry, my previous response didn't catch that you weren't able to upload the bi list given the limitations here on the tracker.

Do you have an online storage account (e.g., Google Drive) that you can upload it to and give me read permissions?

Eric

Actions #13

Updated by J. Eric Ivancich over 2 years ago

JS Landry wrote:

Hi Eric, here are the files. I anonymized some data (the bucket name, owner, server name, and IP address); otherwise everything is "as-is".

The bi-list file is about 2GB (~76MB bzipped), so I can't attach it here: the attachment size limit is 1MB. Where should I send it?

Thanks!

I examined the logs. I was looking for log lines that contain "cls_bucket_list_ordered" and "list_objects_ordered", but I couldn't find any. So somehow those logs don't capture the listing operation. They would be helpful, although the bi list would be the most helpful, as I can then recreate such a bucket and perform various tests.
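A quick way to check whether a log file captured the listing operation is to count the lines that mention those two strings. This is a minimal sketch: `count_markers` and `open_log` are hypothetical helper names, and the sample lines are placeholders rather than real radosgw output.

```python
import gzip

MARKERS = ("cls_bucket_list_ordered", "list_objects_ordered")


def count_markers(lines, markers=MARKERS):
    """Count how many lines mention each marker string."""
    counts = {marker: 0 for marker in markers}
    for line in lines:
        for marker in markers:
            if marker in line:
                counts[marker] += 1
    return counts


def open_log(path):
    """Open a plain or gzip-compressed log file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "rt", errors="replace")


# Placeholder lines standing in for a real log file.
sample_lines = [
    "placeholder line mentioning list_objects_ordered",
    "placeholder line mentioning cls_bucket_list_ordered",
    "placeholder line about an unrelated request",
]
print(count_markers(sample_lines))
```

For a real file, something like `with open_log(path) as f: print(count_markers(f))` would report zero for both markers if the listing was not logged.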

Thanks!!

Actions #14

Updated by J. Eric Ivancich over 2 years ago

JS Landry wrote:

Eric, please note that I will be out of the office starting Monday, August 9, through August 23. Tell me if you need the bi-list file and where I can send it.
Thanks!

If you don't have another option, a colleague recommended https://wormhole.app/ as a way to get me the data. It will expire after 24 hours.

Actions #15

Updated by JS Landry over 2 years ago

Hi, you can download the bi-list file here: https://f001.backblazeb2.com/file/9455b1e05809/bi.list.10shards.txt.anon.bz2

About the "list_objects_ordered" and "cls_bucket_list_ordered" lines: strangely enough, I have about 8000 events in the central syslog server for July 30 between 9h48 and 9h50,
a day after the log files I already sent you. Anyhow, here's an export from the syslog server of all radosgw* events for this short period.

Tell me if you need anything else.
Thanks!

Actions #16

Updated by J. Eric Ivancich over 2 years ago

JS Landry wrote:

Hi, you can download the bi-list file here: https://f001.backblazeb2.com/file/9455b1e05809/bi.list.10shards.txt.anon.bz2

About the "list_objects_ordered" and "cls_bucket_list_ordered" lines: strangely enough, I have about 8000 events in the central syslog server for July 30 between 9h48 and 9h50,
a day after the log files I already sent you. Anyhow, here's an export from the syslog server of all radosgw* events for this short period.

Tell me if you need anything else.

I grabbed the two files. Thank you! I'll keep you posted.

Thanks!

Actions #17

Updated by J. Eric Ivancich over 2 years ago

I wanted to provide an update. I've been trying to recreate the bucket (via the index) in order to reproduce. Ran into some hiccups on the machine due to limited disk space. It's currently running, although with 1.5 million entries to add it might take a day or two.

Actions #18

Updated by JS Landry over 2 years ago

Excellent! Thanks for the followup.

Actions #19

Updated by J. Eric Ivancich over 2 years ago

JS Landry wrote:

Excellent! Thanks for the followup.

I have a bucket loaded with the ids from your bi list -- bi.list.10shards.txt.anon. And I wasn't able to produce listings and then I realized that there's no mention of "carto/DATA/site/Works/vq" in that bi list. What am I missing?

Do you have example problematic bucket listings based on the bucket index entries listed in bi.list.10shards.txt.anon?

Actions #20

Updated by JS Landry over 2 years ago

J. Eric Ivancich wrote:

I have a bucket loaded with the ids from your bi list -- bi.list.10shards.txt.anon. And I wasn't able to produce listings and then I realized that there's no mention of "carto/DATA/site/Works/vq" in that bi list. What am I missing?

Hi, sorry about that, it's related to the anonymizing task. The path is "carto/DATA/ulaval/Rasters/vq/"
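A mismatch like this can be spotted by counting the bucket index entries under each candidate prefix. The sketch below assumes the `bi list` dump is a JSON array whose items carry the object name under `entry.name`; that shape is an assumption and should be checked against the actual dump. The helper name and sample entries are hypothetical.

```python
import json


def entries_under_prefix(bi_entries, prefix):
    """Return the object names in a bi-list dump that start with `prefix`.

    Assumes each item looks like {"type": ..., "idx": ..., "entry": {"name": ...}}.
    """
    names = []
    for item in bi_entries:
        name = item.get("entry", {}).get("name")
        if isinstance(name, str) and name.startswith(prefix):
            names.append(name)
    return names


# Hypothetical two-entry dump; a real one would be loaded from the file.
sample_dump = json.loads("""
[
  {"type": "plain",
   "idx": "carto/DATA/ulaval/Rasters/vq/a/t1.tif",
   "entry": {"name": "carto/DATA/ulaval/Rasters/vq/a/t1.tif"}},
  {"type": "plain",
   "idx": "other/x.tif",
   "entry": {"name": "other/x.tif"}}
]
""")

for prefix in ("carto/DATA/site/Works/vq/", "carto/DATA/ulaval/Rasters/vq/"):
    print(prefix, len(entries_under_prefix(sample_dump, prefix)))
```

Loading a 2GB dump with `json.load` needs a lot of memory, so a streaming or line-based scan may be more practical for the real file.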

Actions #21

Updated by J. Eric Ivancich over 2 years ago

JS Landry wrote:

J. Eric Ivancich wrote:

I have a bucket loaded with the ids from your bi list -- bi.list.10shards.txt.anon. And I wasn't able to produce listings and then I realized that there's no mention of "carto/DATA/site/Works/vq" in that bi list. What am I missing?

Hi, sorry about that, it's related to the anonymizing task. The path is "carto/DATA/ulaval/Rasters/vq/"

Thanks for the info. When I tested with a recent master with 1, 10, 11, 13, and 17 shards, the listings were consistent. That could mean one of two things: a) it's been fixed between 15.2.13 and master, or b) the anonymizing step altered something crucial to exposing the bug. I'm now loading up a v15.2.13 test cluster with the 1,780,433 bucket index entries and I'll re-test. I will keep you posted.

Actions #22

Updated by J. Eric Ivancich over 2 years ago

As I try to load up the bucket index, my osd keeps crashing. For example:

-1256> 2021-09-21T20:37:53.900-0400 7f8c51a82640 10 osd.0 18 dequeue_op 0x5621d4fea340 finish
-1255> 2021-09-21T20:37:53.900-0400 7f8c51a82640 20 osd.0 op_wq(4) _process empty q, waiting
-1254> 2021-09-21T20:37:53.900-0400 7f8c55a8a640 20 osd.0 op_wq(4) _process empty q, waiting
-1253> 2021-09-21T20:37:53.902-0400 7f8c712c1640 -1 *** Caught signal (Segmentation fault) **
in thread 7f8c712c1640 thread_name:msgr-worker-2

I'm going to switch from v15.2.13 to v15.2.14 and see if that resolves anything. I'll also look to see if that's a known bug.

Actions #23

Updated by J. Eric Ivancich over 2 years ago

The osd crash I'm experiencing is reported here: https://tracker.ceph.com/issues/51527

It also happens on v15.2.14. I'm going to try to add automation to restart my osd when it crashes in order to load up the bucket index.

Actions #24

Updated by JS Landry over 2 years ago

Hi, I will try to find the time next week to add & test a v15.2.14 radosgw with my v15.2.13 cluster.
Is it possible, or a bad idea, to run/test a Pacific radosgw on an Octopus cluster?
Thanks!

Actions #25

Updated by J. Eric Ivancich over 2 years ago

JS Landry wrote:

Hi, I will try to find the time next week to add & test a v15.2.14 radosgw with my v15.2.13 cluster.
Is it possible, or a bad idea, to run/test a Pacific radosgw on an Octopus cluster?
Thanks!

It's not considered supported. But as long as it's only used for reading, it should not create an issue. Once you test it with your bucket listing, it would probably be best to take it down to prevent it from causing issues.

[I'm at around 700K of 1.8M entries loaded. I hope to be able to test it Monday.]

Actions #26

Updated by J. Eric Ivancich over 2 years ago

So I was finally able to reproduce on v15.2.13. However it does not reproduce on v15.2.14.

So I'd be curious, JS Landry, to know if you experience the bug with v15.2.14?

It looks like there are two PRs that affected bucket listing. On master they are: Their octopus backports are:

Let me know what you find with v15.2.14 and we'll go from there. Thanks!

Actions #27

Updated by JS Landry over 2 years ago

J. Eric Ivancich wrote:

So I was finally able to reproduce on v15.2.13. However it does not reproduce on v15.2.14.
So I'd be curious, JS Landry, to know if you experience the bug with v15.2.14?

Hi! That's great news. I'm working on it today and I'll let you know.
Thanks!

Actions #28

Updated by JS Landry over 2 years ago

Indeed, 15.2.14 fixes it!
I should have tested it earlier. I hit the bug before the release of 15.2.14 and I didn't check every item in the latest changelog, sorry about that.
Anyhow, thank you, everything is working as expected now.
We can close this ticket as fixed.

Actions #29

Updated by J. Eric Ivancich over 2 years ago

  • Status changed from In Progress to Closed

JS Landry wrote:

Indeed, 15.2.14 fixes it!
I should have tested it earlier. I hit the bug before the release of 15.2.14 and I didn't check every item in the latest changelog, sorry about that.
Anyhow, thank you, everything is working as expected now.
We can close this ticket as fixed.

You're welcome. And thank you for the update.
