Bug #45955

closed

[rgw] listing bucket via s3 hangs on "ordered bucket listing requires read #1"

Added by Mariusz Gronczewski almost 4 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
octopus, nautilus
Regression:
Yes
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I don't know whether it is related to #42358, but I've seen something similar on a bucket that contains only small files (< 1 MB) and only around 21k objects.

Doing s3cmd ls s3://fcp/ generates

2020-06-09T12:49:16.511+0200 7f015b7a9700  1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.543+0200 7f015a7a7700  1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.579+0200 7f015afa8700  1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.611+0200 7f015b7a9700  1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.643+0200 7f015a7a7700  1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.675+0200 7f015afa8700  1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.707+0200 7f015b7a9700  1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.739+0200 7f015a7a7700  1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.771+0200 7f015afa8700  1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.803+0200 7f015b7a9700  1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.835+0200 7f015a7a7700  1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1

in the logs (for a long time; it never completes).

It gets weirder, though: a request to a subdirectory works just fine:


$ s3cmd ls s3://fcp/94/

                       DIR   s3://fcp/94/02/
                       DIR   s3://fcp/94/03/
                       DIR   s3://fcp/94/07/
                       DIR   s3://fcp/94/08/
              ...   
$ s3cmd ls s3://fcp/94/02/239402/
2020-06-09 09:34     31350   s3://fcp/94/02/239402/6480.0.jpg
2020-06-09 09:34     87325   s3://fcp/94/02/239402/6480.1.jpg

We've had pretty much the same file layout on the old (v12.x) pre-production cluster.

Doing the same via radosgw-admin, meanwhile, is nice and snappy:

# time radosgw-admin bi list --bucket=fcp  |grep idx |wc -l
21090

real    0m0.779s
user    0m0.305s
sys    0m0.023s

Files

fcp-bi-list.00.gz (685 KB) fcp-bi-list.00.gz Mariusz Gronczewski, 06/11/2020 11:01 PM
fcp-bi-list.01.gz (685 KB) fcp-bi-list.01.gz Mariusz Gronczewski, 06/11/2020 11:01 PM

Related issues: 2 (0 open, 2 closed)

Copied to rgw - Backport #46340: octopus: [rgw] listing bucket via s3 hangs on "ordered bucket listing requires read #1" (Resolved, J. Eric Ivancich)
Copied to rgw - Backport #46326: nautilus: [rgw] listing bucket via s3 hangs on "ordered bucket listing requires read #1" (Resolved, J. Eric Ivancich)
Actions #1

Updated by Mariusz Gronczewski almost 4 years ago

To make matters weirder, doing

s3cmd ls s3://fcp --recursive

works just fine, although it takes a few seconds.

Actions #2

Updated by Mariusz Gronczewski almost 4 years ago

I tried the same on 14.2.9 and it was fine there.

Actions #3

Updated by Casey Bodley almost 4 years ago

  • Status changed from New to Triaged
  • Assignee set to J. Eric Ivancich
Actions #4

Updated by J. Eric Ivancich almost 4 years ago

Mariusz Gronczewski wrote:

To make matters weirder, doing

s3cmd ls s3://fcp --recursive

works just fine, although it takes a few seconds.

Mariusz, would it be possible to get the output of `radosgw-admin bi list --bucket=fcp`, perhaps as an attachment?

Actions #5

Updated by Mariusz Gronczewski almost 4 years ago

Attached. Split in two because of attachment limit.

By the way, are there Debian packages for master somewhere? I wanted to test whether the latest master has the same issue, but there seem to be only Ubuntu and CentOS packages for it.

Actions #6

Updated by Mariusz Gronczewski almost 4 years ago

As weird as it sounds, setting debug_rgw to 20/1 circumvents the issue.

See the mailing list thread: https://www.spinics.net/lists/ceph-users/msg60575.html

Actions #7

Updated by Mariusz Gronczewski almost 4 years ago

Okay, so I've played with it a bit: with logging at 0 it hangs indefinitely, while with debug_rgw at 0/20 the listing works and also generates

2020-06-30T11:17:43.961+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=629392/<FF>, cur_marker=
2020-06-30T11:17:44.017+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=637831/<FF>, cur_marker=
2020-06-30T11:17:44.069+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=644842/<FF>, cur_marker=
2020-06-30T11:17:44.129+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=651703/<FF>, cur_marker=
2020-06-30T11:17:44.169+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=656672/<FF>, cur_marker=
2020-06-30T11:17:44.261+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=662085/<FF>, cur_marker=
2020-06-30T11:17:44.317+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=667811/<FF>, cur_marker=
2020-06-30T11:17:44.373+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=675772/<FF>, cur_marker=
2020-06-30T11:17:44.413+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=681836/<FF>, cur_marker=
2020-06-30T11:17:44.493+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=685907/<FF>, cur_marker=
2020-06-30T11:17:44.569+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=690465/<FF>, cur_marker=
2020-06-30T11:17:44.677+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=696271/<FF>, cur_marker=
2020-06-30T11:17:44.729+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=700478/<FF>, cur_marker=
2020-06-30T11:17:44.781+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=705298/<FF>, cur_marker=
2020-06-30T11:17:44.841+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=710555/<FF>, cur_marker=
2020-06-30T11:17:44.893+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=714847/<FF>, cur_marker=
2020-06-30T11:17:44.949+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=720771/<FF>, cur_marker=
2020-06-30T11:17:44.993+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=725645/<FF>, cur_marker=
2020-06-30T11:17:45.045+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=731369/<FF>, cur_marker=
2020-06-30T11:17:45.097+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=736434/<FF>, cur_marker=
2020-06-30T11:17:45.157+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=740813/<FF>, cur_marker=
2020-06-30T11:17:45.209+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=744332/<FF>, cur_marker=
2020-06-30T11:17:45.253+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=748302/<FF>, cur_marker=
2020-06-30T11:17:45.317+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=753421/<FF>, cur_marker=
2020-06-30T11:17:45.377+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=757974/<FF>, cur_marker=
2020-06-30T11:17:45.425+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=762490/<FF>, cur_marker=
2020-06-30T11:17:45.469+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=767510/<FF>, cur_marker=
2020-06-30T11:17:45.533+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=773006/<FF>, cur_marker=
2020-06-30T11:17:45.577+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=777352/<FF>, cur_marker=
2020-06-30T11:17:45.633+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=782776/<FF>, cur_marker=
2020-06-30T11:17:45.701+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=787989/<FF>, cur_marker=
2020-06-30T11:17:45.753+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=791862/<FF>, cur_marker=
2020-06-30T11:17:45.837+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=798073/<FF>, cur_marker=
2020-06-30T11:17:45.925+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=804396/<FF>, cur_marker=
2020-06-30T11:17:46.017+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=809073/<FF>, cur_marker=
2020-06-30T11:17:46.089+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=813250/<FF>, cur_marker=
2020-06-30T11:17:46.217+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=823829/<FF>, cur_marker=
2020-06-30T11:17:46.325+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=832606/<FF>, cur_marker=
2020-06-30T11:17:46.433+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=842132/<FF>, cur_marker=
2020-06-30T11:17:46.497+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=93743/<FF>, cur_marker=
2020-06-30T11:17:46.545+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=99083/<FF>, cur_marker=
2020-06-30T11:17:46.589+0200 7fe50565e700  0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=letsplay/<FF>, cur_marker=

in the logs
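For reference, the check behind this error message can be sketched in a few lines of Python (illustrative names and structure only, not the actual RGWRados::Bucket::List code): each index read is expected to return a marker beyond the previous one, and after a bounded number of stalled attempts the listing loop gives up instead of spinning forever.

```python
def list_ordered(read_page, marker="", max_attempts=8):
    """Drain an ordered listing; abort if the marker stops advancing.

    read_page(marker) -> (entries, next_marker, truncated) is a stand-in
    for one bucket-index read.
    """
    results = []
    attempt = 0
    while True:
        entries, next_marker, truncated = read_page(marker)
        results.extend(entries)
        if not truncated:
            return results
        if next_marker <= marker:
            # No forward progress: retry a few times, then fail loudly
            # rather than loop forever.
            attempt += 1
            if attempt >= max_attempts:
                raise RuntimeError(
                    "marker failed to make forward progress; "
                    "prev_marker=%r, cur_marker=%r" % (marker, next_marker))
        else:
            attempt = 0
            marker = next_marker
```

In the log above, cur_marker is empty on every attempt, i.e. the marker regresses rather than advances, which is exactly the condition this kind of check trips on.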

Actions #8

Updated by J. Eric Ivancich almost 4 years ago

  • Status changed from Triaged to In Progress

I'm actively looking into this and wanted to provide a quick update. I took the provided bucket index and used it to recreate the fcp directory on a test instance running 15.2.3.

And I was able to reproduce the issue -- the listing hangs at normal logging but succeeds when rgw logging is set to 20.

One concern I initially had was that when I grep for "idx" in the bi list and run it through `wc -l`, I get 21520, where above it's shown as 21090. But given that I've reproduced the issue, it's not particularly important to figure out why that discrepancy exists.

I'll post additional updates to this tracker.

Actions #9

Updated by J. Eric Ivancich almost 4 years ago

Good news. I found the core problem and an additional inefficiency.

The core problem was directly related to your observation that the issue was not present when logging was up to 20. It has to do with how logging levels are handled and the use of side-effects. That fix is simple. I'll submit a PR quickly for that.
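For readers unfamiliar with this bug class, a minimal sketch (hypothetical Python, not the actual RGW code or fix) of how placing a needed side effect inside a log-level guard makes correctness depend on the configured debug level:

```python
debug_level = 0  # analogous to a low debug_rgw setting

def advance_marker(state):
    # Side effect the listing loop depends on.
    state["marker"] = state["next_marker"]
    return state["marker"]

def buggy_step(state):
    # BUG: the side effect only happens when logging is enabled,
    # so raising the debug level "fixes" the behavior.
    if debug_level >= 20:
        print("advanced to", advance_marker(state))

def fixed_step(state):
    # FIX: perform the side effect unconditionally; log conditionally.
    marker = advance_marker(state)
    if debug_level >= 20:
        print("advanced to", marker)
```

This matches the observed symptom exactly: the hang disappears once debug_rgw is raised, because the guarded expression (and its side effect) then executes.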

There's also an inefficiency, though, that when the result of one call to rgw contains only common prefixes, the marker is not handled as well as it could be. That just means that, for example, s3cmd gets fewer results back with each call and therefore requires more calls. I'll work on a fix for that as well.
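The effect of that inefficiency can be simulated in a few lines (a hypothetical model, not the RGW implementation): if a page of raw index entries collapses entirely into common prefixes and the marker advances only past the last raw entry, the client needs far more round trips than if the marker jumps past the whole prefix.

```python
def count_trips(keys, page_size, skip_prefix):
    """Count listing round trips over a sorted key list, delimiter '/'.

    skip_prefix=False: naive marker handling, advance past the last raw
    index entry seen. skip_prefix=True: jump past the entire common prefix.
    """
    marker, trips = "", 0
    while marker < keys[-1]:
        page = [k for k in keys if k > marker][:page_size]
        trips += 1
        last = page[-1]
        head, sep, _ = last.partition("/")
        if skip_prefix and sep:
            # Jump past everything under the common prefix in one step.
            marker = head + "/\xff"
        else:
            marker = last
    return trips
```

With 100 keys under one prefix and a page size of 10, the naive marker needs 11 round trips while prefix skipping needs 2, which is why a client such as s3cmd sees few results per call and issues many more requests.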

Thanks for your insight and patience!

Actions #10

Updated by Mariusz Gronczewski almost 4 years ago

One concern I initially had was that when I grep for "idx" in the bi list and run it through `wc -l` I get 21520 where above it's show as 21090. But given that I've reproduced, it's not particularly important to figure out why that discrepancy exists.

The list of files I've attached matches that number:

zcat /tmp/fcp-bi-list.00.gz /tmp/fcp-bi-list.01.gz |grep idx |sort |wc -l
21520

So everything's fine; the discrepancy in my examples was because the bucket was in use at the time.

Actions #11

Updated by J. Eric Ivancich almost 4 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 35877
Actions #12

Updated by Dan van der Ster almost 4 years ago

Does this need a nautilus backport too? The list_objects_ordered change just landed in 14.2.10.

Actions #13

Updated by J. Eric Ivancich almost 4 years ago

Dan van der Ster wrote:

Does this need a nautilus backport too? The list_objects_ordered change just landed in 14.2.10.

Yes. I created a separate tracker for it -- https://tracker.ceph.com/issues/46326.

Actions #14

Updated by Nathan Cutler almost 4 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to octopus, nautilus
  • Pull request ID changed from 35877 to 34037

converting into the master tracker that was originally judged not necessary

opening backport trackers for the octopus and nautilus cherry-picks

this will enable automated closure of the backport trackers

Actions #15

Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #46340: octopus: [rgw] listing bucket via s3 hangs on "ordered bucket listing requires read #1" added
Actions #16

Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #46326: nautilus: [rgw] listing bucket via s3 hangs on "ordered bucket listing requires read #1" added
Actions #17

Updated by J. Eric Ivancich almost 4 years ago

Nathan Cutler wrote:

converting into the master tracker that was originally judged not necessary

opening backport trackers for the octopus and nautilus cherry-picks

this will enable automated closure of the backport trackers

Thanks, Nathan.

Actions #18

Updated by Nathan Cutler almost 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions #19

Updated by J. Eric Ivancich over 3 years ago

Above I'd said: "There's also an inefficiency, though, that when the result of one call to rgw contains only common prefixes, the marker is not handled as well as it could be. That just means that, for example, s3cmd gets fewer results back with each call and therefore requires more calls. I'll work on a fix for that as well."

After examining it, I found that the inefficiency was already resolved in master and in octopus 15.2.4. It also does not apply to nautilus and earlier, since the relevant code changed significantly between nautilus and octopus. So we are truly done with this tracker.

