Bug #45955
[rgw] listing bucket via s3 hangs on "ordered bucket listing requires read #1"
Status: closed
Description
I don't know whether it is related to #42358, but I've seen something similar on a bucket that just has small files (< 1 MB) and only around 21k objects.
doing s3cmd ls s3://fcp/
generates
2020-06-09T12:49:16.511+0200 7f015b7a9700 1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.543+0200 7f015a7a7700 1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.579+0200 7f015afa8700 1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.611+0200 7f015b7a9700 1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.643+0200 7f015a7a7700 1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.675+0200 7f015afa8700 1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.707+0200 7f015b7a9700 1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.739+0200 7f015a7a7700 1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.771+0200 7f015afa8700 1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.803+0200 7f015b7a9700 1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
2020-06-09T12:49:16.835+0200 7f015a7a7700 1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1
in the logs (for a long time; it never completes).
It gets weirder, though: a request to a subdirectory works just fine.
$ s3cmd ls s3://fcp/94/
                       DIR   s3://fcp/94/02/
                       DIR   s3://fcp/94/03/
                       DIR   s3://fcp/94/07/
                       DIR   s3://fcp/94/08/
...
$ s3cmd ls s3://fcp/94/02/239402/
2020-06-09 09:34     31350   s3://fcp/94/02/239402/6480.0.jpg
2020-06-09 09:34     87325   s3://fcp/94/02/239402/6480.1.jpg
We had pretty much the same file layout on the old (v12.x) pre-production cluster.
Now, doing the same via radosgw-admin is nice and snappy:
# time radosgw-admin bi list --bucket=fcp | grep idx | wc -l
21090

real    0m0.779s
user    0m0.305s
sys     0m0.023s
Files
Updated by Mariusz Gronczewski almost 4 years ago
To make matters weirder, doing

s3cmd ls s3://fcp --recursive

works just fine, although it takes a few seconds.
Updated by Mariusz Gronczewski almost 4 years ago
I tried the same on 14.2.9 and it was fine there.
Updated by Casey Bodley almost 4 years ago
- Status changed from New to Triaged
- Assignee set to J. Eric Ivancich
Updated by J. Eric Ivancich almost 4 years ago
Mariusz Gronczewski wrote:
To make matters weirder, doing

s3cmd ls s3://fcp --recursive

works just fine, although it takes a few seconds.
Mariusz, would it be possible to get the output of `radosgw-admin bi list --bucket=fcp`, perhaps as an attachment?
Updated by Mariusz Gronczewski almost 4 years ago
- File fcp-bi-list.00.gz fcp-bi-list.00.gz added
- File fcp-bi-list.01.gz fcp-bi-list.01.gz added
Attached. Split in two because of the attachment limit.
By the way, are there Debian packages built from master somewhere? I wanted to test whether the latest master has the same issue, but there seem to be only Ubuntu and CentOS packages for it.
Updated by Mariusz Gronczewski almost 4 years ago
As weird as it sounds, setting debug_rgw to 20/1 circumvents the issue.
See the mailing list thread: https://www.spinics.net/lists/ceph-users/msg60575.html
Updated by Mariusz Gronczewski almost 4 years ago
Okay, so I've played a bit with it: with debug_rgw at 0 the listing hangs indefinitely, while with debug_rgw at 0/20 the listing works and also generates
2020-06-30T11:17:43.961+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=629392/<FF>, cur_marker=
2020-06-30T11:17:44.017+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=637831/<FF>, cur_marker=
2020-06-30T11:17:44.069+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=644842/<FF>, cur_marker=
2020-06-30T11:17:44.129+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=651703/<FF>, cur_marker=
2020-06-30T11:17:44.169+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=656672/<FF>, cur_marker=
2020-06-30T11:17:44.261+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=662085/<FF>, cur_marker=
2020-06-30T11:17:44.317+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=667811/<FF>, cur_marker=
2020-06-30T11:17:44.373+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=675772/<FF>, cur_marker=
2020-06-30T11:17:44.413+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=681836/<FF>, cur_marker=
2020-06-30T11:17:44.493+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=685907/<FF>, cur_marker=
2020-06-30T11:17:44.569+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=690465/<FF>, cur_marker=
2020-06-30T11:17:44.677+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=696271/<FF>, cur_marker=
2020-06-30T11:17:44.729+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=700478/<FF>, cur_marker=
2020-06-30T11:17:44.781+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=705298/<FF>, cur_marker=
2020-06-30T11:17:44.841+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=710555/<FF>, cur_marker=
2020-06-30T11:17:44.893+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=714847/<FF>, cur_marker=
2020-06-30T11:17:44.949+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=720771/<FF>, cur_marker=
2020-06-30T11:17:44.993+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=725645/<FF>, cur_marker=
2020-06-30T11:17:45.045+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=731369/<FF>, cur_marker=
2020-06-30T11:17:45.097+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=736434/<FF>, cur_marker=
2020-06-30T11:17:45.157+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=740813/<FF>, cur_marker=
2020-06-30T11:17:45.209+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=744332/<FF>, cur_marker=
2020-06-30T11:17:45.253+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=748302/<FF>, cur_marker=
2020-06-30T11:17:45.317+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=753421/<FF>, cur_marker=
2020-06-30T11:17:45.377+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=757974/<FF>, cur_marker=
2020-06-30T11:17:45.425+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=762490/<FF>, cur_marker=
2020-06-30T11:17:45.469+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=767510/<FF>, cur_marker=
2020-06-30T11:17:45.533+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=773006/<FF>, cur_marker=
2020-06-30T11:17:45.577+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=777352/<FF>, cur_marker=
2020-06-30T11:17:45.633+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=782776/<FF>, cur_marker=
2020-06-30T11:17:45.701+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=787989/<FF>, cur_marker=
2020-06-30T11:17:45.753+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=791862/<FF>, cur_marker=
2020-06-30T11:17:45.837+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=798073/<FF>, cur_marker=
2020-06-30T11:17:45.925+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=804396/<FF>, cur_marker=
2020-06-30T11:17:46.017+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=809073/<FF>, cur_marker=
2020-06-30T11:17:46.089+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=813250/<FF>, cur_marker=
2020-06-30T11:17:46.217+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=823829/<FF>, cur_marker=
2020-06-30T11:17:46.325+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=832606/<FF>, cur_marker=
2020-06-30T11:17:46.433+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=842132/<FF>, cur_marker=
2020-06-30T11:17:46.497+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=93743/<FF>, cur_marker=
2020-06-30T11:17:46.545+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=99083/<FF>, cur_marker=
2020-06-30T11:17:46.589+0200 7fe50565e700 0 RGWRados::Bucket::List::list_objects_ordered: ERROR marker failed to make forward progress; attempt=2, prev_marker=letsplay/<FF>, cur_marker=
in the logs
Updated by J. Eric Ivancich almost 4 years ago
- Status changed from Triaged to In Progress
I'm actively looking into this and wanted to provide a quick update. I took the provided bucket index and used it to recreate the fcp directory on a test instance running 15.2.3.
And I was able to reproduce the issue -- the listing hangs at normal logging but succeeds when rgw logging is set to 20.
One concern I initially had was that when I grep for "idx" in the bi list and run it through `wc -l` I get 21520, whereas above it's shown as 21090. But given that I've reproduced the issue, it's not particularly important to figure out why that discrepancy exists.
I'll post additional updates to this tracker.
Updated by J. Eric Ivancich almost 4 years ago
Good news. I found the core problem and an additional inefficiency.
The core problem was directly related to your observation that the issue was not present when logging was up to 20. It has to do with how logging levels are handled and the use of side-effects. That fix is simple. I'll submit a PR quickly for that.
There's also an inefficiency, though, that when the result of one call to rgw contains only common prefixes, the marker is not handled as well as it could be. That just means that, for example, s3cmd gets fewer results back with each call and therefore requires more calls. I'll work on a fix for that as well.
Thanks for your insight and patience!
Updated by Mariusz Gronczewski almost 4 years ago
One concern I initially had was that when I grep for "idx" in the bi list and run it through `wc -l` I get 21520, whereas above it's shown as 21090. But given that I've reproduced the issue, it's not particularly important to figure out why that discrepancy exists.
the list of files I've attached matches that number:
# zcat /tmp/fcp-bi-list.00.gz /tmp/fcp-bi-list.01.gz | grep idx | sort | wc -l
21520
So everything's fine; the discrepancy in my earlier example was because the bucket was in use at the time.
Updated by J. Eric Ivancich almost 4 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 35877
Updated by Dan van der Ster almost 4 years ago
Does this need a nautilus backport too? The list_objects_ordered change just landed in 14.2.10.
Updated by J. Eric Ivancich almost 4 years ago
Dan van der Ster wrote:
Does this need a nautilus backport too? The list_objects_ordered change just landed in 14.2.10.
Yes. I created a separate tracker for it -- https://tracker.ceph.com/issues/46326.
Updated by Nathan Cutler almost 4 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to octopus, nautilus
- Pull request ID changed from 35877 to 34037
converting into the master tracker that was originally judged not necessary
opening backport trackers for the octopus and nautilus cherry-picks
this will enable automated closure of the backport trackers
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #46340: octopus: [rgw] listing bucket via s3 hangs on "ordered bucket listing requires read #1" added
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #46326: nautilus: [rgw] listing bucket via s3 hangs on "ordered bucket listing requires read #1" added
Updated by J. Eric Ivancich almost 4 years ago
Nathan Cutler wrote:
converting into the master tracker that was originally judged not necessary
opening backport trackers for the octopus and nautilus cherry-picks
this will enable automated closure of the backport trackers
Thanks, Nathan.
Updated by Nathan Cutler almost 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
Updated by J. Eric Ivancich over 3 years ago
Above I'd said: "There's also an inefficiency, though, that when the result of one call to rgw contains only common prefixes, the marker is not handled as well as it could be. That just means that, for example, s3cmd gets fewer results back with each call and therefore requires more calls. I'll work on a fix for that as well."
After examining it, I found that the inefficiency was already resolved in master and in octopus 15.2.4. It also does not apply to nautilus and earlier, since the relevant code changed significantly between nautilus and octopus. So we are truly done with this tracker.