Bug #16309
Closed
rgw: bucket listing hangs on versioned bucket
Description
GET BUCKET (List Objects) S3 API hangs on a versioned bucket.
The API was terminated with 500 server error (FastCGI timeout), but RGW's internal listing process continued until the RGW was restarted.
RGW repeatedly tries to get information for a specific object that has both a null version and another version, as shown below.
<?xml version="1.0" encoding="UTF-8"?>
<ListVersionsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>hojo-bucket-02</Name>
  <Prefix>testfile1896.txt</Prefix>
  <KeyMarker></KeyMarker>
  <MaxKeys>1000</MaxKeys>
  <IsTruncated>false</IsTruncated>
  <Version>
    <Key>testfile1896.txt</Key>
    <VersionId>2MSeTpZoDSqt2nTk1oYHZNmF1e84C.1</VersionId>
    <IsLatest>true</IsLatest>
    <LastModified>2016-05-30T08:39:52.000Z</LastModified>
    <ETag>"1d24c7924b9798bb9064dcb043b3d989"</ETag>
    <Size>3152</Size>
    <StorageClass>STANDARD</StorageClass>
    <Owner>
      <ID>XXXXXXXX</ID>
      <DisplayName>XXXXXXXX</DisplayName>
    </Owner>
  </Version>
  <Version>
    <Key>testfile1896.txt</Key>
    <VersionId>null</VersionId>
    <IsLatest>false</IsLatest>
    <LastModified>2016-05-30T02:43:22.000Z</LastModified>
    <ETag>"20a4fc4c12598089a8937496a5eba67e"</ETag>
    <Size>3052</Size>
    <StorageClass>STANDARD</StorageClass>
    <Owner>
      <ID>XXXXXXXX</ID>
      <DisplayName>XXXXXXXX</DisplayName>
    </Owner>
  </Version>
</ListVersionsResult>
<?xml version="1.0" encoding="UTF-8"?>
<ListVersionsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>hojo-bucket-02</Name>
  <Prefix>testfile1897.txt</Prefix>
  <KeyMarker></KeyMarker>
  <MaxKeys>1000</MaxKeys>
  <IsTruncated>false</IsTruncated>
  <Version>
    <Key>testfile1897.txt</Key>
    <VersionId>bPMbvEae4KIxSJZn9folT4sQtt7h0w3</VersionId>
    <IsLatest>true</IsLatest>
    <LastModified>2016-05-30T08:39:52.000Z</LastModified>
    <ETag>"71d8c0b2fc4e320f7f82ce88b737f2dd"</ETag>
    <Size>3152</Size>
    <StorageClass>STANDARD</StorageClass>
    <Owner>
      <ID>XXXXXXXX</ID>
      <DisplayName>XXXXXXXX</DisplayName>
    </Owner>
  </Version>
  <Version>
    <Key>testfile1897.txt</Key>
    <VersionId>null</VersionId>
    <IsLatest>false</IsLatest>
    <LastModified>2016-05-30T02:43:22.000Z</LastModified>
    <ETag>"27fe674eeca8f3fcff844ad7e91816c9"</ETag>
    <Size>3052</Size>
    <StorageClass>STANDARD</StorageClass>
    <Owner>
      <ID>XXXXXXXX</ID>
      <DisplayName>XXXXXXXX</DisplayName>
    </Owner>
  </Version>
</ListVersionsResult>
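To find other keys in the same state (a "null" version plus at least one named version), a ListVersionsResult response like the ones above could be scanned with a short script. This is only a sketch: the function name is made up for illustration, and it assumes the response body has already been fetched and fits in memory.

```python
import xml.etree.ElementTree as ET

# Namespace used by the S3 ListVersionsResult documents shown above.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def keys_with_null_and_named_versions(xml_text):
    """Return the keys that have both a "null" version and a named version.

    Hypothetical helper for illustration; not part of any Ceph tool.
    """
    root = ET.fromstring(xml_text)
    version_ids = {}
    for version in root.findall("s3:Version", S3_NS):
        key = version.findtext("s3:Key", namespaces=S3_NS)
        vid = version.findtext("s3:VersionId", namespaces=S3_NS)
        version_ids.setdefault(key, set()).add(vid)
    # A key qualifies when "null" is present alongside at least one other id.
    return sorted(
        key for key, ids in version_ids.items()
        if "null" in ids and len(ids) > 1
    )
```

Both testfile1896.txt and testfile1897.txt match this pattern in the responses above.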
Updated by Orit Wasserman almost 8 years ago
I have tried to reproduce it on 0.94.6 without any luck. Can you give more details?
It is very easy to reproduce it on 0.94.5. Can you confirm your version?
Updated by Osamu KIMURA almost 8 years ago
As you can see from the last line of radosgw-20160530.log.gz, which was written when the RGW was restarted, the version is 0.94.6.
2016-05-30 20:09:55.595131 7f490ea78820 0 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403), process radosgw, pid 1042
Unfortunately, I don't know the detailed procedure. I guess the customer doesn't remember the details.
The system is configured with 3 RGWs and a load balancer. Some operations might have been executed by RGWs other than the one that produced this log.
- Create a bucket
- PUT Objects (testfile0.txt ... testfile2499.txt)
- GET Bucket (List objects) - no problem without versioning
- PUT Bucket versioning (Enable versioning)
- PUT Bucket versioning (??? I don't know why twice)
- GET Bucket versioning
- PUT Objects (testfile0.txt ... testfile2499.txt)
- GET Bucket (List objects) - no problem with no versioned objects
- PUT Objects (testfile0.txt ... testfile2499.txt)
- GET Bucket (List objects) - infinite loop on testfile1897.txt
- PUT Objects (test/testfile0.txt ... test/testfile2499.txt?)
- GET Bucket (List objects) - infinite loop on testfile1896.txt
- PUT Objects (test/testfile2000.txt ... test/testfile2499.txt?)
- GET Bucket (List objects) - infinite loop on testfile1896.txt
- GET Bucket (List objects) - infinite loop on testfile1896.txt
- GET Bucket (List objects) - infinite loop on testfile1896.txt
- GET Bucket (List objects) - infinite loop on test/testfile1898.txt
- GET Bucket (List objects) - infinite loop on test/testfile1898.txt
- GET Bucket (List objects) - infinite loop on testfile1896.txt
- GET Bucket (List objects) - infinite loop on testfile1896.txt
- HEAD Bucket
- GET Bucket (List objects) - infinite loop on testfile1896.txt
- Restart RGW
As I mentioned above, some other API calls might have been executed on the other RGWs.
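The sequence above, up to the first hanging listing, could be replayed with a script along these lines. It is a rough sketch under several assumptions: `client` is a boto3-style S3 client pointed at a test gateway, both PUT Bucket versioning calls are assumed to set the status to Enabled (the log does not say), the 7th step is left out per the correction posted later in this thread, and the object body is arbitrary. The function name `replay_steps` is made up for illustration.

```python
def replay_steps(client, bucket, count=2500):
    """Replay the reported S3 calls against a boto3-style S3 client.

    Hypothetical helper; the exact customer procedure is not known.
    """
    def put_objects():
        # testfile0.txt ... testfile<count-1>.txt, as in the report
        for i in range(count):
            client.put_object(
                Bucket=bucket, Key=f"testfile{i}.txt", Body=b"test data"
            )

    def enable_versioning():
        # Assumption: both versioning calls in the log enabled versioning.
        client.put_bucket_versioning(
            Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
        )

    client.create_bucket(Bucket=bucket)
    put_objects()                        # creates the "null" versions
    client.list_objects(Bucket=bucket)   # fine without versioning
    enable_versioning()
    enable_versioning()                  # issued twice in the log
    client.get_bucket_versioning(Bucket=bucket)
    client.list_objects(Bucket=bucket)   # fine, no versioned objects yet
    put_objects()                        # adds a named version per key
    client.list_objects(Bucket=bucket)   # the listing reported to loop
```

Since a hanging listing keeps running inside the RGW until it is restarted, this should only be pointed at a disposable test cluster, for example with a client created by `boto3.client("s3", endpoint_url=...)`.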
I noticed the following output in the log.
2016-05-30 17:43:55.642866 7fcb13f9f700 10 cls_bucket_list hojo-bucket-02(@{i=.rgw.buckets.index,e=.rgw.buckets.extra}.rgw.buckets[default.1456852.44]) start testfile1298.txt^@v913^@i7A6PTnyp3iM4CM3PR6n4hPeIz7VAXnI[] num_entries 667
...
2016-05-30 17:43:55.701421 7fcb13f9f700 10 cls_bucket_list hojo-bucket-02(@{i=.rgw.buckets.index,e=.rgw.buckets.extra}.rgw.buckets[default.1456852.44]) start testfile1897.txt^@v913^@ibPMbvEae4KIxSJZn9folT4sQtt7h0w3[] num_entries 2
The looping entries always have a "num_entries" value of 2 or 1. The non-looping entries have a larger number.
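One way to spot the looping marker in a large RGW log is to count how often each cls_bucket_list start marker recurs. The sketch below assumes the log line layout shown above and debug level 10 or higher for rgw; the function name and the `min_repeats` threshold are arbitrary choices for illustration.

```python
import re
from collections import Counter

# Matches the cls_bucket_list debug lines in the format quoted above.
LINE_RE = re.compile(
    r"cls_bucket_list \S+ start (?P<marker>.+) num_entries (?P<n>\d+)$"
)


def repeated_markers(lines, min_repeats=3):
    """Return {marker: count} for start markers seen at least min_repeats times.

    Hypothetical helper; a marker that keeps recurring suggests that the
    listing is not advancing past that entry.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.rstrip())
        if match:
            counts[match.group("marker")] += 1
    return {marker: n for marker, n in counts.items() if n >= min_repeats}
```

A compressed log such as radosgw-20160530.log.gz can be passed in by opening it with `gzip.open(path, "rt", errors="replace")` and handing the file object to the function.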
Updated by Osamu KIMURA almost 8 years ago
I made a mistake. Please disregard the 7th step.
Updated by Orit Wasserman almost 8 years ago
I still cannot reproduce it on 0.94.6.
Can you provide more details about the customer environment? Which Ceph packages are installed?
Updated by Osamu KIMURA almost 8 years ago
I apologize for the wait.
I got the environment details.
- CentOS release 6.7 (Final)
- kernel 2.6.32-573.18.1.el6.x86_64 #1 SMP Tue Feb 9 22:46:17 UTC 2016 x86_64 x86_64
- ceph-radosgw-0.94.6-0.el6.x86_64
- python-cephfs-0.94.6-0.el6.x86_64
- ceph-0.94.6-0.el6.x86_64
- ceph-common-0.94.6-0.el6.x86_64
- libcephfs1-0.94.6-0.el6.x86_64
- httpd-2.2.15-47.el6.centos.3.x86_64
- httpd-tools-2.2.15-47.el6.centos.3.x86_64
- httpd-devel-2.2.15-47.el6.centos.3.x86_64
They are using 3 RGW nodes behind a load balancer. All the RGW nodes have the same configuration.
Is this enough?
Updated by Orit Wasserman almost 8 years ago
Can you run:
radosgw-admin bi list --bucket=hojo-bucket-02 --object=testfile1896.txt
Also, can you increase the OSD objclass debug level and provide the logs:
ceph tell osd.\* injectargs --debug-objclass 20
Updated by Osamu KIMURA almost 8 years ago
Here is your requested information:
radosgw-admin bi list --bucket=hojo-bucket-02 --object=testfile1896.txt
The bucket is only for testing, but the system as a whole is in service.
It is difficult to set a high debug level. In addition, it is difficult to retry listing the bucket, because the listing operation continues until the RGW is restarted, and it affects operations on other buckets.
Updated by Yehuda Sadeh almost 8 years ago
What version are the OSDs running? Have the OSDs been restarted since the upgrade? E.g., please run:
$ ceph tell osd.\* version
Updated by Osamu KIMURA almost 8 years ago
OSDs are running on 0.94.3.3. RGWs are running on 0.94.6.
OSDs are built on our appliance. RGWs are built on the customer's server. Different versions may co-exist.
Updated by Orit Wasserman almost 8 years ago
Sadly, the fix is in the OSD, not in the gateway. This is why the user is encountering this issue.
Updated by Osamu KIMURA almost 8 years ago
Does that mean the fix for issue #13536 has to be applied to the OSDs?
Updated by Orit Wasserman almost 8 years ago
Yes, it is radosgw code that runs in the OSD (object class).
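Because the bucket index object class runs inside the OSDs, upgrading only the gateways is not enough. A quick check for OSDs that are older than the release carrying the fix might look like the sketch below. Note the assumptions: the `0.94.6` threshold is inferred from this thread (reproducible on 0.94.5, not on 0.94.6) rather than confirmed, the input is a plain mapping of OSD name to version string that has already been collected, and the function names are made up for illustration.

```python
def parse_version(text):
    """Turn a dotted version such as "0.94.3.3" into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def osds_missing_fix(osd_versions, fixed_in="0.94.6"):
    """Return the names of OSDs whose version is older than fixed_in.

    Hypothetical helper. fixed_in defaults to 0.94.6, which is an
    assumption drawn from this thread, not a confirmed release note.
    """
    threshold = parse_version(fixed_in)
    return sorted(
        name for name, version in osd_versions.items()
        if parse_version(version) < threshold
    )
```

With the versions reported here, an OSD on 0.94.3.3 would be flagged while a daemon on 0.94.6 would not. A vendor build may carry backports, so the version number alone is only a first indication.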
Updated by Orit Wasserman almost 8 years ago
- Status changed from New to Duplicate
Duplicate of: http://tracker.ceph.com/issues/13536