Bug #48709
[RGW] [boto] PUT on versioned bucket fails with NoSuchKey
% Done: 0%
Source:
Tags: backport_processed
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Description of problem:
A 'NoSuchKey' error is observed on a PUT operation on a versioned bucket while running a boto script. In a loop, the boto script creates a versioned bucket holding 5 versions of each object, lists the objects, and then deletes the objects and the bucket.

Version-Release number of selected component (if applicable):
ceph version 14.2.11-89.el8cp

How reproducible:
3/3

Steps to Reproduce:
1. ceph cluster on 4.2 with 3 rgw nodes
2. script gc_testing_ver_bkt.py is run from rgw node [extensa018]
3. The script gc_testing_ver_bkt.py does the following:
   a. creates a bucket [kvm_gc_ver_bkt_rhcs_1 (num_of_iteration)]
   b. enables versioning on the bucket
   c. creates 20 objects [object size: 5M]
   d. creates 5 versions for each object
   e. lists the objects
   f. deletes the versioned objects
   g. deletes the bucket
   h. creates another bucket [kvm_gc_ver_bkt_rhcs_2]; steps a-g are repeated
   P.S.: The script repeats the above 100000 times.
4. The script failed with NoSuchKey on the 23rd iteration, for bucket kvm_gc_ver_bkt_rhcs_23.
   snippet:

   Traceback (most recent call last):
     File "gc_testing_ver_bkt.py", line 29, in <module>
       key.set_contents_from_filename('classV')
     File "/usr/local/lib/python3.6/site-packages/boto/s3/key.py", line 1378, in set_contents_from_filename
       encrypt_key=encrypt_key)
     File "/usr/local/lib/python3.6/site-packages/boto/s3/key.py", line 1309, in set_contents_from_file
       chunked_transfer=chunked_transfer, size=size)
     File "/usr/local/lib/python3.6/site-packages/boto/s3/key.py", line 762, in send_file
       chunked_transfer=chunked_transfer, size=size)
     File "/usr/local/lib/python3.6/site-packages/boto/s3/key.py", line 963, in _send_file_internal
       query_args=query_args
     File "/usr/local/lib/python3.6/site-packages/boto/s3/connection.py", line 671, in make_request
       retry_handler=retry_handler
     File "/usr/local/lib/python3.6/site-packages/boto/connection.py", line 1071, in make_request
       retry_handler=retry_handler)
     File "/usr/local/lib/python3.6/site-packages/boto/connection.py", line 940, in _mexe
       request.body, request.headers)
     File "/usr/local/lib/python3.6/site-packages/boto/s3/key.py", line 896, in sender
       response.status, response.reason, body)
   boto.exception.S3ResponseError: S3ResponseError: 404 Not Found
   <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchKey</Code><BucketName>kvm_gc_ver_bkt_rhcs_23</BucketName><RequestId>tx00000000000000000132a-005fcf69f3-b20f-default</RequestId><HostId>b20f-default-default</HostId></Error>
   [root@extensa018 kvm]#

5. Observed the following in the logs [rgw node magna055]:

   2020-12-08 11:56:41.391 7f2a99e6d700 5 bs.init() returned ret=-2
   2020-12-08 11:56:41.391 7f2a99e6d700 20 update_olh() target_obj=kvm_gc_ver_bkt_rhcs_23:_:LVPsHPu6-2rx2.De6.qU5.3xFf47Kh4_dairy10 returned -2
   2020-12-08 11:56:41.391 7f2a99e6d700 20 get_system_obj_state: rctx=0x7f2b2fef3958 obj=default.rgw.log:pubsub.user.kvm-gc.bucket.kvm_gc_ver_bkt_rhcs_23/73425f4f-9160-4820-8908-1119bafce85e.45589.24 state=0x55f115ba59a0 s->prefetch_data=0
   2020-12-08 11:56:41.391 7f2a99e6d700 10 cache get: name=default.rgw.log++pubsub.user.kvm-gc.bucket.kvm_gc_ver_bkt_rhcs_23/73425f4f-9160-4820-8908-1119bafce85e.45589.24 : hit (negative entry)
   2020-12-08 11:56:41.457 7f2a99e6d700 2 req 4906 5.489s s3:put_obj completing
   2020-12-08 11:56:41.457 7f2a99e6d700 2 req 4906 5.489s s3:put_obj op status=-2
   2020-12-08 11:56:41.457 7f2a99e6d700 2 req 4906 5.489s s3:put_obj http status=404
   2020-12-08 11:56:41.457 7f2a99e6d700 1 ====== req done req=0x7f2b2fef7680 op status=-2 http_status=404 latency=5.48898s ======
   2020-12-08 11:56:41.457 7f2a99e6d700 1 beast: 0x7f2b2fef7680: 10.8.130.218 - - [2020-12-08 11:56:41.0.457772s] "PUT /kvm_gc_ver_bkt_rhcs_23/dairy10 HTTP/1.1" 404 10485989 - "Boto/2.49.0 Python/3.6.8 Linux/4.18.0-240.1.1.el8_3.x86_64" -

6. ceph configuration parameters on the setup:

   rgw_lc_debug_interval = 600
   rgw gc obj min wait = 10
   rgw_lc_max_worker = 10
   rgw_max_objs_per_shard = 5

Actual results:
The boto script fails with
boto.exception.S3ResponseError: S3ResponseError: 404 Not Found
<?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchKey</Code><BucketName>kvm_gc_ver_bkt_rhcs_23</BucketName><RequestId>tx00000000000000000132a-005fcf69f3-b20f-default</RequestId><HostId>b20f-default-default</HostId></Error>

Expected results:
The script should not fail.
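The original gc_testing_ver_bkt.py is not attached to this report. The following is only a rough sketch of the per-iteration workload in steps 3a-3g, written against the boto 2 bucket/key calls visible in the traceback. The function names (run_iteration, main), the object naming scheme "dairy<N>" (inferred from the "dairy10" key in the log), the environment variable names, and the endpoint settings are assumptions, not taken from the real script.

```python
import os

# Values taken from the report; everything else below is an assumption.
BUCKET_PREFIX = "kvm_gc_ver_bkt_rhcs_"
OBJECTS_PER_BUCKET = 20
VERSIONS_PER_OBJECT = 5
ITERATIONS = 100000


def run_iteration(conn, iteration, source_file,
                  num_objects=OBJECTS_PER_BUCKET,
                  num_versions=VERSIONS_PER_OBJECT):
    """Run steps 3a-3g once and return the number of versions deleted.

    `conn` is expected to behave like a boto 2 S3Connection.
    """
    # a. create the bucket for this iteration
    bucket = conn.create_bucket("%s%d" % (BUCKET_PREFIX, iteration))
    # b. enable versioning on the bucket
    bucket.configure_versioning(True)
    # c/d. upload each object num_versions times; each PUT on the same
    # key name creates a new version (the failing call in the traceback)
    for i in range(num_objects):
        key = bucket.new_key("dairy%d" % i)  # hypothetical naming scheme
        for _ in range(num_versions):
            key.set_contents_from_filename(source_file)
    # e. list every version in the bucket
    versions = list(bucket.list_versions())
    # f. delete each version explicitly by its version id
    for version in versions:
        bucket.delete_key(version.name, version_id=version.version_id)
    # g. delete the now-empty bucket
    conn.delete_bucket(bucket.name)
    return len(versions)


def main():
    """Connect to an RGW endpoint and loop; not called automatically."""
    import boto
    import boto.s3.connection

    conn = boto.connect_s3(
        aws_access_key_id=os.environ["RGW_ACCESS_KEY"],      # hypothetical
        aws_secret_access_key=os.environ["RGW_SECRET_KEY"],  # hypothetical
        host=os.environ.get("RGW_HOST", "localhost"),        # hypothetical
        port=int(os.environ.get("RGW_PORT", "8080")),        # hypothetical
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
    # 'classV' is the 5M source file name seen in the traceback.
    for iteration in range(1, ITERATIONS + 1):
        run_iteration(conn, iteration, "classV")
```

Calling main() on a node with boto 2 installed and a 5M file named classV in the working directory would approximate the reported workload. Whether the failure reproduces likely also depends on the configuration listed in step 6.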
Updated by Mark Kogan over 3 years ago
Additional data collection will follow as discussed...
Updated by Mark Kogan about 2 years ago
with the following upstream fixes:
1. https://github.com/ceph/ceph/pull/45345 -- cls/rgw: rgw_dir_suggest_changes detects race with completion
2. https://github.com/ceph/ceph/pull/45300 -- rgw: Update "CEPH_RGW_DIR_SUGGEST_LOG_OP" for remove entries
this BZ no longer reproduces on master.
Updated by Mark Kogan about 2 years ago
- Status changed from In Progress to Pending Backport