Bug #65436
Getting Object Crashing radosgw services (open)
Description
Hello,
We are seeing crashes when users are trying to get a specific file.
2024-04-11T11:02:38.174+0000 7f6bbf6ee700 20 req 1201706618685104296 0.000000000s s3:get_obj get_obj_state: rctx=0x7f6fb47f6ac0 obj=<bucketname>:.cache/<112 characters>.jpg/<33 characters> state=0x7f698418d728 s->prefetch_data=1
2024-04-11T11:02:38.174+0000 7f6bbf6ee700 20 req 1201706618685104296 0.000000000s s3:get_obj get_obj_state: rctx=0x7f6fb47f6ac0 obj=<bucketname>:.cache/<112 characters>.jpg/<33 characters> state=0x7f698418d728 s->prefetch_data=1
2024-04-11T11:02:38.174+0000 7f6bbf6ee700 20 req 1201706618685104296 0.000000000s s3:get_obj Read xattr rgw_rados: user.rgw.idtag
2024-04-11T11:02:38.174+0000 7f6bbf6ee700 20 req 1201706618685104296 0.000000000s s3:get_obj Read xattr rgw_rados: user.rgw.manifest
2024-04-11T11:02:38.174+0000 7f6bbf6ee700 20 req 1201706618685104296 0.000000000s s3:get_obj Read xattr rgw_rados: user.rgw.olh.idtag
2024-04-11T11:02:38.174+0000 7f6bbf6ee700 20 req 1201706618685104296 0.000000000s s3:get_obj Read xattr rgw_rados: user.rgw.olh.info
2024-04-11T11:02:38.174+0000 7f6bbf6ee700 20 req 1201706618685104296 0.000000000s s3:get_obj Read xattr rgw_rados: user.rgw.olh.ver
2024-04-11T11:02:38.178+0000 7f6bbf6ee700 -1 *** Caught signal (Aborted) **
This is reproducible on this specific object:
$ s3cmd -c s3 get s3://<bucket>/.cache/<112 characters>.jpg/<33 characters>
download: 's3://<bucket>/.cache/<112 characters>.jpg/<33 characters>' -> './<33 characters>' [1 of 1]
ERROR: Error parsing xml: mismatched tag: line 6, column 2
ERROR: b'<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n'
ERROR: Download of './<33 characters>' failed (Reason: 502 (Bad Gateway))
ERROR: S3 error: 502 (Bad Gateway)
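The `Error parsing xml` line appears because the 502 body comes from the nginx proxy in front of RGW and is an HTML page, not an S3 XML error document. The following is a minimal sketch of telling the two apart, assuming Python's standard `xml.etree.ElementTree` parser; the function name `classify_error_body` and its return labels are hypothetical and not part of s3cmd.

```python
import xml.etree.ElementTree as ET

# Body copied from the s3cmd output above (returned by nginx, not by RGW).
NGINX_502 = (
    b"<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n"
    b"<center><h1>502 Bad Gateway</h1></center>\r\n"
    b"<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n"
)


def classify_error_body(body: bytes) -> str:
    """Return 's3_error' for an S3-style <Error> document, else 'non_s3'."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # The unclosed <hr> makes the nginx page invalid as XML.
        return "non_s3"
    # S3 error documents use <Error> as the root element.
    return "s3_error" if root.tag == "Error" else "non_s3"


if __name__ == "__main__":
    print(classify_error_body(NGINX_502))
```

A `non_s3` result for a 5xx response suggests the gateway behind the proxy did not answer at all, which is consistent with the radosgw process having aborted.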
We are running:
RGW on 17.2.5
the rest of the cluster on 17.2.7
all on Debian 11
Updated by hoan nv about 1 month ago
I have the same issue. After some days, I found bug https://tracker.ceph.com/issues/61359.
After upgrading to 17.2.7, this bug is gone. However, I had to delete the broken file; I could not repair it.
Updated by Casey Bodley 29 days ago
- Status changed from New to Need More Info
"After upgrade to 17.2.7, this bug gone"
It sounds like this bug is fixed in a later point release. Can you please try to upgrade? We can't do anything to fix 17.2.5 specifically.
Updated by Reid Guyett 15 days ago
Hello,
I was able to test in 17.2.7 and the rgw service is still crashing with the same error message.
2024-05-02T17:26:25.256+0000 7f399f7be700 20 req 6086159647010032067 0.000000000s s3:get_obj Read xattr rgw_rados: user.rgw.idtag
2024-05-02T17:26:25.256+0000 7f399f7be700 20 req 6086159647010032067 0.000000000s s3:get_obj Read xattr rgw_rados: user.rgw.manifest
2024-05-02T17:26:25.256+0000 7f399f7be700 20 req 6086159647010032067 0.000000000s s3:get_obj Read xattr rgw_rados: user.rgw.olh.idtag
2024-05-02T17:26:25.256+0000 7f399f7be700 20 req 6086159647010032067 0.000000000s s3:get_obj Read xattr rgw_rados: user.rgw.olh.info
2024-05-02T17:26:25.256+0000 7f399f7be700 20 req 6086159647010032067 0.000000000s s3:get_obj Read xattr rgw_rados: user.rgw.olh.ver
2024-05-02T17:26:25.260+0000 7f399f7be700 -1 *** Caught signal (Aborted) **
 in thread 7f399f7be700 thread_name:radosgw
 ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7f3adad98140]
 2: gsignal()
 3: abort()
 4: /lib/x86_64-linux-gnu/libstdc++.so.6(+0x9a7ec) [0x7f3adac527ec]
 5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5966) [0x7f3adac5d966]
 6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa4a49) [0x7f3adac5ca49]
 7: __gxx_personality_v0()
 8: /lib/x86_64-linux-gnu/libgcc_s.so.1(+0x1073f) [0x7f3adabae73f]
 9: _Unwind_Resume()
 10: /lib/libradosgw.so.2(+0x53cccf) [0x7f3adb2e3ccf]
 11: /lib/libradosgw.so.2(+0x6388c6) [0x7f3adb3df8c6]
 12: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0) [0x7f3adac86ed0]
 13: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f3adad8cea7]
 14: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
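The libradosgw frames in this backtrace are unsymbolized offsets. As a rough sketch of how they could be turned into `addr2line` invocations (assuming binutils `addr2line` and matching debug symbols for the exact 17.2.7 package are available, which the report does not confirm), the hypothetical helper below extracts the library path and offset from each frame:

```python
import re

# Matches frames such as "10: /lib/libradosgw.so.2(+0x53cccf) [0x7f3adb2e3ccf]".
FRAME_RE = re.compile(r"(\d+): (\S+)\(\+(0x[0-9a-f]+)\) \[0x[0-9a-f]+\]")

# Frames 9-12 copied from the backtrace above.
SAMPLE = """
 9: _Unwind_Resume()
 10: /lib/libradosgw.so.2(+0x53cccf) [0x7f3adb2e3ccf]
 11: /lib/libradosgw.so.2(+0x6388c6) [0x7f3adb3df8c6]
 12: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0) [0x7f3adac86ed0]
"""


def addr2line_commands(backtrace: str, library: str = "libradosgw") -> list:
    """Build one addr2line command per frame that belongs to `library`."""
    commands = []
    for _frame_no, path, offset in FRAME_RE.findall(backtrace):
        if library in path:
            # -C demangles C++ names, -f prints the function, -e names the binary.
            commands.append(f"addr2line -C -f -e {path} {offset}")
    return commands


if __name__ == "__main__":
    for command in addr2line_commands(SAMPLE):
        print(command)
```

The commands are only printed, not executed; the output is meaningful only when run on a host with the same libradosgw build as the crashing gateway.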
Updated by hoan nv 14 days ago
Did you try with the broken file in the old bucket?
Upgrading Ceph cannot fix a file that is already broken. You need to delete the broken file, or all files in the bucket.
After upgrading, you can test by creating a bucket, enabling bucket versioning, uploading a file, then disabling bucket versioning and uploading a file again. You need to repeat this a few times. If no file ends up broken, the upgrade is OK.
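The check described above could be scripted roughly as follows. This is a sketch, not a verified reproducer: it assumes a client object exposing the boto3 S3 client methods `create_bucket`, `put_bucket_versioning`, `put_object` and `get_object`, it interprets "disable" as the S3 `Suspended` state (S3 versioning cannot be switched fully off once enabled), and the function name, bucket name, key and round count are hypothetical.

```python
def exercise_versioning_toggle(client, bucket, key, rounds=3):
    """Upload `key` with versioning enabled, then suspended, `rounds` times.

    `client` is assumed to behave like a boto3 S3 client.
    Returns the bytes read back by the final GET.
    """
    client.create_bucket(Bucket=bucket)
    for i in range(rounds):
        for status in ("Enabled", "Suspended"):
            client.put_bucket_versioning(
                Bucket=bucket,
                VersioningConfiguration={"Status": status},
            )
            body = f"round {i} {status}".encode()
            client.put_object(Bucket=bucket, Key=key, Body=body)
    # On an affected gateway this read is the step expected to fail.
    return client.get_object(Bucket=bucket, Key=key)["Body"].read()
```

With boto3 installed, a client for a test gateway could be created with `boto3.client("s3", endpoint_url=...)` and passed in; run this only against a disposable bucket, since a failing GET may take the gateway down.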
Updated by Reid Guyett 14 days ago
So the solution is to upgrade RGW, delete and recreate the bucket?
Since we do not own or control the data being uploaded by customers, I don't think it is really feasible. The RGW should return an HTTP error to the client instead of crashing the whole service.
Updated by Reid Guyett 14 days ago
We are also blocked by https://tracker.ceph.com/issues/64308 in moving to 17.2.7.
Updated by Reid Guyett 11 days ago
What did you do to fix it at the proxy layer? Strip the parameters from the URL?
Updated by Reid Guyett about 12 hours ago
I was able to reproduce this error on 17.2.7.
Using [s3-tests](https://github.com/ceph/s3-tests/) test_versioning_obj_suspended_copy, I am able to reproduce the RGW crashing each time.
S3TEST_CONF=s3tests-new.conf tox -- s3tests_boto3/functional/test_s3.py::test_versioning_obj_suspended_copy
<...>
FAILED s3tests_boto3/functional/test_s3.py::test_versioning_obj_suspended_copy - botocore.exceptions.ClientError: An error occurred (502) when calling the GetObject operation (reached max retries: 4): Bad Gateway
ERROR s3tests_boto3/functional/test_s3.py::test_versioning_obj_suspended_copy - botocore.exceptions.ClientError: An error occurred (502) when calling the ListBuckets operation (reached max retries: 4): Bad Gateway
It has the same error in the RGW logs when crashing.
terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of_buffer'
  what(): End of buffer
*** Caught signal (Aborted) **
 in thread 7f1550070700 thread_name:radosgw
 ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
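The `terminate called after throwing` line shows that the exception was not caught anywhere, so the process aborted instead of answering the request. The sketch below illustrates, in Python rather than RGW's C++, the behaviour asked for earlier in this thread: a decode that runs past the end of its buffer is caught and mapped to an HTTP error. Everything here is hypothetical (the `EndOfBuffer` class, the length-prefixed format, and the `handle_get` function); it is not RGW code and does not claim to identify which decode actually fails.

```python
import struct


class EndOfBuffer(Exception):
    """Hypothetical Python stand-in for ceph::buffer::end_of_buffer."""


def decode_string(buf: bytes, offset: int = 0) -> str:
    """Decode a 4-byte little-endian length followed by that many bytes."""
    if offset + 4 > len(buf):
        raise EndOfBuffer("End of buffer")
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    if start + length > len(buf):
        # The declared length exceeds the data actually present.
        raise EndOfBuffer("End of buffer")
    return buf[start:start + length].decode()


def handle_get(encoded: bytes):
    """Return (status, payload); a truncated buffer yields 500, not a crash."""
    try:
        return 200, decode_string(encoded)
    except EndOfBuffer:
        return 500, "InternalError"


if __name__ == "__main__":
    print(handle_get(struct.pack("<I", 3) + b"abc"))
    print(handle_get(struct.pack("<I", 10) + b"abc"))
```

The second call declares 10 bytes but supplies 3, so the decode raises and the handler reports a 500 for that one request while the process keeps running.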