Bug #63206


beast: S3 download stalls without useful logs upon encountering an invalid RADOS object

Added by Alexander Patrakov 7 months ago. Updated 4 months ago.

Status:
New
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Dear developers,

A customer has a Ceph cluster recently updated to Quincy. The RADOS gateway was switched from civetweb to beast in the process.

They started complaining that downloads of certain S3 objects stall. In other words, the transfer downloads, say, 8 MB out of 19 MB, and then the download speed drops to zero. There is nothing suspicious in the logs, even with `--debug-rgw=20`. Upon inspection, it was found that the problematic S3 objects had at least one of their underlying RADOS objects corrupted. The mode of corruption is that the length of the corrupted RADOS object (with the word "shadow" in its name) is zero instead of 4 megabytes. I don't know (and have no hope of ever finding out) why these objects got corrupted, but that is not the subject of this bug. The subject is the RADOS gateway's behavior under pre-existing RADOS object corruption.

You can simulate this kind of pre-existing corruption on your own cluster by overwriting one of the underlying RADOS objects with an empty file:

touch empty-file
rados put -p default.rgw.buckets.data 'a30e6c45-9d9d-4c29-b7dc-43b0c8f9f36a.1704468.1__shadow_positron-veux-20230825.zip.2~EGs6uKCytGBgClNB1O9k57bO2BMZnIx.1_2' empty-file
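
You can confirm that the simulated corruption took effect with `rados stat`, which prints the object's mtime and size (the size should now be 0):

rados -p default.rgw.buckets.data stat 'a30e6c45-9d9d-4c29-b7dc-43b0c8f9f36a.1704468.1__shadow_positron-veux-20230825.zip.2~EGs6uKCytGBgClNB1O9k57bO2BMZnIx.1_2'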

The customer's complaints at this time are:

  • Upon encountering the invalid (e.g. wrong-length) RADOS object, the RADOS gateway should log at least something, while currently it logs nothing;
  • The transfer stalls for at least 30 seconds instead of ending immediately with a premature EOF from the server;
  • There should be a radosgw-admin subcommand to check all S3 objects for this kind of problem (a rough shell approximation is sketched after this list).
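
To my knowledge no such radosgw-admin subcommand exists today. As a rough, hypothetical workaround (not an existing tool), one can scan the data pool for zero-length multipart/shadow stripes directly with the rados CLI; this assumes the `rados stat` output format noted in the comment below and may produce false positives:

# Hypothetical scan for the corruption pattern described above:
# multipart/shadow RADOS objects whose size is 0.
pool=default.rgw.buckets.data
rados -p "$pool" ls | grep -E '__(shadow|multipart)_' | while IFS= read -r obj; do
  # "rados stat" prints: <pool>/<object> mtime <time>, size <bytes>
  size=$(rados -p "$pool" stat "$obj" | awk '{print $NF}')
  if [ "$size" = "0" ]; then
    echo "zero-length RADOS object: $obj"
  fi
done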

I have also tested Ceph Pacific with the now-removed "civetweb" module and found that civetweb, upon encountering a corrupted RADOS object, closes the connection immediately - i.e., the second complaint does not apply there.


Related issues (1 open, 0 closed)

Related to rgw - Bug #46770: rgw incorrect http status on RADOS i/o error (In Progress, Or Friedmann)

Actions #1

Updated by Casey Bodley 7 months ago

  • Assignee set to Shilpa MJ
Actions #2

Updated by Casey Bodley 4 months ago

vstart/s3cmd reproducer:

~/ceph/build $ s3cmd put 128m.iso s3://testbucket
upload: '128m.iso' -> 's3://testbucket/128m.iso'  [part 1 of 9, 15MB] [1 of 1]
 15728640 of 15728640   100% in    3s     4.89 MB/s  done
upload: '128m.iso' -> 's3://testbucket/128m.iso'  [part 2 of 9, 15MB] [1 of 1]
 15728640 of 15728640   100% in    0s   157.13 MB/s  done
upload: '128m.iso' -> 's3://testbucket/128m.iso'  [part 3 of 9, 15MB] [1 of 1]
 15728640 of 15728640   100% in    0s   153.28 MB/s  done
upload: '128m.iso' -> 's3://testbucket/128m.iso'  [part 4 of 9, 15MB] [1 of 1]
 15728640 of 15728640   100% in    0s   157.17 MB/s  done
upload: '128m.iso' -> 's3://testbucket/128m.iso'  [part 5 of 9, 15MB] [1 of 1]
 15728640 of 15728640   100% in    0s   162.56 MB/s  done
upload: '128m.iso' -> 's3://testbucket/128m.iso'  [part 6 of 9, 15MB] [1 of 1]
 15728640 of 15728640   100% in    0s   157.83 MB/s  done
upload: '128m.iso' -> 's3://testbucket/128m.iso'  [part 7 of 9, 15MB] [1 of 1]
 15728640 of 15728640   100% in    0s   151.82 MB/s  done
upload: '128m.iso' -> 's3://testbucket/128m.iso'  [part 8 of 9, 15MB] [1 of 1]
 15728640 of 15728640   100% in    0s   162.57 MB/s  done
upload: '128m.iso' -> 's3://testbucket/128m.iso'  [part 9 of 9, 8MB] [1 of 1]
 8388608 of 8388608   100% in    0s   113.51 MB/s  done

~/ceph/build $ bin/rados -p default.rgw.buckets.data ls
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.1_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.8_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.6
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.3_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.4
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.9
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.2_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.7_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.1_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.1_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.7_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.8_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.7
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.3_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.7_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.4_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.6_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.9_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.8_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.6_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.2_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.3_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.6_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.2_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.8
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.4_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1_128m.iso
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.4_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.1

~/ceph/build $ bin/rados -p default.rgw.buckets.data rm a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_1

~/ceph/build $ s3cmd get s3://testbucket/128m.iso 128m.iso.out
download: 's3://testbucket/128m.iso' -> '128m.iso.out'  [1 of 1]
     65536 of 134217728     0% in    0s     4.34 MB/s^CSee ya!

log output:

2024-01-04T14:04:36.472-0500 7f15fda446c0 1 -- 192.168.245.130:0/2349082191 <== osd.0 v2:192.168.245.130:6800/356959160 1103 ==== osd_op_reply(1101 a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5 [read 0~4194304 out=4194304b] v0'0 uv25 ondisk = 0) v8 ==== 243+0+4194304 (crc 0 0 0) 0x9df2480 con 0x9e0e480
2024-01-04T14:04:36.472-0500 7f15fda446c0 1 -- 192.168.245.130:0/2349082191 <== osd.0 v2:192.168.245.130:6800/356959160 1104 ==== osd_op_reply(1102 a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_1 [read 0~4194304] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 242+0+0 (crc 0 0 0) 0x9df2480 con 0x9e0e480
2024-01-04T14:04:36.472-0500 7f155b9006c0 20 req 13909855352182413071 0.141004950s s3:get_obj RGWObjManifest::operator++(): rule->part_size=15728640 rules.size()=2
2024-01-04T14:04:36.472-0500 7f155b9006c0 20 req 13909855352182413071 0.141004950s s3:get_obj RGWObjManifest::operator++(): stripe_ofs=71303168 part_ofs=62914560 rule->part_size=15728640
2024-01-04T14:04:36.472-0500 7f155b9006c0 20 req 13909855352182413071 0.141004950s s3:get_obj RGWObjManifest::operator++(): result: ofs=71303168 stripe_ofs=71303168 part_ofs=62914560 rule->part_size=15728640
2024-01-04T14:04:36.472-0500 7f155b9006c0 20 req 13909855352182413071 0.141004950s s3:get_obj rados->get_obj_iterate_cb oid=a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_2 obj-ofs=71303168 read_ofs=0 len=4194304
2024-01-04T14:04:36.473-0500 7f155b9006c0 1 -- 192.168.245.130:0/2349082191 --> [v2:192.168.245.130:6800/356959160,v1:192.168.245.130:6801/356959160] -- osd_op(unknown.0.0:1103 7.0 7:681c2da5:::a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_2:head [read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e21) v8 -- 0xd08ae00 con 0x9e0e480
2024-01-04T14:04:36.473-0500 7f155b9006c0 0 req 13909855352182413071 0.142004982s s3:get_obj iterate_obj() failed with 2
2024-01-04T14:04:36.481-0500 7f15fda446c0 1 -- 192.168.245.130:0/2349082191 <== osd.0 v2:192.168.245.130:6800/356959160 1105 ==== osd_op_reply(1103 a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_2 [read 0~4194304 out=4194304b] v0'0 uv23 ondisk = 0) v8 ==== 242+0+4194304 (crc 0 0 0) 0x9df2480 con 0x9e0e480
2024-01-04T14:04:36.481-0500 7f15550f36c0 2 req 13909855352182413071 0.150005266s s3:get_obj completing
2024-01-04T14:04:36.481-0500 7f15550f36c0 10 req 13909855352182413071 0.150005266s cache get: name=default.rgw.log++script.postrequest. : hit (negative entry)
2024-01-04T14:04:36.481-0500 7f15550f36c0 2 req 13909855352182413071 0.150005266s s3:get_obj op status=-2
2024-01-04T14:04:36.481-0500 7f15550f36c0 2 req 13909855352182413071 0.150005266s s3:get_obj http status=200
2024-01-04T14:04:36.481-0500 7f15550f36c0 1 ====== req done req=0x7f14d6ff7458 op status=-2 http_status=200 latency=0.150005266s ======
2024-01-04T14:04:36.482-0500 7f15550f36c0 1 beast: 0x7f14d6ff7458: ::1 - testid [04/Jan/2024:14:04:36.331 -0500] "GET /testbucket/128m.iso HTTP/1.1" 200 59768832 - - - latency=0.150005266s

The problem is that we've already returned response headers with 200 OK but can't finish writing the response body. Even though the op fails with op status=-2, that error doesn't get communicated back to the frontend, so it assumes the op succeeded and tries to read the next request from the connection. As a result, the client keeps waiting for data that the server will never send.
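
This is visible from the client side too. As a rough illustration (assuming a default vstart radosgw listening on port 8000, and that the object has been made publicly readable, e.g. with s3cmd setacl --acl-public), curl receives the 200 status line and the full Content-Length but the body stops short of that length, and the connection is neither completed nor closed:

s3cmd setacl --acl-public s3://testbucket/128m.iso
# Headers arrive immediately with "HTTP/1.1 200 OK" and Content-Length: 134217728,
# but the transfer stalls before reaching that length; --max-time makes curl give
# up instead of waiting forever on the open connection.
curl -v --max-time 30 -o /dev/null http://localhost:8000/testbucket/128m.iso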

The frontend has a way to track 'fatal' errors that terminate the connection (see https://github.com/ceph/ceph/blob/d45189857/src/rgw/rgw_asio_frontend.cc#L313-L315), but those only cover errors reading from or writing to the client socket. In this case, the errors are coming from rados.

We've discussed this issue in the past in the context of https://tracker.ceph.com/issues/46770, and there was an incomplete attempt at a fix in https://github.com/ceph/ceph/pull/36428. I tend to think that RGWOps should throw exceptions in these cases so that it's easy for the frontend to distinguish these fatal errors from normal request errors, but a lot of the RGWOp code isn't exception-safe (for example, some Op::execute() functions rely on gotos). We might just extend the ClientIO interface with a close_connection() function that RGWGetObj could call in this case.

Actions #3

Updated by Casey Bodley 4 months ago

  • Related to Bug #46770: rgw incorrect http status on RADOS i/o error added