vstart/s3cmd reproducer:
~/ceph/build $ s3cmd put 128m.iso s3://testbucket
upload: '128m.iso' -> 's3://testbucket/128m.iso' [part 1 of 9, 15MB] [1 of 1]
15728640 of 15728640 100% in 3s 4.89 MB/s done
upload: '128m.iso' -> 's3://testbucket/128m.iso' [part 2 of 9, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 157.13 MB/s done
upload: '128m.iso' -> 's3://testbucket/128m.iso' [part 3 of 9, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 153.28 MB/s done
upload: '128m.iso' -> 's3://testbucket/128m.iso' [part 4 of 9, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 157.17 MB/s done
upload: '128m.iso' -> 's3://testbucket/128m.iso' [part 5 of 9, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 162.56 MB/s done
upload: '128m.iso' -> 's3://testbucket/128m.iso' [part 6 of 9, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 157.83 MB/s done
upload: '128m.iso' -> 's3://testbucket/128m.iso' [part 7 of 9, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 151.82 MB/s done
upload: '128m.iso' -> 's3://testbucket/128m.iso' [part 8 of 9, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 162.57 MB/s done
upload: '128m.iso' -> 's3://testbucket/128m.iso' [part 9 of 9, 8MB] [1 of 1]
8388608 of 8388608 100% in 0s 113.51 MB/s done
~/ceph/build $ bin/rados -p default.rgw.buckets.data ls
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.1_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.8_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.6
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.3_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.4
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.9
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.2_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.7_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.1_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.1_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.7_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.8_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.7
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.3_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.7_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.4_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.6_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.9_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.8_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.6_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.2_2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.3_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.6_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.2_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.8
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.4_1
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1_128m.iso
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.4_3
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.2
a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.1
~/ceph/build $ bin/rados -p default.rgw.buckets.data rm a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_1
~/ceph/build $ s3cmd get s3://testbucket/128m.iso 128m.iso.out
download: 's3://testbucket/128m.iso' -> '128m.iso.out' [1 of 1]
65536 of 134217728 0% in 0s 4.34 MB/s^CSee ya!
log output:
2024-01-04T14:04:36.472-0500 7f15fda446c0 1 -- 192.168.245.130:0/2349082191 <== osd.0 v2:192.168.245.130:6800/356959160 1103 ==== osd_op_reply(1101 a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__multipart_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5 [read 0~4194304 out=4194304b] v0'0 uv25 ondisk = 0) v8 ==== 243+0+4194304 (crc 0 0 0) 0x9df2480 con 0x9e0e480
2024-01-04T14:04:36.472-0500 7f15fda446c0 1 -- 192.168.245.130:0/2349082191 <== osd.0 v2:192.168.245.130:6800/356959160 1104 ==== osd_op_reply(1102 a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_1 [read 0~4194304] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 242+0+0 (crc 0 0 0) 0x9df2480 con 0x9e0e480
2024-01-04T14:04:36.472-0500 7f155b9006c0 20 req 13909855352182413071 0.141004950s s3:get_obj RGWObjManifest::operator++(): rule>part_size=15728640 rules.size()=2
2024-01-04T14:04:36.472-0500 7f155b9006c0 20 req 13909855352182413071 0.141004950s s3:get_obj RGWObjManifest::operator++(): stripe_ofs=71303168 part_ofs=62914560 rule->part_size=15728640
2024-01-04T14:04:36.472-0500 7f155b9006c0 20 req 13909855352182413071 0.141004950s s3:get_obj RGWObjManifest::operator++(): result: ofs=71303168 stripe_ofs=71303168 part_ofs=62914560 rule->part_size=15728640
2024-01-04T14:04:36.472-0500 7f155b9006c0 20 req 13909855352182413071 0.141004950s s3:get_obj rados->get_obj_iterate_cb oid=a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_2 obj-ofs=71303168 read_ofs=0 len=4194304
2024-01-04T14:04:36.473-0500 7f155b9006c0 1 -- 192.168.245.130:0/2349082191 --> [v2:192.168.245.130:6800/356959160,v1:192.168.245.130:6801/356959160] -- osd_op(unknown.0.0:1103 7.0 7:681c2da5:::a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_2:head [read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e21) v8 -- 0xd08ae00 con 0x9e0e480
2024-01-04T14:04:36.473-0500 7f155b9006c0 0 req 13909855352182413071 0.142004982s s3:get_obj iterate_obj() failed with 2
2024-01-04T14:04:36.481-0500 7f15fda446c0 1 -- 192.168.245.130:0/2349082191 <== osd.0 v2:192.168.245.130:6800/356959160 1105 ==== osd_op_reply(1103 a5b83a43-b6de-41a4-8ef7-0f95b8868adb.4137.1__shadow_128m.iso.2~Ymp4J5DoatIHj7V-Ii0I1yzqG5bxozj.5_2 [read 0~4194304 out=4194304b] v0'0 uv23 ondisk = 0) v8 ==== 242+0+4194304 (crc 0 0 0) 0x9df2480 con 0x9e0e480
2024-01-04T14:04:36.481-0500 7f15550f36c0 2 req 13909855352182413071 0.150005266s s3:get_obj completing
2024-01-04T14:04:36.481-0500 7f15550f36c0 10 req 13909855352182413071 0.150005266s cache get: name=default.rgw.log++script.postrequest. : hit (negative entry)
2024-01-04T14:04:36.481-0500 7f15550f36c0 2 req 13909855352182413071 0.150005266s s3:get_obj op status=-2
2024-01-04T14:04:36.481-0500 7f15550f36c0 2 req 13909855352182413071 0.150005266s s3:get_obj http status=200
2024-01-04T14:04:36.481-0500 7f15550f36c0 1 ====== req done req=0x7f14d6ff7458 op status=-2 http_status=200 latency=0.150005266s ======
2024-01-04T14:04:36.482-0500 7f15550f36c0 1 beast: 0x7f14d6ff7458: ::1 - testid [04/Jan/2024:14:04:36.331 -0500] "GET /testbucket/128m.iso HTTP/1.1" 200 59768832 - - - latency=0.150005266s
The problem is that we've already returned the response headers with 200 OK, but can't finish writing the response body. Even though the op fails with op status=-2, that error never gets communicated back to the frontend, so the frontend assumes the op succeeded and tries to read the next request from the connection. As a result, the client keeps waiting for body data that the server will never send.
The frontend does have a way to track 'fatal' errors that terminate the connection (see https://github.com/ceph/ceph/blob/d45189857/src/rgw/rgw_asio_frontend.cc#L313-L315), but those only cover errors reading from or writing to the client socket. In this case, the errors are coming from rados.
We've discussed this issue in the past in the context of https://tracker.ceph.com/issues/46770, and there was an incomplete attempt at a fix in https://github.com/ceph/ceph/pull/36428. I tend to think that RGWOps should throw exceptions in these cases so the frontend can easily distinguish these fatal errors from normal request errors, but a lot of the RGWOp code isn't exception safe (for example, some Op::execute() functions rely on gotos). We might instead just extend the ClientIO interface with a close_connection() function that RGWGetObj could call in this case.