Feature #174

Support large files better

Added by Greg Farnum almost 14 years ago. Updated about 6 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%

Description

Right now, the rados gateway just dumps a given file into the RADOS store in a single large write. If somebody's storing an HD movie or something similar, this can easily fill up memory on the OSD. Support writing large objects out in pieces, and look into how rgw handles HTTP continue messages (which must be used for large files).

Screenshot-View_HTTP_Request_and_Response_Header_-_Mozilla_Firefox.png (131 KB) Wido den Hollander, 07/20/2010 02:08 AM

History

#1 Updated by Yehuda Sadeh over 13 years ago

The gateway should now (following commit:925e2092486bbc78f011065172524d6c550ae7c6) do its get and put operations in chunks, so hopefully the gateway-side problem is fixed. It still does all operations on a single object (read: no striping).
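For readers following along, here is a minimal sketch of what "get and put operations in chunks" could look like in terms of offsets and lengths. It is not the code from the commit above: `ChunkWrite`, `plan_chunked_writes` and `kSketchChunkSize` are hypothetical names, and the 4 MB chunk size is an assumption based on the 4194304-byte figure reported in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed chunk size (4 MB); the real limit in radosgw may differ.
constexpr std::size_t kSketchChunkSize = 4 * 1024 * 1024;

// One write against the same object: where it starts and how many bytes it covers.
struct ChunkWrite {
    std::uint64_t offset;
    std::size_t length;
};

// Split an upload of total_size bytes into consecutive writes of at most
// chunk_size bytes, so only one chunk has to be held in memory at a time.
std::vector<ChunkWrite> plan_chunked_writes(std::uint64_t total_size,
                                            std::size_t chunk_size = kSketchChunkSize) {
    std::vector<ChunkWrite> plan;
    if (chunk_size == 0) {
        return plan;  // nothing sensible to plan with a zero chunk size
    }
    for (std::uint64_t off = 0; off < total_size; off += chunk_size) {
        const std::uint64_t remaining = total_size - off;
        const std::size_t len =
            static_cast<std::size_t>(std::min<std::uint64_t>(remaining, chunk_size));
        plan.push_back(ChunkWrite{off, len});
    }
    return plan;
}
```

All writes in the plan target one object at increasing offsets, which matches the "single object, no striping" behaviour described above.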

#2 Updated by Wido den Hollander over 13 years ago

The commit from last night seems to have broken the Content-Length header, see the attached screenshot.

A length of 4194304 is always returned for files larger than 4MB. I just tested with a file of 399 bytes, and that works fine (the right length is returned).

#3 Updated by Wido den Hollander over 13 years ago

After removing the following lines in rgw_rados.cc, the Content-Length is returned correctly:

Line 621:

if (len > RGW_MAX_CHUNK_SIZE)
    len = RGW_MAX_CHUNK_SIZE;

But I assume these lines are there for a reason.

Removing them works fine; I just downloaded a file of 150MB and its md5sum matches the original file.
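A hypothetical sketch of why a clamp like this can break Content-Length: if the clamped variable is also the one used for the header, every large object reports exactly one chunk. This is an illustration only, not the code from rgw_rados.cc; the function names are made up and the chunk value is assumed from the 4194304 figure above.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed value, taken from the 4194304 bytes reported above.
constexpr std::uint64_t kAssumedMaxChunk = 4194304;

// Problem pattern: the total length is clamped in place, so whatever is
// reported from it can never exceed one chunk.
std::uint64_t content_length_clamped(std::uint64_t object_size) {
    std::uint64_t len = object_size;
    if (len > kAssumedMaxChunk) {
        len = kAssumedMaxChunk;
    }
    return len;
}

// One possible alternative: keep the full object size for the header and
// apply the clamp only to the size of each individual read.
std::uint64_t next_read_length(std::uint64_t object_size, std::uint64_t offset) {
    if (offset >= object_size) {
        return 0;  // nothing left to read
    }
    return std::min(object_size - offset, kAssumedMaxChunk);
}
```

With this split, a 399-byte file and a 150MB file both report their true size, while no single read exceeds the chunk limit.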

#4 Updated by Yehuda Sadeh over 13 years ago

commit:f3eb96457b193b1f5d79cf2b41a3cda690c0eab0 fixes the content length issue.

#5 Updated by Wido den Hollander over 13 years ago

After that commit the Content-Length works fine.

Uploading large files still fails, for example:

110534 bytes remaining (99% complete) ...
94150 bytes remaining (99% complete) ...
77766 bytes remaining (99% complete) ...
61382 bytes remaining (99% complete) ...
44998 bytes remaining (99% complete) ...
28614 bytes remaining (99% complete) ...
12230 bytes remaining (99% complete) ...
********** STALLS FOR ABOUT 20 SECONDS ****************
ERROR: XmlParseFailure

In my Apache log I see:

x-amz-date:Wed, 21 Jul 2010 08:17:25 GMT
/flashforward/FlashForward.0101.No.More.Good.Days.mkv
hmac=6a9850e1b9188f180931f75d4e7a9ab691fca123
b64=aphQ4bkYjxgJMfddTnqatpH8oSM=
auth_sign=aphQ4bkYjxgJMfddTnqatpH8oSM=
compare=0
<AccessControlPolicy><Owner><ID>XIJ1WBL0DKXFQEQG0REV</ID><DisplayName>Wido</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUser"><ID>XIJ1WBL0DKXFQEQG0REV</ID><DisplayName>Wido</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Group"><URI>http://acs.amazonaws.com/groups/global/AllUsers</URI></Grantee><Permission>READ</Permission></Grant></AccessControlList></AccessControlPolicy>searching permissions for uid=XIJ1WBL0DKXFQEQG0REV mask=2
found permission: 15
[Wed Jul 21 10:21:05 2010] [warn] [client 2a00:f10:103:1::2810:1337] (70007)The timeout specified has expired: mod_fcgid: can't get data from http client

My fcgid settings are:

FcgidMaxProcesses 16
FcgidMaxRequestLen 2147483648
FcgidIOTimeout 180

The file I'm trying to upload is 1.1GB.

The original file is 1173221318 bytes. Right now my file shows up in a bucket listing, but with a filesize of 1065353216 bytes and no ETag.

It seems that somewhere the gateway is responding too slowly and Apache kills the connection.

There are no "laggy" messages in my logs.

#6 Updated by Wido den Hollander over 13 years ago

I think I was a bit too early with reporting. A few minutes later the file showed up with the correct filesize and an ETag in the bucket listing, but there is still no proper response to the client.

It seems that radosgw is still writing data to the RADOS network and Apache kills the connection since it has been quiet too long.

I'm uploading at about 5MB/sec; imho the gateway should be able to keep up with this.

#7 Updated by Yehuda Sadeh over 13 years ago

Probably the ETag calculation is not being done right. Should be done in chunks too.

#8 Updated by Yehuda Sadeh over 13 years ago

Actually, the ETag calculation is being done in chunks and is OK. The real problem is that the Apache fcgid module first buffers the whole PUT request and only then passes it to the rados fcgi module. I tried it on lighttpd with fastcgi and had similar results.

#9 Updated by Yehuda Sadeh over 13 years ago

Replacing the Apache fcgid module with fastcgi seems to solve the problem, as it doesn't buffer the entire uploaded data.

#10 Updated by Wido den Hollander over 13 years ago

I can confirm that the FastCGI module works fine under Apache.

I uploaded files of 1.1G and 4.4G (larger than the RAM of the machine) and that half worked: the upload went fine (success for the client), but there was still a delay before the file was fully written.

In my logs I could see that the RADOS gw was still writing the file, so it was available, but not with all the content.

The ETag matched the md5sum of the file, so that was fine.

#11 Updated by Wido den Hollander over 13 years ago

There seems to be an issue with the reported Content-Length when downloading the file:

20:06 < wido> yehudasa: tried the 4.4GB upload again, size reported by the GW in a bucket listing is fine, the ETag also
              matches, but when downloading the file, a size of 391212054b is reported
20:11 < wido> ok, it is a overflow somewhere, 4686179350 is the total filesize, where 391212054 is the remainder of the
              filesize minus 4GB
20:11 < wido> ((1024 * 1024 * 1024) * 4) + 391212054 = 4686179350
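The arithmetic above is what you get when a 64-bit size passes through a 32-bit unsigned integer. A small sketch reproducing the numbers follows; where exactly the truncation happened in radosgw is not stated in this thread, so the 32-bit conversion is an assumption that merely fits the reported values, and `truncate_to_32_bits` is a hypothetical helper.

```cpp
#include <cstdint>

// Keep only the low 32 bits of a size, as happens when a 64-bit length is
// stored in (or passed through) a 32-bit unsigned variable.
std::uint32_t truncate_to_32_bits(std::uint64_t size) {
    return static_cast<std::uint32_t>(size);
}
```

Calling `truncate_to_32_bits(4686179350ULL)` gives 391212054, since 4686179350 - 4294967296 = 391212054.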

#12 Updated by Yehuda Sadeh over 13 years ago

  • Status changed from New to Resolved

Closing this one; the last bug was fixed with commit:d708a746ffd4d75d7502127d2c43d11105f1e484. I was able to upload and download 5GB files.

#13 Updated by Sage Weil about 13 years ago

  • Category set to radosgw

#14 Updated by Sage Weil about 13 years ago

  • Project changed from RADOS Gateway to Ceph
  • Category changed from radosgw to 22

#15 Updated by John Spray about 6 years ago

  • Project changed from Ceph to rgw
  • Category deleted (22)

Bulk reassign of radosgw category to RGW project.