Bug #23207

closed

rgw: inefficient buffer usage for PUTs

Added by Marcus Watts about 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous, jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

At least in jewel using swift, radosgw buffers PUT data very inefficiently. In RGWPutObj_ObjStore::get_data(), it allocates a buffer, usually of size rgw_max_chunk_size, issues a single mg_read() call (which might fill about 1% of the buffer), then passes the result along. Eventually the list of buffers is condensed and given to the write, but in the meantime a lot of memory is wasted. With a chunk size of 1 MB and a workload that wrote and deleted 262 objects using 64 parallel threads, where no object was larger than 33 MB (average size 7 MB), I saw a peak allocation of 3.7 GB of RAM, of which 99% was never used.

I have an experimental patch that fixes this behavior. With that patch and the same workload, the 1 MB buffers are now usually filled, they do not need to be repacked before being written, and peak allocation went down to 138 MB of RAM.
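To illustrate the difference described above, here is a minimal simulation, not the actual patch. It compares allocating a new chunk-sized buffer for every read against filling each buffer before moving on. `ShortReader`, `Stats`, and both functions are hypothetical names made up for this sketch, and the 10 KiB burst size is an assumption chosen to approximate "about 1% of the buffer". The sketch also assumes `burst` is greater than zero.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for a network read: returns at most `burst` bytes per call.
struct ShortReader {
    std::size_t remaining;  // bytes left in the simulated request body
    std::size_t burst;      // maximum bytes delivered by one read (assumed > 0)
    std::size_t read(std::size_t want) {
        std::size_t n = std::min({want, burst, remaining});
        remaining -= n;
        return n;
    }
};

struct Stats {
    std::size_t buffers = 0;    // chunk-sized buffers allocated
    std::size_t allocated = 0;  // total bytes reserved
    std::size_t used = 0;       // bytes that actually hold data
};

// One read per freshly allocated chunk, as in the behavior reported above.
Stats one_read_per_buffer(ShortReader src, std::size_t chunk) {
    Stats s;
    while (src.remaining > 0) {
        s.buffers++;
        s.allocated += chunk;
        s.used += src.read(chunk);
    }
    return s;
}

// Keep reading into the same chunk until it is full or the body ends.
Stats fill_each_buffer(ShortReader src, std::size_t chunk) {
    Stats s;
    while (src.remaining > 0) {
        s.buffers++;
        s.allocated += chunk;
        std::size_t filled = 0;
        while (filled < chunk && src.remaining > 0) {
            filled += src.read(chunk - filled);
        }
        s.used += filled;
    }
    return s;
}
```

For a simulated 7 MiB body read in 10 KiB bursts with 1 MiB chunks, the first strategy reserves 717 buffers while the second reserves 7. Both hold the same 7 MiB of data.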


Related issues 3 (0 open, 3 closed)

Related to rgw - Bug #23596: mg_read() call has wrong arguments (Resolved, 04/08/2018)
Copied to rgw - Backport #23347: luminous: rgw: inefficient buffer usage for PUTs (Resolved, Prashant D)
Copied to rgw - Backport #23348: jewel: rgw: inefficient buffer usage for PUTs (Resolved, Nathan Cutler)

#1

Updated by Vikhyat Umrao about 6 years ago

  • Status changed from New to In Progress
#2

Updated by Marcus Watts about 6 years ago

I updated my PR. I moved the fill loop down from just the put-object code path to just above mg_read(). I don't believe there was logic to handle short reads anywhere else, and we've gotten reports of hard-to-reproduce signature verification failures that could be explained by short reads. If that is the cause, the updated patch should fix those as well.

It's likely that main and luminous have the same problem, but I have not yet checked whether that's the case.

#3

Updated by Matt Benjamin about 6 years ago

great work, thanks marcus!

Matt

#4

Updated by Marcus Watts about 6 years ago

I checked a copy of master. It does not have the problem. I'll need to do more checking; this is a mutant master linked against openssl 1.1.

#5

Updated by Marcus Watts about 6 years ago

Matt suggested that it might be best to apply this fix to master. That way it doesn't matter what behavior a particular version of civetweb has; our code will work with it. So I've put together https://github.com/ceph/ceph/pull/20724

#6

Updated by Nathan Cutler about 6 years ago

  • Backport set to luminous, jewel
#7

Updated by Yehuda Sadeh about 6 years ago

  • Status changed from In Progress to Fix Under Review
#8

Updated by Casey Bodley about 6 years ago

  • Subject changed from rgw: (jewel) inefficient buffer usage for PUTs to rgw: inefficient buffer usage for PUTs
  • Status changed from Fix Under Review to Pending Backport
#9

Updated by Nathan Cutler about 6 years ago

  • Copied to Backport #23347: luminous: rgw: inefficient buffer usage for PUTs added
#10

Updated by Nathan Cutler about 6 years ago

  • Copied to Backport #23348: jewel: rgw: inefficient buffer usage for PUTs added
#11

Updated by Nathan Cutler about 6 years ago

  • Status changed from Pending Backport to Resolved
#12

Updated by Nathan Cutler about 6 years ago

  • Related to Bug #23596: mg_read() call has wrong arguments added