Bug #23547
compression ratio depends on block size, which is much smaller (16K vs 4M) in multisite sync
Description
Compressors add a block header to each buffer that we pass to compress(), so the overall compression ratio depends on the block size of the input. RGW also stores a user.rgw.compression attribute with the object, which contains an array of these blocks used to map virtual (uncompressed) offsets to compressed offsets.
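The bookkeeping described above can be sketched in a few lines of Python. This is a minimal sketch that assumes zlib as a stand-in for RGW's compressor plugins. Each buffer passed to compress() is compressed on its own, so each one carries its own header. One block entry is recorded per buffer, which lets an uncompressed offset be mapped back to the compressed block that holds it. The CompressionBlock field names and the helper functions are illustrative assumptions, modeled loosely on RGW's compression_block rather than taken from its actual API.

```python
import bisect
import zlib
from dataclasses import dataclass


@dataclass
class CompressionBlock:
    # Illustrative fields, modeled loosely on RGW's compression_block (assumption).
    old_ofs: int  # offset of this block in the uncompressed (virtual) object
    new_ofs: int  # offset of this block in the stored (compressed) object
    len: int      # compressed length of this block


def compress_buffers(buffers):
    """Compress each buffer independently, one compress() call per buffer,
    and record one CompressionBlock per buffer."""
    out = bytearray()
    blocks = []
    old_ofs = 0
    for buf in buffers:
        compressed = zlib.compress(buf)  # each call emits its own header and trailer
        blocks.append(CompressionBlock(old_ofs, len(out), len(compressed)))
        out += compressed
        old_ofs += len(buf)
    return bytes(out), blocks


def find_block(blocks, virtual_ofs):
    """Return the index of the block that contains an uncompressed offset."""
    starts = [b.old_ofs for b in blocks]
    return bisect.bisect_right(starts, virtual_ofs) - 1


def read_at(stored, blocks, virtual_ofs):
    """Read one byte at a virtual offset by decompressing only its block."""
    b = blocks[find_block(blocks, virtual_ofs)]
    raw = zlib.decompress(stored[b.new_ofs:b.new_ofs + b.len])
    return raw[virtual_ofs - b.old_ofs]
```

Every buffer adds both a header to the stored data and an entry to the block array. Smaller input buffers therefore cost more on both counts.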
In RGWPutObj, these buffers are rgw_obj_stripe_size (4M by default), which gives good compression. In multisite sync, we compress buffers as they arrive from libcurl, which defaults to 16k blocks. This adds significant size overhead, both in the actual compression ratio and in the size of the compression_block array stored in the user.rgw.compression attribute.
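The following hypothetical sketch illustrates the overhead. It compresses the same 8 MiB of synthetic data twice: once in independent 16 KiB pieces and once in independent 4 MiB pieces. zlib again stands in for the real compressor. The input data is an assumption, and actual ratios depend heavily on object contents and on the compression plugin.

```python
import zlib

KiB = 1024
MiB = 1024 * KiB


def chunked(data, block_size):
    """Split data into consecutive block_size pieces (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compressed_size(data, block_size):
    """Total compressed size and block count when each piece is compressed alone."""
    pieces = chunked(data, block_size)
    total = sum(len(zlib.compress(p)) for p in pieces)
    return total, len(pieces)


# Synthetic, moderately compressible input (an assumption; real objects vary).
# Each line is exactly 19 bytes, so 450,000 lines exceed 8 MiB before trimming.
data = b"".join(
    b"key-%04d: value %02d\n" % (i % 1000, (i * 7) % 97) for i in range(450_000)
)[:8 * MiB]

small_total, small_blocks = compressed_size(data, 16 * KiB)  # libcurl-sized pieces
large_total, large_blocks = compressed_size(data, 4 * MiB)   # stripe-sized pieces
print(f"16 KiB pieces: {small_blocks} blocks, {small_total} bytes")
print(f"4 MiB pieces:  {large_blocks} blocks, {large_total} bytes")
```

With 16 KiB pieces, this object needs 512 block entries instead of 2. Each piece also pays for its own header and starts with no history from the previous piece, so for this input the total compressed size is larger as well.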
History
#1 Updated by Casey Bodley almost 6 years ago
I was unable to increase libcurl's buffer size above 16k. This issue was raised at https://github.com/curl/curl/issues/2372.
It sounds like we'll need some internal buffering in the compression filter to batch these into 4M blocks. However, given a bufferlist made of many 16k buffers, the compressors will still add a header for each 16k buffer, so we may need to make the compressors smarter about this as well.
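The internal buffering suggested here could take roughly the following shape. This is a hypothetical Python sketch, not RGW's actual compression filter. BatchingCompressFilter, process() and flush() are invented names, and zlib stands in for the compressor. Incoming buffers are copied into one contiguous staging buffer. They reach the compressor only in full block_size pieces, and the remainder is flushed at end of stream. The copy into a contiguous buffer addresses the second concern as well: the compressor sees one input per block rather than many 16k segments.

```python
import zlib


class BatchingCompressFilter:
    """Hypothetical filter: buffer small incoming writes and compress them in
    fixed-size blocks (4 MiB by default) instead of once per incoming write."""

    def __init__(self, block_size=4 * 1024 * 1024, compress=zlib.compress):
        self.block_size = block_size
        self.compress = compress
        self.pending = bytearray()  # contiguous staging buffer
        self.blocks = []            # (uncompressed_len, compressed_bytes) per block

    def process(self, data):
        """Accept one incoming buffer, e.g. a 16 KiB chunk from the HTTP client."""
        self.pending += data
        while len(self.pending) >= self.block_size:
            self._emit(bytes(self.pending[:self.block_size]))
            del self.pending[:self.block_size]

    def flush(self):
        """Compress whatever is left at end of stream as a final, shorter block."""
        if self.pending:
            self._emit(bytes(self.pending))
            self.pending.clear()

    def _emit(self, block):
        # The block is one contiguous buffer, so the compressor adds a single
        # header for the whole block rather than one per original 16 KiB piece.
        self.blocks.append((len(block), self.compress(block)))
```

The trade-off is memory. Each in-flight sync request may hold up to one block_size buffer (4 MiB by default here) before anything is compressed.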
#2 Updated by Casey Bodley almost 6 years ago
- Status changed from In Progress to Fix Under Review
#3 Updated by Casey Bodley almost 6 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport changed from jewel luminous to luminous
#4 Updated by Nathan Cutler almost 6 years ago
- Copied to Backport #23864: luminous: compression ratio depends on block size, which is much smaller (16K vs 4M) in multisite sync added
#5 Updated by Nathan Cutler almost 6 years ago
- Status changed from Pending Backport to Resolved