Compression ratio depends on block size, which is much smaller in multisite sync (16K vs. 4M)
Compressors add a block header to each buffer that we pass to compress(), so the overall compression ratio depends on the block size of the input. RGW also stores a user.rgw.compression attribute with the object. This attribute holds an array with one entry per compressed block, which maps virtual (uncompressed) offsets to compressed offsets.
In RGWPutObj, these buffers are sized by rgw_obj_stripe_size, which defaults to 4M, and this results in good compression. In multisite sync, however, we compress buffers as they arrive from libcurl, which delivers them in 16k blocks by default. This causes significant size overhead, both in the actual compression ratio and in the size of the compression_block array stored in the user.rgw.compression attribute.
#1 Updated by Casey Bodley 10 months ago
I was unable to increase libcurl's buffer size above 16k. This issue was raised at https://github.com/curl/curl/issues/2372.
It sounds like we'll need some internal buffering in the compression filter to batch these into 4M blocks. However, given a bufferlist made up of many 16k buffers, the compressors will still add a header for each 16k buffer, so we may need to make the compressors smarter about this as well.