Feature #65131
perf counters for CreateMultipartUpload, AbortMultipartUpload, CompleteMultipartUpload
Description
The rgw daemon exposes a perf counter, "rgw.put", representing the number of PUT ops performed by the gateway.
For small objects the value is accurate, but beyond 16MB the counter appears to count in 16MB chunks instead of whole objects. For example, under a workload of 64MB PUTs the counter increments at 4x the rate it should when cross-checked with the client.
(GET counters are not affected by this issue.)
To reproduce, use warp with something like
warp put --warp-client warp-1 --host rgw-1 --access-key $ACCESS_KEY --secret-key $SECRET_KEY --bucket $bucket --obj.size 64MB --concurrent 1 --duration 1m
And grab the counter stats via the admin socket, e.g.
#!/usr/bin/bash
write_stats () {
  put_count=$(ceph daemon /var/run/ceph/ceph-client.rgw.group2.storage-13-09008.ujvjki.7.94246064814240.asok perf dump | jq ".rgw.put")
  now=$(date '+%s')
  echo "${now},${put_count}"
}
while true; do
  write_stats
  sleep 5
done
Here's my results;
From the rgw script
1711416974,2108690
1711416980,2108696
1711416985,2108727
1711416990,2108759
1711416995,2108790
1711417000,2108823
1711417005,2108858
1711417010,2108891
1711417015,2108923
1711417021,2108954
1711417026,2108985
1711417031,2109014
1711417036,2109044
1711417041,2109062
1711417046,2109062
The above shows a delta of 31-32 every 5s, so in theory the PUT rate is around 6 ops/sec
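Since the counter is cumulative, the per-interval rate comes from differencing adjacent samples. A minimal sketch of that arithmetic (ops_per_sec is an illustrative helper, not part of any Ceph tooling), fed with the first three samples above:

```python
def ops_per_sec(samples):
    """Turn (timestamp, cumulative_count) samples into per-interval rates."""
    return [(c1 - c0) / (t1 - t0)
            for (t0, c0), (t1, c1) in zip(samples, samples[1:])]

# First three samples captured by the rgw script above:
rates = ops_per_sec([(1711416974, 2108690),
                     (1711416980, 2108696),
                     (1711416985, 2108727)])
print(rates)  # [1.0, 6.2] -> roughly 6 ops/sec once the workload ramps up
```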
However, the results from warp show;
started workload, 1 client(s) with 64MB objects at Tue Mar 26 01:36:14 UTC 2024
warp: Benchmark data written to "put-experiment/ec8-2/client_count_1/clients_1_64MB_PUT.csv.zst"
----------------------------------------
Operation: PUT. Concurrency: 1
* Average: 94.18 MiB/s, 1.54 obj/s
Throughput, split into 58 x 1s:
* Fastest: 109.1MiB/s, 1.79 obj/s
* 50% Median: 94.4MiB/s, 1.55 obj/s
* Slowest: 84.2MiB/s, 1.38 obj/s
warp: Cleanup done.
workload completed, 1 client(s) with 64MB objects at Tue Mar 26 01:37:20 UTC 2024
If you account for the rgw reporting unit being 16MB ops rather than 64MB ops and divide by 4, the rgw figure becomes ~1.5 ops/sec, which matches the value reported by warp.
This affects any monitoring or Grafana graphs that show PUT ops, and any that show PUT latency, since latency calculations may rely on a count of ops processed in an interval.
Updated by Paul Cuzner about 2 months ago
- File put-test.py added
Also tried python's boto3 with 64MB objects and saw a similar thing: the put count from rgw is way higher than the actual value seen at the client.
(test program attached)
Here's the output from the program
time, ops/interval
1711501061,21
1711501066,22
1711501071,21
1711501076,21
1711501081,20
1711501087,20
1711501092,20
1711501097,21
1711501102,20
1711501107,20
1711501112,20
1711501117,20
1711501122,21
1711501127,20
1711501132,20
So every 5 secs the client says around 20 puts have been completed - i.e. approx 4 put ops/sec
And the output from the rgw dumpstats script shows
time, ops_counter
1711501062,2119534
1711501068,2119710
1711501073,2119870
1711501078,2120038
1711501083,2120198
1711501088,2120358
1711501093,2120526
1711501098,2120694
1711501103,2120854
1711501109,2121014
1711501114,2121174
1711501119,2121342
1711501124,2121502
1711501129,2121662
1711501134,2121806
The same interval of 5 secs was used. The delta between samples from rgw shows an increase of around 160 ops, so per sec that shows around 32 put ops/sec
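If the counter were tracking parts rather than client ops, the ratio of the two rates would reveal the part size. A hypothetical back-of-envelope check (the helper name is illustrative only), assuming the objects are 64 MiB:

```python
def inferred_part_size(object_size, client_ops_rate, counter_rate):
    """If the counter counts parts, the rate ratio gives parts per object."""
    parts_per_object = counter_rate / client_ops_rate
    return object_size / parts_per_object

# ~4 client ops/sec vs ~32 counter increments/sec for 64 MiB objects:
size = inferred_part_size(64 * 2**20, 4, 32)
print(int(size))  # 8388608, i.e. an 8 MiB part size
```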
No errors at the client, or at the radosgw.
What I do see at the rgw end is the request being broken up
debug 2024-03-27T00:58:53.678+0000 7fe563b31700 1 ====== starting new request req=0x7fe4e09a9730 =====
debug 2024-03-27T00:58:53.678+0000 7fe516a97700 1 ====== starting new request req=0x7fe4e0928730 =====
debug 2024-03-27T00:58:53.682+0000 7fe546af7700 1 ====== starting new request req=0x7fe4e06a3730 =====
debug 2024-03-27T00:58:53.682+0000 7fe52dac5700 1 ====== starting new request req=0x7fe4e0a2a730 =====
debug 2024-03-27T00:58:53.682+0000 7fe510a8b700 1 ====== starting new request req=0x7fe4e07a5730 =====
debug 2024-03-27T00:58:53.682+0000 7fe5282ba700 1 ====== starting new request req=0x7fe4e0826730 =====
debug 2024-03-27T00:58:53.682+0000 7fe596396700 1 ====== starting new request req=0x7fe4e08a7730 =====
debug 2024-03-27T00:58:53.682+0000 7fe55831a700 1 ====== starting new request req=0x7fe4e0724730 =====
debug 2024-03-27T00:58:53.826+0000 7fe574352700 1 ====== req done req=0x7fe4e09a9730 op status=0 http_status=200 latency=0.148004770s ======
debug 2024-03-27T00:58:53.826+0000 7fe574352700 1 beast: 0x7fe4e09a9730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.678 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=1 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.148004770s
debug 2024-03-27T00:58:53.830+0000 7fe5d3410700 1 ====== req done req=0x7fe4e0826730 op status=0 http_status=200 latency=0.148004770s ======
debug 2024-03-27T00:58:53.830+0000 7fe5d3410700 1 beast: 0x7fe4e0826730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.682 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=4 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.148004770s
debug 2024-03-27T00:58:53.834+0000 7fe541aed700 1 ====== req done req=0x7fe4e0a2a730 op status=0 http_status=200 latency=0.152004898s ======
debug 2024-03-27T00:58:53.834+0000 7fe541aed700 1 beast: 0x7fe4e0a2a730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.682 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=8 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.152004898s
debug 2024-03-27T00:58:53.834+0000 7fe56db45700 1 ====== req done req=0x7fe4e0928730 op status=0 http_status=200 latency=0.156005025s ======
debug 2024-03-27T00:58:53.834+0000 7fe56db45700 1 beast: 0x7fe4e0928730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.678 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=3 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.156005025s
debug 2024-03-27T00:58:53.834+0000 7fe5342d2700 1 ====== req done req=0x7fe4e07a5730 op status=0 http_status=200 latency=0.152004898s ======
debug 2024-03-27T00:58:53.834+0000 7fe5342d2700 1 beast: 0x7fe4e07a5730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.682 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=7 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.152004898s
debug 2024-03-27T00:58:53.838+0000 7fe59fba9700 1 ====== req done req=0x7fe4e08a7730 op status=0 http_status=200 latency=0.156005025s ======
debug 2024-03-27T00:58:53.838+0000 7fe59fba9700 1 beast: 0x7fe4e08a7730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.682 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=5 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.156005025s
debug 2024-03-27T00:58:53.838+0000 7fe53c2e2700 1 ====== req done req=0x7fe4e0724730 op status=0 http_status=200 latency=0.156005025s ======
debug 2024-03-27T00:58:53.838+0000 7fe53c2e2700 1 beast: 0x7fe4e0724730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.682 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=6 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.156005025s
debug 2024-03-27T00:58:53.838+0000 7fe5c7bf9700 1 ====== req done req=0x7fe4e06a3730 op status=0 http_status=200 latency=0.156005025s ======
debug 2024-03-27T00:58:53.838+0000 7fe5c7bf9700 1 beast: 0x7fe4e06a3730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.682 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=2 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.156005025s
So is it possible the put perf counter is counting the parts/chunks completed and not the client ops completed?
Updated by Matt Benjamin about 2 months ago
The rgw counter we should be sampling, I think, is l_rgw_put. The logic to increment the counter, as I recalled in slack, is
in fact in the op layer--which tracks actual ops, not (e.g.) the 4MB chunks we might be writing out to RADOS.
OTOH, it seems like this counter is being incremented when the op starts executing, not when it has committed. So while it seems
implausible that you would see a discrete multiple of the number of objects warp says it uploaded, it does seem like this
would cause the counter to reflect the number of attempts to upload each object, not just the ones that succeeded:
void RGWPutObj::execute(optional_yield y)
{
  char supplied_md5_bin[CEPH_CRYPTO_MD5_DIGESTSIZE + 1];
  char supplied_md5[CEPH_CRYPTO_MD5_DIGESTSIZE * 2 + 1];
  char calc_md5[CEPH_CRYPTO_MD5_DIGESTSIZE * 2 + 1];
  unsigned char m[CEPH_CRYPTO_MD5_DIGESTSIZE];
  MD5 hash;
  // Allow use of MD5 digest in FIPS mode for non-cryptographic purposes
  hash.SetFlags(EVP_MD_CTX_FLAG_NON_FIPS_ALLOW);
  bufferlist bl, aclbl, bs;
  int len;
  off_t fst;
  off_t lst;

  auto counters = rgw::op_counters::get(s);

  bool need_calc_md5 = (dlo_manifest == NULL) && (slo_info == NULL);

  rgw::op_counters::inc(counters, l_rgw_op_put_obj, 1);
  perfcounter->inc(l_rgw_put);
(this is downstream reef)
Updated by Paul Cuzner about 2 months ago
It's not just warp - boto is seeing the same thing and there is no indication of failed uploads.
Updated by Matt Benjamin about 2 months ago
Oh, duh, this is doing multipart upload.
debug 2024-03-27T00:58:53.834+0000 7fe541aed700 1 beast: 0x7fe4e0a2a730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.682 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=8 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.152004898s
debug 2024-03-27T00:58:53.834+0000 7fe56db45700 1 ====== req done req=0x7fe4e0928730 op status=0 http_status=200 latency=0.156005025s ======
The l_rgw_put will be incremented once for each UploadPart attempted--perhaps each object is being sent in 4 (er, 16) MB parts?
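The beast log lines do show 8388608-byte (8 MiB) parts. Assuming the objects here are 64 MiB, the part arithmetic works out to 8 UploadPart requests per object (the helper below is illustrative only, not RGW code):

```python
import math

def multipart_part_count(object_size, part_size):
    """Number of UploadPart requests a client issues per object
    (ceil covers a short final part)."""
    return math.ceil(object_size / part_size)

print(multipart_part_count(64 * 2**20, 8 * 2**20))  # 8
```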
Updated by Paul Cuzner about 2 months ago
Yep - so the put_ops value is incremented for each successful 'part' uploaded. Which means this is not a bug, but a problem in the way the counters may be used to understand what's going on.
BTW, boto has a default threshold of 8MB for multipart too.
warp has a --disable-multipart flag. When I used that, the stats at the client match the rgw put_ops counter.
So perhaps the real question is: should there be another perf counter that represents the upper-level put ops count?
Bottom line is get_ops and put_ops are used to understand the load from a client perspective - how many ops are we seeing from the client - and to derive the average object size, i.e. bytes transferred / ops. As things stand this is OK for GET, but breaks down for objects that are PUT with multipart - you end up determining the part size instead of the total object size.
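To illustrate the skew with made-up numbers (100 objects of 64 MiB, each uploaded as eight 8 MiB parts; all values hypothetical):

```python
def derived_avg_object_size(bytes_transferred, put_ops):
    """The monitoring derivation described above: bytes / ops."""
    return bytes_transferred / put_ops

total_bytes = 100 * 64 * 2**20   # 100 hypothetical 64 MiB objects

# Counting whole-object PUTs yields the true object size:
print(int(derived_avg_object_size(total_bytes, 100)))      # 67108864 (64 MiB)

# Counting one increment per 8 MiB part (800 increments) yields the part size:
print(int(derived_avg_object_size(total_bytes, 100 * 8)))  # 8388608 (8 MiB)
```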
Updated by Casey Bodley about 1 month ago
- Tracker changed from Bug to Feature
- Subject changed from RGW Put Ops from perf counter is invalid for objects > 16MB to perf counters for CreateMultipartUpload, AbortMultipartUpload, CompleteMultipartUpload
thanks Matt and Paul. let's treat this as a feature request to add counters for the Create/Abort/CompleteMultipartUpload ops