Feature #65131
openperf counters for CreateMultipartUpload, AbortMultipartUpload, CompleteMultipartUpload
Description
The rgw daemon exposes a perf counter "rgw.put" representing the number of PUT ops performed by the gateway.
On small objects the value is fine, but beyond 16MB the counter appears to count one op per 16MB chunk rather than one op per object. For example, with a workload of 64MB PUTs the counter is 4x the value it should be when cross-checked with the client.
(GET counters are not affected by this issue.)
To reproduce, use warp with something like:
warp put --warp-client warp-1 --host rgw-1 --access-key $ACCESS_KEY --secret-key $SECRET_KEY --bucket $bucket --obj.size 64MB --concurrent 1 --duration 1m
Then grab the counter stats via the admin socket, e.g.:
#!/usr/bin/bash

write_stats () {
    put_count=$(ceph daemon /var/run/ceph/ceph-client.rgw.group2.storage-13-09008.ujvjki.7.94246064814240.asok perf dump | jq ".rgw.put")
    now=$(date '+%s')
    echo "${now},${put_count}"
}

while true; do
    write_stats
    sleep 5
done

Here are my results.
From the rgw script:
1711416974,2108690
1711416980,2108696
1711416985,2108727
1711416990,2108759
1711416995,2108790
1711417000,2108823
1711417005,2108858
1711417010,2108891
1711417015,2108923
1711417021,2108954
1711417026,2108985
1711417031,2109014
1711417036,2109044
1711417041,2109062
1711417046,2109062
The above shows a delta of around 31-32 in most 5s intervals while the workload is running, so in theory the PUT rate is around 6 ops/sec.
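The per-interval rate can be derived from the sampled values directly. The following is a minimal sketch, not part of the original report: `put_rate` is a hypothetical helper name, and it assumes the input is the "epoch,count" format printed by the script above.

```shell
#!/usr/bin/bash
# Hypothetical helper: reads "epoch,count" lines on stdin and prints
# "epoch,rate" where rate is the counter delta divided by the elapsed
# seconds since the previous sample (ops/sec).
put_rate () {
    awk -F, '
        NR > 1 && $1 > prev_t {
            printf "%s,%.2f\n", $1, ($2 - prev_c) / ($1 - prev_t)
        }
        { prev_t = $1; prev_c = $2 }
    '
}

# Three consecutive samples taken from the results above.
printf '%s\n' 1711416980,2108696 1711416985,2108727 1711416990,2108759 | put_rate
```

For these samples the deltas are 31 and 32 over 5s each, i.e. a little over 6 ops/sec as seen by the counter.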
However, the results from warp show:
started workload, 1 client(s) with 64MB objects at Tue Mar 26 01:36:14 UTC 2024
warp: Benchmark data written to "put-experiment/ec8-2/client_count_1/clients_1_64MB_PUT.csv.zst"
----------------------------------------
Operation: PUT. Concurrency: 1
* Average: 94.18 MiB/s, 1.54 obj/s

Throughput, split into 58 x 1s:
 * Fastest: 109.1MiB/s, 1.79 obj/s
 * 50% Median: 94.4MiB/s, 1.55 obj/s
 * Slowest: 84.2MiB/s, 1.38 obj/s
warp: Cleanup done.
workload completed, 1 client(s) with 64MB objects at Tue Mar 26 01:37:20 UTC 2024
If you account for the reporting unit in rgw being 16MB ops rather than 64MB ops and divide by 4, the RGW rate becomes about 1.5 ops/sec, which matches the value reported by warp.
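As a rough cross-check of that arithmetic, the sketch below averages the counter over the steady part of the run and divides by the number of 16MB ops per object. `adjusted_rate` is a hypothetical helper name, and the factor of 4 is the assumption stated above (a 64MB object counted as four 16MB ops), not something confirmed from the rgw code.

```shell
#!/usr/bin/bash
# Hypothetical helper. Arguments: start_epoch start_count end_epoch end_count parts
# "parts" is the assumed number of counted ops per object (64MB / 16MB = 4).
adjusted_rate () {
    awk -v t0="$1" -v c0="$2" -v t1="$3" -v c1="$4" -v parts="$5" '
        BEGIN {
            raw = (c1 - c0) / (t1 - t0)   # ops/sec as seen by the rgw counter
            printf "raw=%.2f ops/s adjusted=%.2f obj/s\n", raw, raw / parts
        }
    '
}

# First and last samples of the steady-state window in the results above.
adjusted_rate 1711416980 2108696 1711417036 2109044 4
```

Over that window the counter advances by 348 in 56 seconds, so the adjusted figure comes out at about 1.55 obj/s, in line with the 1.54 obj/s average reported by warp.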
This affects any monitoring or Grafana graphs that show PUT ops, and any that show PUT latency, since latency calculations may rely on a count of ops processed in an interval.