Feature #65131

open

perf counters for CreateMultipartUpload, AbortMultipartUpload, CompleteMultipartUpload

Added by Paul Cuzner about 2 months ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

The rgw daemon exposes a perf counter "rgw.put" representing the number of PUT ops performed by the gateway.

For small objects the value is correct, but for objects larger than 16MB the counter appears to count in 16MB ops rather than once per object. For example, with a workload of 64MB PUTs the counter is 4x the value it should be when cross-checked against the client.
(GET counters are not affected by this issue.)

To reproduce, use warp with something like:

warp put --warp-client warp-1 --host rgw-1 --access-key $ACCESS_KEY --secret-key $SECRET_KEY --bucket $bucket --obj.size 64MB --concurrent 1 --duration 1m

Then grab the counter stats via the admin socket, e.g.:

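#!/usr/bin/bash

# Admin socket of the rgw daemon under test
ASOK=/var/run/ceph/ceph-client.rgw.group2.storage-13-09008.ujvjki.7.94246064814240.asok

# Print one sample as "<epoch seconds>,<rgw put counter>"
write_stats () {
  put_count=$(ceph daemon "$ASOK" perf dump | jq ".rgw.put")
  now=$(date '+%s')
  echo "${now},${put_count}"
}

# Sample the counter every 5 seconds until interrupted
while true; do
  write_stats
  sleep 5
done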

Here are my results.

From the rgw script:

1711416974,2108690
1711416980,2108696
1711416985,2108727
1711416990,2108759
1711416995,2108790
1711417000,2108823
1711417005,2108858
1711417010,2108891
1711417015,2108923
1711417021,2108954
1711417026,2108985
1711417031,2109014
1711417036,2109044
1711417041,2109062
1711417046,2109062


At steady state the above shows a delta of about 31-32 every 5s, so according to the rgw counter the PUT rate is around 6 ops/sec.
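
The per-interval rate can be derived from the samples mechanically. The following is a minimal sketch, assuming input lines in the "epoch,put_count" format written by write_stats; put_rates is a hypothetical helper name, not part of any Ceph tooling.

```shell
#!/usr/bin/bash
# put_rates (hypothetical helper): read "epoch,put_count" lines on stdin and
# print "epoch,delta,ops_per_sec" for each sample after the first.
put_rates () {
  awk -F, '
    NR > 1 && $1 > prev_t {
      # delta of the counter divided by elapsed seconds
      printf "%s,%d,%.2f\n", $1, $2 - prev_c, ($2 - prev_c) / ($1 - prev_t)
    }
    { prev_t = $1; prev_c = $2 }
  '
}

# Two consecutive samples from the results above
printf '1711416980,2108696\n1711416985,2108727\n' | put_rates
```

For those two samples this prints 1711416985,31,6.20, i.e. 31 counted PUTs in 5 seconds.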

However, the results from warp show:

started workload, 1 client(s) with 64MB objects at Tue Mar 26 01:36:14 UTC 2024
warp: Benchmark data written to "put-experiment/ec8-2/client_count_1/clients_1_64MB_PUT.csv.zst"

----------------------------------------
Operation: PUT. Concurrency: 1
* Average: 94.18 MiB/s, 1.54 obj/s

Throughput, split into 58 x 1s:
 * Fastest: 109.1MiB/s, 1.79 obj/s
 * 50% Median: 94.4MiB/s, 1.55 obj/s
 * Slowest: 84.2MiB/s, 1.38 obj/s
warp: Cleanup done.
workload completed, 1 client(s) with 64MB objects at Tue Mar 26 01:37:20 UTC 2024

If you account for rgw counting in 16MB ops rather than 64MB ops and divide by 4, the RGW rate becomes about 1.5 ops/sec, which matches the 1.54 obj/s reported by warp.
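
That correction can be written down as a small calculation. This is only a sketch: corrected_put_rate is a hypothetical helper, the 16MB unit is inferred from the numbers in this ticket rather than confirmed from the rgw source, and it assumes the object size is a whole multiple of that unit.

```shell
#!/usr/bin/bash
# corrected_put_rate (hypothetical helper): scale a counter-derived PUT rate
# back to whole objects.
#   $1 = ops/sec derived from the rgw counter
#   $2 = object size in MB
#   $3 = assumed counting unit in MB (default 16, an inference from this ticket)
corrected_put_rate () {
  awk -v r="$1" -v o="$2" -v u="${3:-16}" \
    'BEGIN { f = (o > u) ? o / u : 1; printf "%.2f\n", r / f }'
}

# 31 counted PUTs per 5s is 6.2 ops/sec; objects are 64MB
corrected_put_rate 6.2 64
```

With these inputs the factor is 64/16 = 4 and the result is 1.55, in line with warp's 1.54 obj/s average.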

This affects any monitoring or Grafana graphs that show PUT ops, and any that show PUT latency, since latency calculations may rely on a count of ops processed in an interval.
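
To illustrate the latency effect, here is a sketch of an average-latency calculation of the form "time spent in an interval divided by ops counted in that interval". avg_latency is a hypothetical helper, and the inputs are illustrative values, not measurements from this test; whether a given dashboard computes latency this way is an assumption.

```shell
#!/usr/bin/bash
# avg_latency (hypothetical helper): average seconds per op for one interval.
#   $1 = total seconds spent on PUTs in the interval (illustrative value)
#   $2 = number of PUT ops counted in the interval
avg_latency () {
  awk -v t="$1" -v n="$2" \
    'BEGIN { if (n > 0) printf "%.3f\n", t / n; else print "n/a" }'
}

avg_latency 3.0 8    # 8 whole-object PUTs in the interval
avg_latency 3.0 32   # same interval, op count inflated 4x
```

The first call prints 0.375 and the second 0.094: a 4x inflated op count makes the computed latency look 4x lower.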


Files

put-test.py (873 Bytes), Paul Cuzner, 03/27/2024 12:34 AM