Feature #65131
perf counters for CreateMultipartUpload, AbortMultipartUpload, CompleteMultipartUpload
Description
The rgw daemon exposes a perf counter, "rgw.put", representing the number of PUT ops performed by the gateway.
For small objects the value is accurate, but beyond 16MB the counter appears to count in 16MB chunks instead of whole objects. For example, under a workload of 64MB PUTs the counter increments at 4x the rate it should when cross-checked with the client.
(GET counters are not affected by this issue.)
To reproduce, use warp with something like
warp put --warp-client warp-1 --host rgw-1 --access-key $ACCESS_KEY --secret-key $SECRET_KEY --bucket $bucket --obj.size 64MB --concurrent 1 --duration 1m
And grab the counter stats via the admin socket, e.g.
#!/usr/bin/bash
write_stats () {
  put_count=$(ceph daemon /var/run/ceph/ceph-client.rgw.group2.storage-13-09008.ujvjki.7.94246064814240.asok perf dump | jq ".rgw.put")
  now=$(date '+%s')
  echo "${now},${put_count}"
}
while true; do
  write_stats
  sleep 5
done
Here's my results;
From the rgw script
1711416974,2108690
1711416980,2108696
1711416985,2108727
1711416990,2108759
1711416995,2108790
1711417000,2108823
1711417005,2108858
1711417010,2108891
1711417015,2108923
1711417021,2108954
1711417026,2108985
1711417031,2109014
1711417036,2109044
1711417041,2109062
1711417046,2109062
The above shows a delta of 31-32 every 5s, so in theory the PUT rate is around 6 ops/sec
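Since the counter is cumulative, the per-interval rate comes from differencing adjacent samples. A minimal sketch of that arithmetic (ops_per_sec is an illustrative helper, not part of any Ceph tooling), fed with the first three samples above:

```python
def ops_per_sec(samples):
    """Turn (timestamp, cumulative_count) samples into per-interval rates."""
    return [(c1 - c0) / (t1 - t0)
            for (t0, c0), (t1, c1) in zip(samples, samples[1:])]

# First three samples captured by the rgw script above:
rates = ops_per_sec([(1711416974, 2108690),
                     (1711416980, 2108696),
                     (1711416985, 2108727)])
print(rates)  # [1.0, 6.2] -> roughly 6 ops/sec once the workload ramps up
```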
However, the results from warp show;
started workload, 1 client(s) with 64MB objects at Tue Mar 26 01:36:14 UTC 2024
warp: Benchmark data written to "put-experiment/ec8-2/client_count_1/clients_1_64MB_PUT.csv.zst"
----------------------------------------
Operation: PUT. Concurrency: 1
* Average: 94.18 MiB/s, 1.54 obj/s
Throughput, split into 58 x 1s:
* Fastest: 109.1MiB/s, 1.79 obj/s
* 50% Median: 94.4MiB/s, 1.55 obj/s
* Slowest: 84.2MiB/s, 1.38 obj/s
warp: Cleanup done.
workload completed, 1 client(s) with 64MB objects at Tue Mar 26 01:37:20 UTC 2024
If you account for the rgw reporting unit being 16MB ops rather than 64MB ops and divide by 4, the rgw figure becomes ~1.5 ops/sec, which matches the value reported by warp.
This affects any monitoring or Grafana graphs that show PUT ops, and any that show PUT latency, since latency calculations may rely on a count of ops processed in an interval.
Updated by Paul Cuzner about 2 months ago
- File put-test.py added
Also tried python's boto3 with 64MB objects and saw a similar thing: the put count from rgw is way higher than the actual value seen at the client.
(test program attached)
Here's the output from the program
time, ops/interval
1711501061,21
1711501066,22
1711501071,21
1711501076,21
1711501081,20
1711501087,20
1711501092,20
1711501097,21
1711501102,20
1711501107,20
1711501112,20
1711501117,20
1711501122,21
1711501127,20
1711501132,20
So every 5 secs the client says around 20 puts have been completed - i.e. approx 4 put ops/sec
And the output from the rgw dumpstats script shows
time, ops_counter
1711501062,2119534
1711501068,2119710
1711501073,2119870
1711501078,2120038
1711501083,2120198
1711501088,2120358
1711501093,2120526
1711501098,2120694
1711501103,2120854
1711501109,2121014
1711501114,2121174
1711501119,2121342
1711501124,2121502
1711501129,2121662
1711501134,2121806
The same interval of 5 secs was used. The delta between samples from rgw shows an increase of around 160 ops, so per sec that shows around 32 put ops/sec
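If the counter were tracking parts rather than client ops, the ratio of the two rates would reveal the part size. A hypothetical back-of-envelope check (the helper name is illustrative only), assuming the objects are 64 MiB:

```python
def inferred_part_size(object_size, client_ops_rate, counter_rate):
    """If the counter counts parts, the rate ratio gives parts per object."""
    parts_per_object = counter_rate / client_ops_rate
    return object_size / parts_per_object

# ~4 client ops/sec vs ~32 counter increments/sec for 64 MiB objects:
size = inferred_part_size(64 * 2**20, 4, 32)
print(int(size))  # 8388608, i.e. an 8 MiB part size
```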
No errors at the client, or at the radosgw.
What I do see at the rgw end is the request being broken up
debug 2024-03-27T00:58:53.678+0000 7fe563b31700 1 ====== starting new request req=0x7fe4e09a9730 =====
debug 2024-03-27T00:58:53.678+0000 7fe516a97700 1 ====== starting new request req=0x7fe4e0928730 =====
debug 2024-03-27T00:58:53.682+0000 7fe546af7700 1 ====== starting new request req=0x7fe4e06a3730 =====
debug 2024-03-27T00:58:53.682+0000 7fe52dac5700 1 ====== starting new request req=0x7fe4e0a2a730 =====
debug 2024-03-27T00:58:53.682+0000 7fe510a8b700 1 ====== starting new request req=0x7fe4e07a5730 =====
debug 2024-03-27T00:58:53.682+0000 7fe5282ba700 1 ====== starting new request req=0x7fe4e0826730 =====
debug 2024-03-27T00:58:53.682+0000 7fe596396700 1 ====== starting new request req=0x7fe4e08a7730 =====
debug 2024-03-27T00:58:53.682+0000 7fe55831a700 1 ====== starting new request req=0x7fe4e0724730 =====
debug 2024-03-27T00:58:53.826+0000 7fe574352700 1 ====== req done req=0x7fe4e09a9730 op status=0 http_status=200 latency=0.148004770s ======
debug 2024-03-27T00:58:53.826+0000 7fe574352700 1 beast: 0x7fe4e09a9730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.678 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=1 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.148004770s
debug 2024-03-27T00:58:53.830+0000 7fe5d3410700 1 ====== req done req=0x7fe4e0826730 op status=0 http_status=200 latency=0.148004770s ======
debug 2024-03-27T00:58:53.830+0000 7fe5d3410700 1 beast: 0x7fe4e0826730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.682 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=4 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.148004770s
debug 2024-03-27T00:58:53.834+0000 7fe541aed700 1 ====== req done req=0x7fe4e0a2a730 op status=0 http_status=200 latency=0.152004898s ======
debug 2024-03-27T00:58:53.834+0000 7fe541aed700 1 beast: 0x7fe4e0a2a730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.682 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=8 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.152004898s
debug 2024-03-27T00:58:53.834+0000 7fe56db45700 1 ====== req done req=0x7fe4e0928730 op status=0 http_status=200 latency=0.156005025s ======
debug 2024-03-27T00:58:53.834+0000 7fe56db45700 1 beast: 0x7fe4e0928730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.678 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=3 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.156005025s
debug 2024-03-27T00:58:53.834+0000 7fe5342d2700 1 ====== req done req=0x7fe4e07a5730 op status=0 http_status=200 latency=0.152004898s ======
debug 2024-03-27T00:58:53.834+0000 7fe5342d2700 1 beast: 0x7fe4e07a5730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.682 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=7 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.152004898s
debug 2024-03-27T00:58:53.838+0000 7fe59fba9700 1 ====== req done req=0x7fe4e08a7730 op status=0 http_status=200 latency=0.156005025s ======
debug 2024-03-27T00:58:53.838+0000 7fe59fba9700 1 beast: 0x7fe4e08a7730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.682 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=5 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.156005025s
debug 2024-03-27T00:58:53.838+0000 7fe53c2e2700 1 ====== req done req=0x7fe4e0724730 op status=0 http_status=200 latency=0.156005025s ======
debug 2024-03-27T00:58:53.838+0000 7fe53c2e2700 1 beast: 0x7fe4e0724730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.682 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=6 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.156005025s
debug 2024-03-27T00:58:53.838+0000 7fe5c7bf9700 1 ====== req done req=0x7fe4e06a3730 op status=0 http_status=200 latency=0.156005025s ======
debug 2024-03-27T00:58:53.838+0000 7fe5c7bf9700 1 beast: 0x7fe4e06a3730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.682 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=2 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.156005025s
So is it possible the put perf counter is counting the parts/chunks completed and not the client ops completed?
Updated by Matt Benjamin about 2 months ago
The rgw counter we should be sampling, I think, is l_rgw_put. The logic to increment the counter, as I recalled in slack, is
in fact in the op layer--which tracks actual ops, not (e.g.) the 4MB chunks we might be writing out to RADOS.
OTOH, it seems like this counter is being incremented when the op starts executing, not when it has committed. So while it seems
implausible that you would see a discrete multiple of the number of objects warp says it uploaded, it does seem like this
would cause the counter to reflect the number of attempts to upload each object, not just the ones that succeeded:
void RGWPutObj::execute(optional_yield y)
{
  char supplied_md5_bin[CEPH_CRYPTO_MD5_DIGESTSIZE + 1];
  char supplied_md5[CEPH_CRYPTO_MD5_DIGESTSIZE * 2 + 1];
  char calc_md5[CEPH_CRYPTO_MD5_DIGESTSIZE * 2 + 1];
  unsigned char m[CEPH_CRYPTO_MD5_DIGESTSIZE];
  MD5 hash;
  // Allow use of MD5 digest in FIPS mode for non-cryptographic purposes
  hash.SetFlags(EVP_MD_CTX_FLAG_NON_FIPS_ALLOW);
  bufferlist bl, aclbl, bs;
  int len;
  off_t fst;
  off_t lst;

  auto counters = rgw::op_counters::get(s);

  bool need_calc_md5 = (dlo_manifest == NULL) && (slo_info == NULL);

  rgw::op_counters::inc(counters, l_rgw_op_put_obj, 1);
  perfcounter->inc(l_rgw_put);
(this is downstream reef)
Updated by Paul Cuzner about 2 months ago
It's not just warp - boto is seeing the same thing and there is no indication of failed uploads.
Updated by Matt Benjamin about 2 months ago
Oh, duh, this is doing multipart upload.
debug 2024-03-27T00:58:53.834+0000 7fe541aed700 1 beast: 0x7fe4e0a2a730: 10.242.8.219 - warp [27/Mar/2024:00:58:53.682 +0000] "PUT /boto-2/object_311?uploadId=2~gDSKylGfy-fsnVNgQQFw-HPOWsg0Loa&partNumber=8 HTTP/1.1" 200 8388608 - "Boto3/1.9.253 Python/3.8.10 Linux/5.4.0-169-generic Botocore/1.16.19 Resource" - latency=0.152004898s
debug 2024-03-27T00:58:53.834+0000 7fe56db45700 1 ====== req done req=0x7fe4e0928730 op status=0 http_status=200 latency=0.156005025s ======
The l_rgw_put will be incremented once for each UploadPart attempted--perhaps each object is being sent in 4 (er, 16) MB parts?
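The beast log lines do show 8388608-byte (8 MiB) parts. Assuming the objects here are 64 MiB, the part arithmetic works out to 8 UploadPart requests per object (the helper below is illustrative only, not RGW code):

```python
import math

def multipart_part_count(object_size, part_size):
    """Number of UploadPart requests a client issues per object
    (ceil covers a short final part)."""
    return math.ceil(object_size / part_size)

print(multipart_part_count(64 * 2**20, 8 * 2**20))  # 8
```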
Updated by Paul Cuzner about 2 months ago
Yep - so the put_ops value is incremented for each successful 'part' uploaded. Which means this is not a bug, but a problem in the way the counters may be used to understand what's going on.
BTW, boto has a default threshold of 8MB for multipart too.
warp has a --disable-multipart flag. When I used that, the stats at the client match the rgw put_ops counter.
So perhaps the real question is: should there be another perf counter that represents the upper-level put ops count?
Bottom line is get_ops and put_ops are used to understand the load from a client perspective - how many ops are we seeing from the client - and to derive the average object size, i.e. bytes transferred / ops. As things stand this is OK for GET, but breaks down for objects that are PUT with multipart - you end up determining the part size instead of the total object size.
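To illustrate the skew with made-up numbers (100 objects of 64 MiB, each uploaded as eight 8 MiB parts; all values hypothetical):

```python
def derived_avg_object_size(bytes_transferred, put_ops):
    """The monitoring derivation described above: bytes / ops."""
    return bytes_transferred / put_ops

total_bytes = 100 * 64 * 2**20   # 100 hypothetical 64 MiB objects

# Counting whole-object PUTs yields the true object size:
print(int(derived_avg_object_size(total_bytes, 100)))      # 67108864 (64 MiB)

# Counting one increment per 8 MiB part (800 increments) yields the part size:
print(int(derived_avg_object_size(total_bytes, 100 * 8)))  # 8388608 (8 MiB)
```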
Updated by Casey Bodley about 1 month ago
- Tracker changed from Bug to Feature
- Subject changed from RGW Put Ops from perf counter is invalid for objects > 16MB to perf counters for CreateMultipartUpload, AbortMultipartUpload, CompleteMultipartUpload
thanks Matt and Paul. let's treat this as a feature request to add counters for the Create/Abort/CompleteMultipartUpload ops