Bug #56640

RGW S3 workload has a huge performance boost in quincy 17.2.0 as compared to 17.2.1

Added by Vikhyat Umrao 3 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%

Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The RGW S3 small-object workload shows a large performance boost in Quincy 17.2.0 compared to 17.2.1 because bluestore_zero_block_detection is true in 17.2.0.

The small-object workload uses this COSBench size histogram: sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB
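To make the size mix concrete, here is a minimal Python sketch that parses the histogram spec above. It assumes each comma-separated entry is `min|max|weight` in the trailing unit, which is how the COSBench user guide describes histogram sizes; `parse_size_histogram` is a hypothetical helper, not part of COSBench.

```python
import re


def parse_size_histogram(spec):
    """Parse a COSBench-style 'h(min|max|weight,...)UNIT' size spec.

    Assumption: each entry is min|max|weight and weights are relative.
    Returns (unit, [(min, max, share_of_objects), ...]).
    """
    match = re.fullmatch(r"h\(([^)]*)\)([A-Za-z]+)", spec.strip())
    if match is None:
        raise ValueError(f"not a histogram spec: {spec!r}")
    body, unit = match.groups()

    buckets = []
    for entry in body.split(","):
        low, high, weight = (int(part) for part in entry.split("|"))
        buckets.append((low, high, weight))

    # Normalize the weights so each bucket carries its share of objects.
    total = sum(weight for _, _, weight in buckets)
    return unit, [(low, high, weight / total) for low, high, weight in buckets]


if __name__ == "__main__":
    unit, buckets = parse_size_histogram("h(1|2|25,2|4|40,4|8|25,8|256|10)KB")
    for low, high, share in buckets:
        print(f"{low}-{high} {unit}: {share:.0%} of objects")
```

Under that reading, 90% of the weight sits in the three buckets that top out at 8 KB, which is why this is described as a small-object workload.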

We still want to investigate why this feature is providing this much of a performance boost in 17.2.0.

RGW S3 - Debug - smaller - 1hr.pdf (48.7 KB) Vikhyat Umrao, 07/19/2022 08:53 PM

History

#1 Updated by Vikhyat Umrao 3 months ago

It has already been verified that COSBench fills objects with random data by default, not zeros.

COSBench document - page 50 - https://raw.githubusercontent.com/intel-cloud/cosbench/master/COSBenchUserGuide.pdf
content | String | "random" (default), "zero" | Fill object content with random data or all-zeros
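As a rough illustration of why the content setting matters for this bug: a zero block check can only fire on buffers that are entirely zero, which random-filled objects essentially never are. The sketch below is a plain Python stand-in for such a check. It is not BlueStore's implementation, and `is_all_zero`, `make_payload`, and the 4096-byte size are assumptions chosen for illustration.

```python
import random


def is_all_zero(buf):
    """Return True if every byte in buf is 0 (an empty buffer counts as zero)."""
    return not any(buf)


def make_payload(size, content="random", seed=0):
    """Build an object payload mimicking the two documented 'content' modes."""
    if content == "zero":
        return bytes(size)
    rng = random.Random(seed)  # seeded so the example is repeatable
    return bytes(rng.getrandbits(8) for _ in range(size))


if __name__ == "__main__":
    for mode in ("random", "zero"):
        payload = make_payload(4096, content=mode)
        print(mode, is_all_zero(payload))
```

If the workload really writes random content, a check of this kind should not trigger on the object data itself, which is what makes the observed boost worth investigating.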

#2 Updated by Vikhyat Umrao 3 months ago

`bluestore_zero_block_detection` was set to false by default in 17.2.1. For more details, see:

https://tracker.ceph.com/issues/55521
https://github.com/ceph/ceph/pull/46193/files

Release notes:

=17.2.1

  • The "BlueStore zero block detection" feature (first introduced to Quincy in
    https://github.com/ceph/ceph/pull/43337) has been turned off by default with a
    new global configuration called `bluestore_zero_block_detection`. This feature,
    intended for large-scale synthetic testing, does not interact well with some RBD
    and CephFS features. Any side effects experienced in previous Quincy versions
    would no longer occur, provided that the configuration remains set to false.
    Relevant tracker: https://tracker.ceph.com/issues/55521

#3 Updated by Vikhyat Umrao 3 months ago

Vikhyat Umrao wrote:

This has been already verified that by default COSbench uses random not zero.

COSBench document - page 50 - https://raw.githubusercontent.com/intel-cloud/cosbench/master/COSBenchUserGuide.pdf
content | String | "random" (default), "zero" | Fill object content with random data or all-zeros

The test results in the attached document also do not use zero-filled content; they use the default, random.

#4 Updated by Vikhyat Umrao 3 months ago

Two test cases will be run to find out more about what this feature is doing and why it gives this much of a perf boost in 17.2.0 and in 17.2.1 (when the feature is enabled).

Cluster 1 - Testcase 1 - bluestore_zero_block_detection=false (the default, no need to set)

- Deploy a 17.2.1 cluster
- Do a fill workload to fill the cluster
- Enable debug_bluestore 20:
  ceph config set osd debug_bluestore 20
- Do a 1-hr hybrid workload, collecting osd perf dump at 10-minute intervals

Cluster 2 - Testcase 2 - bluestore_zero_block_detection=true (non-default, must be set)

- Deploy a 17.2.1 cluster with bluestore_zero_block_detection=true:
  ceph config set osd bluestore_zero_block_detection true
- Do a fill workload to fill the cluster
- Enable debug_bluestore 20:
  ceph config set osd debug_bluestore 20
- Do a 1-hr hybrid workload, collecting osd perf dump at 10-minute intervals
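For comparing the 10-minute perf dump snapshots between the two test cases, a small helper like the following may be useful. It is a sketch that assumes each snapshot has been saved as the JSON emitted by an OSD perf dump and loaded into a nested dict. `counter_deltas` and the counter names in the example are hypothetical placeholders, not real BlueStore counter names.

```python
def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def counter_deltas(before, after, prefix=""):
    """Return after - before for every numeric leaf present in both snapshots.

    Nested sections are flattened into dotted paths such as 'section.counter'.
    """
    deltas = {}
    for key, new_value in after.items():
        if key not in before:
            continue  # counter only exists in one snapshot; skip it
        old_value = before[key]
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(new_value, dict) and isinstance(old_value, dict):
            deltas.update(counter_deltas(old_value, new_value, path))
        elif _is_number(new_value) and _is_number(old_value):
            deltas[path] = new_value - old_value
    return deltas


if __name__ == "__main__":
    # Hypothetical snapshots taken 10 minutes apart (made-up names and values).
    snap_t0 = {"section_a": {"ops": 100, "latency": {"avgcount": 10, "sum": 1.0}}}
    snap_t1 = {"section_a": {"ops": 150, "latency": {"avgcount": 15, "sum": 1.5}}}
    for path, delta in sorted(counter_deltas(snap_t0, snap_t1).items()):
        print(path, delta)
```

Running the same diff on matching intervals from Cluster 1 and Cluster 2 would show which counters move differently when the feature is enabled.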

#5 Updated by Vikhyat Umrao 3 months ago

  • Description updated (diff)

#6 Updated by Vikhyat Umrao 3 months ago

  • Description updated (diff)

#7 Updated by Tim Wilkinson 2 months ago

Two rounds of testing have occurred. One ran with osd debug_bluestore=20, but it did not reproduce the perf boost seen in hybrid-1hr jobs with BSZBD (bluestore_zero_block_detection) enabled. Another test without BSZBD did reproduce the boost. Both sets of Ceph logs were copied to /root on all players.

Testing w/debug 20: /root/ceph_220722-1609
Testing w/out debug 20: /root/220725-1513_{sa,ceph}

The clusters have since been redeployed.

#8 Updated by Tim Wilkinson 2 months ago

Correction: each test had site1 without BSZBD and the other with it enabled. The second round of tests was the same but without osd debug_bluestore=20.
