Bug #21026

PR #16172 causing performance regression

Added by Mark Nelson over 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: -
Tags: -
Backport: luminous, jewel
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

This is most obvious during 4K random writes to NVMe-backed BlueStore RBD volumes, but it may be present in other tests as well. A git bisection very clearly points to PR #16172 (introduced with v12.1.2) as the culprit.

Performance in the test described above decreases from 30K IOPS to 15K IOPS for a single OSD. A wallclock profile shows extra time spent in pg_log_dup_t get_key_name (~0.7%) and encode (~1.7%) per tp_osd_tp thread. Greg hypothesized that we might be doing unnecessary string manipulation in get_key_name, and indeed it looks like there may be extra string manipulation and memory copying going on. Given that we are spending about 1.7% of the time in each tp_osd_tp thread doing pg_log_dup_t encode, however, I suspect the bigger issue is that we are writing a lot more pglog data to the KV store now, and that this is less about CPU overhead than about simply using a greater percentage of our available KV store throughput for pglog. The column family PR might give us a better hint as to whether this is correct.
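For context on the string-manipulation hypothesis, below is a minimal sketch of the kind of per-entry key construction that can cost CPU in a hot path. The function names (make_dup_key_naive, make_dup_key_reserved) and the "dup_<epoch>.<version>" key layout are illustrative assumptions, not the actual pg_log_dup_t::get_key_name implementation; the point is only that building a key through repeated std::string concatenation allocates several temporaries per call, whereas formatting once into a stack buffer does not.

    // Illustrative sketch only -- not the Ceph pg_log_dup_t code. It contrasts a
    // concatenation-heavy key builder with one that formats into a single buffer.
    #include <cstdint>
    #include <cstdio>
    #include <string>

    // Hypothetical key layout "dup_<epoch>.<version>" (assumption for illustration).
    std::string make_dup_key_naive(uint32_t epoch, uint64_t version) {
      // Each '+' below can allocate a temporary std::string.
      return std::string("dup_") + std::to_string(epoch) + "." + std::to_string(version);
    }

    std::string make_dup_key_reserved(uint32_t epoch, uint64_t version) {
      char buf[64];
      // One formatting call into a stack buffer, then a single string construction.
      int n = std::snprintf(buf, sizeof(buf), "dup_%010u.%020llu",
                            static_cast<unsigned>(epoch),
                            static_cast<unsigned long long>(version));
      return std::string(buf, n);
    }

    int main() {
      std::printf("%s\n", make_dup_key_naive(42, 7).c_str());
      std::printf("%s\n", make_dup_key_reserved(42, 7).c_str());
    }

Even if the key-building cost is real, it would only account for the ~0.7% seen in the profile; the larger share of the regression is more plausibly the additional pglog dup data being written to the KV store per op.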


Related issues 3 (0 open, 3 closed)

Blocks Ceph - Feature #20298: store longer dup op information (Resolved, J. Eric Ivancich, 06/14/2017)

Copied to Ceph - Backport #21187: luminous: fix performance regression (Resolved, Jan Fajerski)
Copied to RADOS - Backport #22400: jewel: PR #16172 causing performance regression (Rejected, J. Eric Ivancich)
