Bug #21026 (closed)

PR #16172 causing performance regression

Added by Mark Nelson over 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous, jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is most obvious during 4K random writes to NVMe-backed bluestore RBD volumes, but it may be present in other tests as well. A git bisection clearly points to PR #16172 (introduced with v12.1.2) as the culprit.

Performance in the test described above drops from 30K IOPS to 15K IOPS for a single OSD. A wallclock profile shows extra time spent in pg_log_dup_t::get_key_name (~0.7%) and encode (~1.7%) per tp_osd_tp thread. Greg hypothesized that we might be doing unnecessary string manipulation in get_key_name, and indeed it looks like there may be extra string manipulation and memory copying going on. Given that we are spending about 1.7% of the time in each tp_osd_tp thread in pg_log_dup_t encode, however, I suspect the bigger issue is that we are now writing a lot more pglog data to the KV store; this is less about CPU overhead than about consuming a greater share of the available KV store throughput for pglog. The column family PR might give us a better hint as to whether this is correct.
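For illustration, here is a minimal, self-contained C++ sketch of the kind of per-entry string building get_key_name() was suspected of doing, alongside a variant that formats the key in a single pass. This is not the actual Ceph code; the struct, key layout, and function names are assumptions made for the example.

    // Illustrative sketch only -- not the actual pg_log_dup_t code.
    #include <cstdint>
    #include <cstdio>
    #include <string>

    struct dup_entry {          // stand-in for pg_log_dup_t
      uint64_t epoch;
      uint64_t version;
    };

    // Naive key builder: several temporary strings and reallocations per call.
    std::string key_name_naive(const dup_entry& e) {
      std::string key = "dup_";
      key += std::to_string(e.epoch);
      key += ".";
      key += std::to_string(e.version);
      return key;
    }

    // Lower-copy variant: format the key once into a stack buffer.
    std::string key_name_onepass(const dup_entry& e) {
      char buf[64];
      int n = std::snprintf(buf, sizeof(buf), "dup_%020llu.%020llu",
                            (unsigned long long)e.epoch,
                            (unsigned long long)e.version);
      return std::string(buf, n);
    }

    int main() {
      dup_entry e{10, 12345};
      std::printf("naive:   %s\n", key_name_naive(e).c_str());
      std::printf("onepass: %s\n", key_name_onepass(e).c_str());
    }

Even if the string handling is tightened up, the profile above suggests the larger cost is the extra pglog data being written, so this would only address the smaller of the two overheads.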


Related issues 3 (0 open, 3 closed)

Blocks Ceph - Feature #20298: store longer dup op information (Resolved, J. Eric Ivancich, 06/14/2017)

Copied to Ceph - Backport #21187: luminous: fix performance regression (Resolved, Jan Fajerski)
Copied to RADOS - Backport #22400: jewel: PR #16172 causing performance regression (Rejected, J. Eric Ivancich)
Actions #2

Updated by Mark Nelson over 6 years ago

After discussion with Eric: we can avoid the new code path entirely by increasing osd_min_pg_log_entries to the same value as osd_pg_log_dups_tracked (i.e. currently 1500 -> 3000). This doesn't fix the problem, however; it only avoids it by not invoking the new code. It has been verified to restore performance to near the levels seen prior to PR #16172.
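For reference, a minimal ceph.conf fragment expressing that workaround (value taken from this ticket; it only sidesteps the new dup-handling path, it does not fix it) might look like:

    [osd]
    # Workaround only: raise the minimum pg log entries to match the dup-op
    # tracking window so the new dup-handling code path is not exercised.
    osd_min_pg_log_entries = 3000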

Actions #3

Updated by Josh Durgin over 6 years ago

  • Assignee set to Josh Durgin

It looks to me like this is due to writing out all the dups whenever any are dirty, instead of keeping a 'dirty_to' version that we check, as we do with pg_log_entry_t. Writing out just the dup ops that changed should fix this.
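A minimal C++ sketch of that idea (names and structure here are illustrative assumptions, not the actual PGLog code): keep a marker for how far the persisted dup list is already clean and, on write-out, serialize only the entries past that marker instead of re-encoding every dup whenever any of them is dirty.

    // Illustrative sketch only -- not the actual PGLog implementation.
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct dup_op { uint64_t version; };   // stand-in for pg_log_dup_t

    struct dup_log {
      std::vector<dup_op> dups;
      std::size_t dirty_from = 0;          // first index not yet persisted

      void add(dup_op d) {
        dups.push_back(d);                 // appending never dirties older entries
      }

      // Write only the entries that changed since the last flush.
      std::size_t write_dirty() {
        std::size_t written = 0;
        for (std::size_t i = dirty_from; i < dups.size(); ++i) {
          // real code would encode dups[i] into the kv transaction here
          ++written;
        }
        dirty_from = dups.size();          // everything is clean again
        return written;
      }
    };

    int main() {
      dup_log log;
      for (uint64_t v = 1; v <= 3000; ++v) log.add({v});
      std::printf("first flush writes %zu entries\n", log.write_dirty());
      log.add({3001});
      std::printf("second flush writes %zu entries\n", log.write_dirty());  // just 1
    }

With a whole-list rewrite, every flush re-encodes all ~3000 tracked dups; with a dirty marker, only the new entries are written, which is the behavior change the fix aims for.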

Actions #4

Updated by Josh Durgin over 6 years ago

Mark reported further tests on plain SSD: ~10.2K IOPS with the defaults vs. ~18.8K IOPS with osd_min_pg_log_entries=3000.

Actions #5

Updated by Josh Durgin over 6 years ago

  • Status changed from New to Fix Under Review
  • Backport set to luminous
Actions #6

Updated by Josh Durgin over 6 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #7

Updated by Nathan Cutler over 6 years ago

Actions #8

Updated by Nathan Cutler over 6 years ago

  • Status changed from Pending Backport to Resolved
Actions #9

Updated by Ken Dreyer over 6 years ago

  • Status changed from Resolved to Pending Backport
  • Backport changed from luminous to luminous, jewel
Actions #10

Updated by Ken Dreyer over 6 years ago

Actions #11

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #22400: jewel: PR #16172 causing performance regression added
Actions #12

Updated by Nathan Cutler almost 6 years ago

  • Status changed from Pending Backport to Resolved
