
Bug #21846

Default ms log level results in ~40% performance degradation on RBD 4K random read IO

Added by Jason Dillaman over 6 years ago. Updated over 6 years ago.

Status: Closed
Priority: Urgent
Assignee: -
Category: Performance/Resource Usage
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

Luminous (v12.2.1) is now about 16% slower than Jewel (v10.2.10), and over 40% slower than when the ms logs are disabled.

v10.2.10 (Jewel) defaults:

   bw (  KiB/s): min=101152, max=115880, per=100.00%, avg=111149.07, stdev=3721.50, samples=30
   iops        : min=25288, max=28970, avg=27787.23, stdev=930.28, samples=30

v12.2.1 (Luminous) defaults:

   bw (  KiB/s): min=81376, max=98360, per=100.00%, avg=92884.47, stdev=4851.53, samples=30
   iops        : min=20344, max=24590, avg=23221.10, stdev=1212.87, samples=30

v12.2.1 (Luminous) with "debug ms = 0":

   bw (  KiB/s): min=154096, max=165448, per=100.00%, avg=160584.73, stdev=3897.58, samples=30
   iops        : min=38524, max=41362, avg=40146.10, stdev=974.31, samples=30
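
The headline percentages can be checked against the average IOPS in the three fio summaries above. A minimal Python sketch (the dictionary keys are just labels chosen here, not anything from fio or Ceph):

```python
# Average IOPS copied from the three fio summaries in the description.
results = {
    "v10.2.10 defaults": 27787.23,
    "v12.2.1 defaults": 23221.10,
    "v12.2.1 debug ms = 0": 40146.10,
}


def slowdown(slow: float, fast: float) -> float:
    """Fraction by which `slow` trails `fast` (0.25 means 25% fewer IOPS)."""
    return 1.0 - slow / fast


vs_jewel = slowdown(results["v12.2.1 defaults"], results["v10.2.10 defaults"])
vs_no_ms_log = slowdown(results["v12.2.1 defaults"], results["v12.2.1 debug ms = 0"])

print(f"Luminous vs Jewel:          {vs_jewel:.1%} slower")
print(f"Luminous vs debug ms = 0:   {vs_no_ms_log:.1%} slower")
```

Running it prints 16.4% and 42.2% for the two comparisons. The bandwidth averages give essentially the same ratios, since every IO is 4 KiB.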

#1

Updated by Sage Weil over 6 years ago

  • Status changed from New to 12
  • Priority changed from Normal to Urgent

Two options?

1. Just set debug ms = 0 by default for clients.
2. Fix the async msgr to not log the second message. That probably doesn't help as much?

The in-memory ms logging seems less useful on the client side...? Jason?
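Option 1 is a configuration change. As a rough sketch of applying it by hand, assuming a ceph.conf-style INI file in which a `[client]` section applies to client processes, the hypothetical helper `disable_client_ms_logging` below adds `debug ms = 0` using Python's standard `configparser`. Note that `configparser` does not understand every construct ceph.conf accepts, and rewriting a file this way drops its comments.

```python
import configparser
import io


def disable_client_ms_logging(conf_text: str) -> str:
    """Hypothetical helper: return conf_text with 'debug ms = 0' set under [client]."""
    # interpolation=None so values containing '%' are passed through untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("client"):
        parser.add_section("client")
    parser.set("client", "debug ms", "0")
    out = io.StringIO()
    parser.write(out)  # comments from the input are not preserved
    return out.getvalue()
```

For example, `disable_client_ms_logging("[global]\nfsid = 1234\n")` returns the original `[global]` section followed by a new `[client]` section containing `debug ms = 0`.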

#2

Updated by Jason Dillaman over 6 years ago

I posted PR https://github.com/ceph/ceph/pull/18418 as a temporary workaround for clients. I figured I would leave this one as a placeholder for perhaps a better tracing / "flight data recorder" system for performance critical code paths.
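
For illustration only, here is a generic sketch of the "flight data recorder" idea: a bounded in-memory buffer where the hot path stores raw arguments, and the cost of formatting is deferred until the history is actually dumped. `FlightRecorder`, `record` and `dump` are hypothetical names; the sketch is not taken from Ceph and makes no claim about how Ceph's logging is implemented.

```python
import time
from collections import deque
from typing import Any, Deque, List, Tuple


class FlightRecorder:
    """Hypothetical bounded in-memory event log ("flight data recorder")."""

    def __init__(self, capacity: int = 1024) -> None:
        # A deque with maxlen silently drops the oldest entry once full.
        self._events: Deque[Tuple[float, str, Tuple[Any, ...]]] = deque(maxlen=capacity)

    def record(self, fmt: str, *args: Any) -> None:
        # Hot path: store a timestamp and references only, no string formatting.
        self._events.append((time.monotonic(), fmt, args))

    def dump(self) -> List[str]:
        # Slow path: format entries only when someone needs the history.
        return [f"{ts:.6f} " + (fmt % args) for ts, fmt, args in self._events]
```

Keeping references instead of formatted strings keeps `record` cheap, with the trade-off that mutable arguments are rendered as they look at dump time rather than when they were recorded.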

#3

Updated by Ken Dreyer over 6 years ago

Jason Dillaman wrote:

I posted PR https://github.com/ceph/ceph/pull/18418 as a temporary workaround for clients. I figured I would leave this one as a placeholder for perhaps a better tracing / "flight data recorder" system for performance critical code paths.

How should we indicate that PR 18418 needs to go into Luminous (v12.2.2?)

#4

Updated by Jason Dillaman over 6 years ago

Ken Dreyer wrote:

How should we indicate that PR 18418 needs to go into Luminous (v12.2.2?)

Using the magic for backporting? See tracker ticket #21860, which is associated with that PR.

#5

Updated by Greg Farnum over 6 years ago

  • Status changed from 12 to Closed