Bug #2161

closed

nonlinear scaling for PGMap::pg_stat encode

Added by Sage Weil about 12 years ago. Updated about 12 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Monitor
% Done: 0%
Source: Development

Description

> OSDs  size of latest  pg_stat_t encode time
> ----  --------------  ---------------------
>   48         2976397               0.323052
>   72         4472477               0.666633
>   96         5969461               1.159198
>  120         7466021               1.738096
>  144         8963141               2.428229
>  168        10460309               3.203832
>  192        11956709               4.083013
>  240        14950445               6.453171
>  288        17916589               9.462052

My guesses are:
- something in bufferlist is doing something O(n) on the list<ptr>
- some map<> is getting hammered
?

Actions #1

Updated by Ake van der Meer about 12 years ago

My ceph-osd processes run at 100% CPU for many minutes at a time doing this: http://pastebin.com/wYnPKWeJ

In src/include/buffer.h (version 0.44.1) I found the following comment about a set of operations including that copy_in():
// WARNING: this are horribly inefficient for large bufferlists.

The same may cause the above encoding behaviour?

Actions #2

Updated by Sage Weil about 12 years ago

Ake van der Meer wrote:

> My ceph-osd processes run at 100% CPU for many minutes at a time doing this: http://pastebin.com/wYnPKWeJ
>
> In src/include/buffer.h (version 0.44.1) I found the following comment about a set of operations including that copy_in():
> // WARNING: this are horribly inefficient for large bufferlists.
>
> The same may cause the above encoding behaviour?

Aha! That is indeed the problem. Working up a fix now.

Thanks!

Actions #3

Updated by Sage Weil about 12 years ago

  • Status changed from 12 to 7
  • Target version set to v0.46

wip-encoding

Actions #4

Updated by Sage Weil about 12 years ago

  • Status changed from 7 to Resolved