Bug #40791 (closed): high variance in pg size

Added by Jan Fajerski almost 5 years ago. Updated almost 5 years ago.

Status: Closed
Priority: Normal
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

We're seeing a cluster that has a history of being very unbalanced in terms of OSD utilisation. The balancer in upmap mode was turned on, which got the per-OSD PG count perfectly balanced; however, the utilisation still varies considerably.
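
Something along these lines should make the mismatch visible per OSD (untested sketch; it assumes the "name", "pgs" and "utilization" fields of ceph osd df --format json as on Nautilus-era clusters):

import json
import subprocess

# sketch only: field names assumed from a Nautilus-era `ceph osd df` JSON layout
df = json.loads(subprocess.check_output(["ceph", "osd", "df", "--format", "json"]))
for node in sorted(df["nodes"], key=lambda n: n["utilization"]):
    print("%-8s pgs=%-4d util=%5.1f%%" % (node["name"], node["pgs"], node["utilization"]))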

Looking at some OSDs, it seems that some PGs have very close to twice as many objects as others.

A slice of a processed PG dump from one OSD, ordered by number of objects:

33705 ["2.82es0" 
33660 ["2.f34s3" 
33574 ["2.c63s9" 
33559 ["2.fe7s7" 
33558 ["2.f6fs2" 
33499 ["2.ebcs10" 
17245 ["2.3ds11" 
17227 ["2.1076s3" 
17217 ["2.68bs6" 
17183 ["2.34s8" 
17178 ["2.6f3s2" 
17167 ["2.211s2" 

The workload is VMs on rbd images backed by an erasure-coded pool. The VM images are cloned from one master rbd image snapshot.

Initially the pg count of the (EC) data pool was deemed much too low and was increased (doubled iirc).

I don't have direct access to the cluster, but I'm trying to get an object listing for two PGs, say 2.ebcs10 and 2.3ds11 (cf. above).
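
For whoever does have access, a rough sketch of how the listings could be pulled with ceph-objectstore-tool against a stopped OSD (OSD ids below are placeholders; the PG ids are the two from the dump slice above):

import json
import subprocess

def list_pg(osd_id, pgid):
    # ceph-objectstore-tool requires the OSD to be stopped; each output line is
    # a JSON array like [pgid, {"oid": ..., ...}]
    out = subprocess.check_output([
        "ceph-objectstore-tool",
        "--data-path", "/var/lib/ceph/osd/ceph-%d" % osd_id,
        "--pgid", pgid,
        "--op", "list",
    ])
    return [json.loads(line)[1]["oid"] for line in out.splitlines() if line.strip()]

big = list_pg(11, "2.ebcs10")    # placeholder OSD ids
small = list_pg(12, "2.3ds11")
print(len(big), len(small), len(set(big) & set(small)))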
