Bug #8103
closed
pool has too few PGs warning misleading when using cache pools
Added by Mark Nelson about 10 years ago.
Updated over 6 years ago.
Description
When using cache pools on a fresh filesystem, the number of objects in the cache pool can quickly come to greatly exceed the number of objects in the base pool, causing ceph health to warn:
nhm@burnupiX:/tmp/cbt/ceph/log$ ceph health
HEALTH_WARN pool rados-bench-burnupiY-2-cache has too few pgs; pool rados-bench-burnupiY-3-cache has too few pgs
Looking at ceph health detail, we see:
HEALTH_WARN pool rados-bench-burnupiY-2-cache has too few pgs; pool rados-bench-burnupiY-3-cache has too few pgs
pool rados-bench-burnupiY-2-cache objects per pg (14) is more than 14 times cluster average (1)
pool rados-bench-burnupiY-3-cache objects per pg (14) is more than 14 times cluster average (1)
In reality this isn't a problem, and the warning can be worked around by raising the max object skew, but it's pretty misleading and, I would argue, not particularly helpful.
Given that this is a transient issue for a new, empty cluster, I'm not sure if it is worth making an exception for the warning...
It seems like there may be other situations where this is misleading too: say you have many mostly-empty pools and one heavily utilized one, or SSD-backed pools with a very high number of small objects versus spinning-disk pools with fewer, larger objects.
Are we finding users in situations where skewed object/pg distributions are causing problems?
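For reference, the threshold behind this warning is the monitor option mon_pg_warn_max_object_skew (default 10). Raising it suppresses the warning for skewed-but-harmless pools; if I recall correctly, setting it to zero disables the check entirely. A minimal ceph.conf snippet (the value 20 is just an example):

```ini
# ceph.conf -- raise the objects-per-PG skew threshold (default is 10);
# zero should disable the check entirely, as far as I recall.
[mon]
mon_pg_warn_max_object_skew = 20
```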
Mark Nelson wrote:
It seems like there may be other situations where this is misleading too: say you have many mostly-empty pools and one heavily utilized one, or SSD-backed pools with a very high number of small objects versus spinning-disk pools with fewer, larger objects.
Are we finding users in situations where skewed object/pg distributions are causing problems?
It's just a tool to help users identify when their pg_num values are out of whack. We can bump up the threshold... or, we could exclude any empty pools from the average, perhaps...
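The "exclude empty pools from the average" idea can be sketched as follows. This is a simplified illustration, not the actual monitor code; pool stats are passed in directly:

```python
# Simplified sketch of the objects-per-PG skew check, with empty pools
# excluded from the cluster average so a fresh cluster (or a cache tier in
# front of an empty base pool) doesn't drag the average toward zero.

def skew_warnings(pools, max_object_skew=10):
    """pools: dict of pool name -> (num_objects, pg_num).
    Returns names of pools whose objects/PG exceed the skew threshold."""
    # Exclude empty pools from the average entirely.
    active = {name: (objs, pgs)
              for name, (objs, pgs) in pools.items() if objs > 0}
    if not active:
        return []
    total_objects = sum(objs for objs, _ in active.values())
    total_pgs = sum(pgs for _, pgs in active.values())
    cluster_avg = total_objects / total_pgs
    return [name for name, (objs, pgs) in active.items()
            if objs / pgs > max_object_skew * cluster_avg]
```

With this filter, a lone cache pool in an otherwise empty cluster defines the average itself and no longer trips the warning, while a genuinely under-split busy pool still does.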
- Priority changed from Urgent to High
- Status changed from New to Won't Fix
There is no simple way to avoid this false positive while still keeping this warning, and it is useful.
I'm also running into this:
$ ceph health detail
HEALTH_WARN pool default.rgw.buckets.data has many more objects per pg than average (too few pgs?)
pool default.rgw.buckets.data objects per pg (446) is more than 12.7429 times cluster average (35)
$ rados df
pool name KB objects clones degraded unfound rd rd KB wr wr KB
.rgw.root 2 4 0 0 0 30 24 4 5
default.rgw.buckets.data 56094130 57153 0 0 0 572099 59761952 583966 58240927
default.rgw.buckets.index 0 69 0 0 0 6730064 6828134 177526 0
default.rgw.buckets.non-ec 0 0 0 0 0 312 344 786 0
default.rgw.control 0 8 0 0 0 0 0 0 0
default.rgw.data.root 2 6 0 0 0 0 0 15 6
default.rgw.gc 0 32 0 0 0 133602 135155 101032 0
default.rgw.log 0 127 0 0 0 2960042 2959915 1973580 0
default.rgw.meta 4 9 0 0 0 0 0 21 9
default.rgw.users.keys 1 1 0 0 0 3 2 1 1
default.rgw.users.swift 1 1 0 0 0 0 0 1 1
default.rgw.users.uid 1 2 0 0 0 11593 11590 11504 3
rbd 0 0 0 0 0 0 0 0 0
total used 250293020 57412
total avail 11439457452
total space 11689750472
We are using RadosGW with one pool that currently stores all objects. All the other pools are purely administrative and relatively empty. In my opinion this warning could fairly easily be fixed by excluding all pools with fewer than, for instance, 1000 objects.
But that's from my naive point of view; I'm not entirely sure in what scenario this warning is actually helpful. I would think it would be more useful if Ceph reported when some performance-related objects-per-PG threshold is crossed; comparing against a (local) cluster average feels strange. But I'm hardly an expert. For people starting with Ceph it would be nice if some false positives could be avoided.
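The minimum-object-count filter suggested above can be checked against the numbers in this very report. Object counts below are from the rados df output; the per-pool pg_num values are NOT shown by rados df, so 128 PGs per pool is an assumption chosen because it roughly reproduces the reported figures (57153 / 128 gives the 446 objects per PG from the warning):

```python
# Hypothetical illustration: exclude pools below a minimum object count
# from the cluster average. Object counts are from the rados df output
# above; 128 PGs per pool is an assumed value (not in that output).

POOLS = {
    ".rgw.root":                  (4, 128),
    "default.rgw.buckets.data":   (57153, 128),
    "default.rgw.buckets.index":  (69, 128),
    "default.rgw.buckets.non-ec": (0, 128),
    "default.rgw.control":        (8, 128),
    "default.rgw.data.root":      (6, 128),
    "default.rgw.gc":             (32, 128),
    "default.rgw.log":            (127, 128),
    "default.rgw.meta":           (9, 128),
    "default.rgw.users.keys":     (1, 128),
    "default.rgw.users.swift":    (1, 128),
    "default.rgw.users.uid":      (2, 128),
    "rbd":                        (0, 128),
}

def cluster_avg(pools, min_objects=0):
    # Only pools holding at least min_objects contribute to the average.
    counted = [(o, p) for o, p in pools.values() if o >= min_objects]
    return sum(o for o, _ in counted) / sum(p for _, p in counted)

data_per_pg = 57153 / 128             # ~446 objects/PG, as in the warning
naive = cluster_avg(POOLS)            # every pool counted: ~34.5
filtered = cluster_avg(POOLS, 1000)   # only buckets.data survives the filter
print(f"skew naive: {data_per_pg / naive:.1f}x, "
      f"skew filtered: {data_per_pg / filtered:.1f}x")
```

Under these assumptions the naive skew comes out near the ~12.7x in the warning, while the filtered skew collapses to 1.0x and the warning would not fire.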
+1. What really matters is that PGs are assigned to pools that are doing a lot of I/O, so that load is evened out across OSDs. This warning is really trying to indicate that OSD load may be uneven because of the imbalance in PGs, but it's not accurate unless I/O is actually happening in the pool. So perhaps a more relevant indicator would be a significant imbalance in OSD I/O attributable to a particular storage pool. That may now be easier to implement with a ceph-mgr module, perhaps? Also, the new per-pool reweighting option in Luminous may provide an alternative to ever-higher PG counts.
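The I/O-based indicator proposed here could be sketched along these lines. This is purely hypothetical, not an existing ceph-mgr module; the per-pool IOPS figures are assumed to be supplied by the caller, and the thresholds are made up for illustration:

```python
# Hypothetical I/O-skew check: compare each pool's recent IOPS per PG
# against the cluster-wide rate, ignoring effectively idle pools so they
# cannot distort the average (the same flaw the object-count check has).

def io_skew_warnings(pool_stats, max_skew=10.0, min_iops=1.0):
    """pool_stats: dict of pool name -> (recent_iops, pg_num).
    Returns names of pools whose IOPS/PG exceed max_skew times the
    cluster-wide IOPS/PG across non-idle pools."""
    busy = {n: (io, pg) for n, (io, pg) in pool_stats.items()
            if io >= min_iops}
    if not busy:
        return []
    cluster_rate = (sum(io for io, _ in busy.values())
                    / sum(pg for _, pg in busy.values()))
    return [n for n, (io, pg) in busy.items()
            if io / pg > max_skew * cluster_rate]
```

A busy pool with few PGs would be flagged regardless of how many objects it stores, and an idle pool with a skewed object count would not.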