Log lines when hitting "pg overdose protection"
After upgrading to Luminous, we ran into a situation where 10% of our PGs remained unavailable, stuck in the "activating" state.
That blog post says:
"If any individual OSD is ever asked to create more PGs than it should it will simply refuse and ignore the request."
The only direct evidence outside the debug logs was this WARNING in the ceph status output:
'too many PGs per OSD (221 > max 200)'
(We are aware that we need to fix this in our cluster.)
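To make the gap between the warning threshold and the refusal threshold concrete, here is a minimal Python sketch. It assumes the OSD's hard limit is mon_max_pg_per_osd multiplied by osd_max_pg_per_osd_hard_ratio, and it uses 200 and 2.0 as illustrative values. Both the formula and the defaults are assumptions to verify against your release, and the function names are hypothetical, not Ceph code.

```python
# Hypothetical sketch, not Ceph source code.
# Assumption: hard limit = mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio.
# The default values below are illustrative; check them for your release.
MON_MAX_PG_PER_OSD = 200
OSD_MAX_PG_PER_OSD_HARD_RATIO = 2.0


def hard_limit(max_pg_per_osd: int = MON_MAX_PG_PER_OSD,
               hard_ratio: float = OSD_MAX_PG_PER_OSD_HARD_RATIO) -> int:
    """PG count at which the OSD stops accepting new PGs (assumed formula)."""
    return int(max_pg_per_osd * hard_ratio)


def classify_pg_count(pg_count: int,
                      max_pg_per_osd: int = MON_MAX_PG_PER_OSD,
                      hard_ratio: float = OSD_MAX_PG_PER_OSD_HARD_RATIO) -> str:
    """Return 'refuse', 'warn' or 'ok' for a PG count on one OSD."""
    if pg_count >= hard_limit(max_pg_per_osd, hard_ratio):
        return "refuse"  # further PG creation requests would be ignored
    if pg_count > max_pg_per_osd:
        return "warn"    # corresponds to "too many PGs per OSD (N > max M)"
    return "ok"


for count in (150, 221, 400):
    print(count, classify_pg_count(count))
```

Under these assumptions, the 221 reported in the warning is well below the 400 at which creations are refused, so the warning value alone does not tell you whether any OSD has actually hit the hard limit.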
Many PGs were stuck in the "activating" state, which is not documented in the PG state table:
Feature idea: when the OSD refuses to create a PG because it has hit the osd_max_pg_per_osd_hard_ratio limit, it should log this at the standard (non-debug) log level.
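The requested behaviour can be sketched as follows. This is a hypothetical Python illustration of the idea, not Ceph's implementation: the function handle_pg_create, the logger name "osd", the PG id "1.2f" and the default values are all assumptions. The point is only that the refusal is logged at WARNING, a level visible by default, together with the numbers that triggered it.

```python
import logging

# Hypothetical sketch of the requested behaviour, not Ceph's implementation.
# Function name, logger name and defaults are illustrative assumptions.
logging.basicConfig(level=logging.INFO,
                    format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("osd")


def handle_pg_create(osd_id: int, pgid: str, current_pgs: int,
                     max_pg_per_osd: int = 200, hard_ratio: float = 2.0) -> bool:
    """Accept or refuse a PG creation request; refusals are logged at WARNING."""
    limit = int(max_pg_per_osd * hard_ratio)
    if current_pgs >= limit:
        # Logged at a default-visible level rather than only at debug level,
        # so operators can see why the PG stays in "activating".
        log.warning(
            "osd.%d refusing to create pg %s: %d PGs >= hard limit %d "
            "(mon_max_pg_per_osd=%d * osd_max_pg_per_osd_hard_ratio=%g)",
            osd_id, pgid, current_pgs, limit, max_pg_per_osd, hard_ratio,
        )
        return False
    return True


handle_pg_create(3, "1.2f", 400)  # logs a WARNING and returns False
```

Including both configuration values in the message would let an operator go straight from the log line to the setting that needs attention.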
We saw lots of "stuck" PGs in the output of every management command, but nothing that pointed to the underlying reason.
I would also ask whether this situation should raise an ERROR rather than a WARNING, since the cluster becomes partially unavailable.