Bug #14952
Updated by John Spray about 8 years ago
(Seen in master on a vstart cluster, but this presumably happens on real clusters too.)
Right after creating some pools:
<pre>
health HEALTH_ERR
2 pgs are stuck inactive for more than 300 seconds
5 pgs degraded
2 pgs stuck inactive
5 pgs stuck unclean
5 pgs undersized
</pre>
This is probably because stats such as last_active are zero for a PG that has never been active. PGMonitor compares these stats against (now - mon_pg_stuck_threshold), and a timestamp of 0 always falls before that cutoff, so every never-active PG is counted as stuck immediately.
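A minimal Python model of the check described above, assuming it is a strict `last_active < now - mon_pg_stuck_threshold` comparison. The function and variable names are hypothetical and are not PGMonitor's actual code. It shows why a zero `last_active` is always reported as stuck, no matter how recently the PG was created:

```python
# Hypothetical model of the stuck-inactive check; not PGMonitor's actual code.
MON_PG_STUCK_THRESHOLD = 300.0  # seconds, matching the "300 seconds" in the health output


def is_stuck_inactive(last_active: float, now: float,
                      threshold: float = MON_PG_STUCK_THRESHOLD) -> bool:
    """Return True if the PG was last active before the cutoff (now - threshold)."""
    cutoff = now - threshold
    return last_active < cutoff


now = 1_000_000.0   # arbitrary current time, in seconds since the epoch
never_active = 0.0  # assumed: last_active stays 0 for a PG that has never been active

# A PG created two seconds ago but never active is already "stuck":
print(is_stuck_inactive(never_active, now))  # True
# The same PG with a real timestamp from two seconds ago would not be:
print(is_stuck_inactive(now - 2, now))  # False
```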
What should our logic be here:
* we could initialize all the last_* stats to the time of creation
* we could avoid counting a PG as stuck until it has existed for at least mon_pg_stuck_threshold?
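A sketch of how each option would suppress the false positive, under the same assumptions as before. `PGStats`, its `created` field, and the helper functions are hypothetical stand-ins for the real PG stats and PGMonitor logic, used only for illustration:

```python
from dataclasses import dataclass

MON_PG_STUCK_THRESHOLD = 300.0  # seconds (assumed, matching the health output)


@dataclass
class PGStats:
    """Hypothetical PG stats record; times are seconds since the epoch."""
    created: float
    last_active: float = 0.0  # 0 means "never active", as in the current behaviour
    last_clean: float = 0.0


def new_pg_stats_init_at_creation(created: float) -> PGStats:
    # Option 1: initialize all last_* stats to the creation time instead of 0.
    return PGStats(created=created, last_active=created, last_clean=created)


def is_stuck_inactive(pg: PGStats, now: float,
                      threshold: float = MON_PG_STUCK_THRESHOLD) -> bool:
    # Unchanged check: stuck if the PG was last active before the cutoff.
    return pg.last_active < now - threshold


def is_stuck_inactive_with_grace(pg: PGStats, now: float,
                                 threshold: float = MON_PG_STUCK_THRESHOLD) -> bool:
    # Option 2: a PG younger than the threshold is never counted as stuck.
    if now - pg.created < threshold:
        return False
    return is_stuck_inactive(pg, now, threshold)


now = 1_000_000.0
young = now - 2   # created two seconds ago
old = now - 400   # created well over the threshold ago

print(is_stuck_inactive(PGStats(created=young), now))                  # current: True
print(is_stuck_inactive(new_pg_stats_init_at_creation(young), now))    # option 1: False
print(is_stuck_inactive_with_grace(PGStats(created=young), now))       # option 2: False
```

Option 1 changes what the stats record (a zero timestamp no longer means "never"), while option 2 leaves the stats alone and only changes how the monitor interprets them. In this model both still report a PG that has never gone active once it is older than the threshold.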
As it is, the messages are definitely crazy, especially the "for more than 300 seconds" message on a cluster that I created two seconds ago.