Bug #14952

Updated by John Spray about 8 years ago


(In master, on a vstart cluster, but this presumably happens on real clusters too.)

Right after creating some pools:

<pre>
health HEALTH_ERR
2 pgs are stuck inactive for more than 300 seconds
5 pgs degraded
2 pgs stuck inactive
5 pgs stuck unclean
5 pgs undersized
</pre>

It seems this is because stats like last_active are zero if a PG has never been active. The logic in PGMonitor checks these stats against (now - mon_pg_stuck_threshold), and 0 is always before that cutoff.
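A minimal sketch of the suspected failure mode, written in Python rather than the monitor's C++. The names `is_stuck` and `MON_PG_STUCK_THRESHOLD` are hypothetical stand-ins and not the actual PGMonitor code; only the comparison against (now - mon_pg_stuck_threshold) is taken from the description above.

```python
import time

# Hypothetical stand-in for the mon_pg_stuck_threshold option, in seconds.
MON_PG_STUCK_THRESHOLD = 300


def is_stuck(last_active, now, threshold=MON_PG_STUCK_THRESHOLD):
    """Return True if last_active falls before the cutoff (now - threshold)."""
    cutoff = now - threshold
    return last_active < cutoff


now = time.time()

# Assumption: a PG that has never been active reports last_active == 0.
never_active = 0.0
# A PG that was active two seconds ago.
recently_active = now - 2

# Zero is before any realistic cutoff, so the new PG is flagged as stuck.
print("never active:", is_stuck(never_active, now))
print("recently active:", is_stuck(recently_active, now))
```

If the stats really do start at zero, this is enough to explain a stuck report on a cluster that is only seconds old.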

What should our logic be here?

* we could initialize all the last_* stats to the time of creation
* we could never count something as stuck until the PG has existed for at least mon_pg_stuck_threshold
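The two options can be sketched side by side as follows. This is an illustration in Python under stated assumptions, not a patch: `PGStat`, `new_pg_initialized`, `stuck_plain`, and `stuck_with_grace` are hypothetical names, and the real stats carry more fields than the single `last_active` modelled here.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the mon_pg_stuck_threshold option, in seconds.
MON_PG_STUCK_THRESHOLD = 300


@dataclass
class PGStat:
    created: float            # time the PG was created
    last_active: float = 0.0  # assumption: zero means "never active"


def new_pg_initialized(created):
    """Option 1: initialize last_active to the creation time."""
    return PGStat(created=created, last_active=created)


def stuck_plain(stat, now, threshold=MON_PG_STUCK_THRESHOLD):
    """The existing style of check: compare last_active with the cutoff."""
    return stat.last_active < now - threshold


def stuck_with_grace(stat, now, threshold=MON_PG_STUCK_THRESHOLD):
    """Option 2: never report stuck until the PG has existed for threshold."""
    cutoff = now - threshold
    if stat.created > cutoff:
        return False  # PG is younger than the threshold
    return stat.last_active < cutoff


now = 10_000.0

# A PG created two seconds ago that has never been active.
young_zeroed = PGStat(created=now - 2)
young_initialized = new_pg_initialized(now - 2)

print(stuck_plain(young_zeroed, now))       # current behaviour
print(stuck_plain(young_initialized, now))  # option 1
print(stuck_with_grace(young_zeroed, now))  # option 2
```

In this sketch both options stop flagging a freshly created PG, and both still flag a PG that was created more than the threshold ago and has never become active. Option 1 leaves the check itself untouched but changes what the stats mean, while option 2 keeps the zero value as "never" and adds an age test to the check.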

As it is, the messages are definitely crazy, especially the "for more than 300 seconds" message on a cluster that I created two seconds ago.
