Bug #14952
Updated by John Spray about 8 years ago
(In master, on a vstart cluster but presumably happens on real clusters too)

Right after creating some pools:

<pre>
health HEALTH_ERR
       2 pgs are stuck inactive for more than 300 seconds
       5 pgs degraded
       2 pgs stuck inactive
       5 pgs stuck unclean
       5 pgs undersized
</pre>

It seems like this is probably because the stats like last_active are zero if something has never been active? The logic in PGMonitor is checking these stats against (now - mon_pg_stuck_threshold), and 0 is always before that cutoff.

What should our logic be here:
* we could initialize all the last_* stats to the time of creation
* we could never count something as stuck until the PG has at least existed for mon_pg_stuck_threshold?

As it is the messages are definitely crazy, especially the "for more than 300 seconds" message on a cluster that I created two seconds ago.
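
To make the comparison concrete, here is a rough Python sketch of the check as described above, alongside the two proposed alternatives. It is not the PGMonitor code: the @PGStat@ class, the @created@ field, and the function names are hypothetical, and the exact comparison used in PGMonitor is an assumption. Only @last_active@ and @mon_pg_stuck_threshold@ come from the report.

```python
from dataclasses import dataclass

# Seconds; 300 matches the "for more than 300 seconds" health message.
MON_PG_STUCK_THRESHOLD = 300.0


@dataclass
class PGStat:
    """Hypothetical stand-in for per-PG stats; only the fields needed here."""
    created: float            # creation time (assumed to be available)
    last_active: float = 0.0  # 0 means the PG has never been active


def is_stuck_naive(stat, now, threshold=MON_PG_STUCK_THRESHOLD):
    """Behaviour described in the report: a zero timestamp is always
    before the cutoff, so a brand-new PG is reported as stuck."""
    cutoff = now - threshold
    return stat.last_active < cutoff


def is_stuck_init_to_creation(stat, now, threshold=MON_PG_STUCK_THRESHOLD):
    """Option 1: treat a never-set last_active as the creation time."""
    last = stat.last_active if stat.last_active > 0 else stat.created
    return last < now - threshold


def is_stuck_min_age(stat, now, threshold=MON_PG_STUCK_THRESHOLD):
    """Option 2: never report a PG as stuck until it has existed
    for at least the threshold."""
    if now - stat.created < threshold:
        return False
    return stat.last_active < now - threshold
```

With a PG created two seconds ago and never active, the naive check flags it as stuck while both alternatives do not. Both alternatives still flag a PG that has existed longer than the threshold without ever going active, which is the case the warning is presumably meant to catch.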