Bug #22511
Dashboard showing stale health data
Description
On 12.2.2 (Luminous), with the cluster in HEALTH_WARN, the dashboard is showing stale health data.
The dashboard shows:
    Overall status: HEALTH_WARN
    OBJECT_MISPLACED: 395167/541150152 objects misplaced (0.073%)
    PG_DEGRADED: Degraded data redundancy: 198/541150152 objects degraded (0.000%), 56 pgs unclean
But ceph status shows:
    # ceph status
      cluster:
        id:     eecca9ab-161c-474c-9521-0e5118612dbb
        health: HEALTH_WARN
                1281/541046538 objects misplaced (0.000%)
                Degraded data redundancy: 1 pg unclean
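The following minimal Python sketch makes the mismatch concrete. It extracts the misplaced and unclean figures from the two summaries quoted above and recomputes the reported percentages. The regexes only cover the phrasing that appears in this ticket. `summarize` and `looks_stale` are hypothetical helpers and are not part of Ceph.

```python
import re

# Health text copied from the two reports above.
DASHBOARD = (
    "OBJECT_MISPLACED: 395167/541150152 objects misplaced (0.073%) "
    "PG_DEGRADED: Degraded data redundancy: 198/541150152 objects degraded (0.000%), 56 pgs unclean"
)
CEPH_STATUS = (
    "1281/541046538 objects misplaced (0.000%) "
    "Degraded data redundancy: 1 pg unclean"
)

MISPLACED = re.compile(r"(\d+)/(\d+) objects misplaced \(([\d.]+)%\)")
UNCLEAN = re.compile(r"(\d+) pgs? unclean")


def summarize(text: str) -> dict:
    """Pull the misplaced/unclean figures out of one health summary."""
    m = MISPLACED.search(text)
    misplaced, total = int(m.group(1)), int(m.group(2))
    return {
        "misplaced": misplaced,
        "total_objects": total,
        "reported_pct": m.group(3),
        # Recompute to 3 decimals, the precision both reports display.
        "recomputed_pct": f"{100 * misplaced / total:.3f}",
        "unclean_pgs": int(UNCLEAN.search(text).group(1)),
    }


def looks_stale(dashboard: str, status: str) -> bool:
    # Different counters (or a different total object count) mean the
    # two views were not taken from the same snapshot.
    return summarize(dashboard) != summarize(status)


if __name__ == "__main__":
    for name, text in (("dashboard", DASHBOARD), ("ceph status", CEPH_STATUS)):
        print(name, summarize(text))
    print("stale:", looks_stale(DASHBOARD, CEPH_STATUS))
```

Both percentages are arithmetically consistent. The dashboard figure is 395167/541150152 ≈ 0.073%. The ceph status figure is 1281/541046538 ≈ 0.0002%, which rounds to 0.000%. The dashboard is therefore not miscalculating anything. It is reporting different underlying numbers, including a different total object count, which is consistent with an older snapshot.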
Related issues
History
#1 Updated by John Spray over 6 years ago
Hmm, I've seen a couple of things vaguely similar to this. Can you run "ceph tell mgr.<id> config set debug_mgr 20" and gather the log?
It usually seems to come back up to date the next time a mgr restarts, but let's gather some evidence if we can.
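The following hypothetical Python sketch wraps the gather-and-check step. It runs the exact command suggested above. It then scans the captured mgr log for mentions of the mgr-side digest handler, `handle_mgr_digest`, which is the name that comes up later in this thread. The helper names are illustrative only. The scan is a plain substring match and does not assume any particular log line format. `raise_mgr_debug` needs a reachable cluster with admin credentials.

```python
import subprocess
from typing import Iterable, List


def debug_mgr_command(mgr_id: str, level: int = 20) -> List[str]:
    # The command suggested above, split into argv form;
    # mgr_id fills the "<id>" placeholder.
    return ["ceph", "tell", f"mgr.{mgr_id}", "config", "set", "debug_mgr", str(level)]


def raise_mgr_debug(mgr_id: str, level: int = 20) -> None:
    # Raises subprocess.CalledProcessError if the ceph CLI reports failure.
    subprocess.run(debug_mgr_command(mgr_id, level), check=True)


def digest_lines(lines: Iterable[str]) -> List[str]:
    # Keep only lines that mention the digest handler by name.
    return [line for line in lines if "handle_mgr_digest" in line]


def mgr_received_digests(log_path: str) -> bool:
    # True if at least one line of the captured mgr log mentions the handler.
    with open(log_path, errors="replace") as f:
        return bool(digest_lines(f))
```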
#2 Updated by John Spray over 6 years ago
- Category set to ceph-mgr
#3 Updated by Dan van der Ster over 6 years ago
Sure, see ceph-post-file: 217cba9a-5ae9-42b4-8e7a-76ba016397e0
At this moment, the dashboard displays:
    Health
    Overall status: HEALTH_WARN
    OBJECT_MISPLACED: 395167/541150152 objects misplaced (0.073%)
    PG_DEGRADED: Degraded data redundancy: 198/541150152 objects degraded (0.000%), 56 pgs unclean
#4 Updated by John Spray over 6 years ago
Hmm, so the mon is showing you the same health status that the mgr is sending in DaemonServer::send_report, which is presumably the correct, up-to-date one.
There are also no handle_mgr_digest messages in the log, so something is going wrong with the transmission of the MMgrDigest (which contains the full health structure) from the mon to the mgr.
The mgr side uses the standard MonClient bits to subscribe, so my hunch is that something is wrong in MgrMonitor. I'm a bit suspicious of the part of MgrMonitor::send_digests that returns early if is_active() is false (from https://github.com/ceph/ceph/pull/15109).
I wonder if this is an edge case where the MonClient has a valid subscription to one of the peon monitors but not to the leader?
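The following toy model illustrates the suspicion about the early return. It is a hypothetical Python sketch, not Ceph code. It relies on one assumption that this ticket does not confirm: the digest sender runs off a one-shot timer that it re-arms itself, so returning early while the monitor is inactive also stops all future digests. Under that assumption, a single brief inactive period leaves the mgr holding whatever digest it last received.

```python
class ToyDigestSender:
    """Hypothetical stand-in for the mon's periodic digest sender (not Ceph code)."""

    def __init__(self, rearm_when_inactive: bool):
        self.rearm_when_inactive = rearm_when_inactive
        self.active = True        # stands in for is_active()
        self.timer_armed = True   # a one-shot tick is scheduled
        self.health = "HEALTH_OK"
        self.delivered = []       # digests the toy mgr has received

    def tick(self) -> None:
        # Fires only if the one-shot timer was re-armed after the last tick.
        if not self.timer_armed:
            return
        self.timer_armed = False
        self.send_digests()

    def send_digests(self) -> None:
        if not self.active:
            if self.rearm_when_inactive:
                self.timer_armed = True  # skip this round but keep ticking
            return                       # assumed pre-fix path: timer never re-armed
        self.delivered.append(self.health)
        self.timer_armed = True


def last_digest(rearm_when_inactive: bool) -> str:
    s = ToyDigestSender(rearm_when_inactive)
    s.health = "HEALTH_WARN misplaced"
    s.tick()              # delivered while active
    s.active = False      # monitor briefly inactive
    s.tick()              # early return
    s.active = True
    s.health = "HEALTH_WARN 1 pg unclean"
    for _ in range(3):
        s.tick()
    return s.delivered[-1]


if __name__ == "__main__":
    print("early return, no re-arm:", last_digest(False))
    print("re-arm when inactive:   ", last_digest(True))
```

In this model, re-arming the timer even when a round is skipped keeps digests flowing once the monitor becomes active again. Whether this is what the fix for #22142 actually does should be checked against the real change.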
#5 Updated by John Spray over 6 years ago
- Duplicates Bug #22142: mon doesn't send health status after paxos service is inactive temporarily added
#6 Updated by John Spray over 6 years ago
- Status changed from New to Duplicate
Ah, that suspect piece of code was already updated in master for http://tracker.ceph.com/issues/22142, which is currently pending backport to luminous. It seems highly likely that this is a duplicate of that issue.