Bug #22142
mon doesn't send health status after paxos service is inactive temporarily
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
To reproduce:
# start a vstart cluster
../src/vstart.sh -n -s -d
# enable the prometheus module to expose the health status (the dashboard shows the same info)
bin/ceph mgr module enable prometheus
# confirm healthy cluster state
curl 192.168.178.4:9283/metrics | grep "ceph_health_status 0.0"
# kill a mon and wait a bit for the status to change
kill `cat out/mon.a.pid`
sleep 1m
# check for HEALTH_WARN
bin/ceph -s
# mgr modules still show healthy
curl 192.168.178.4:9283/metrics | grep "ceph_health_status 1.0" # 1.0 is warn; expected after the mon is killed, but no match
curl 192.168.178.4:9283/metrics | grep "ceph_health_status 0.0" # 0.0 is healthy; still matches
Alternatively check the mgr dashboard (see screenshot).
It seems like the status sometimes propagates correctly, i.e. the dashboard and the prometheus module show the WARN state.
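To avoid running two separate greps, the check above could be wrapped in a small helper that prints whatever value the prometheus module currently reports. This is only a sketch: the function name health_status_from_metrics is made up for illustration, and it assumes the metric appears on a line of its own without labels, as in the grep patterns above.

```shell
# Hypothetical helper (not part of Ceph): read Prometheus text output on
# stdin and print the value of the first unlabelled ceph_health_status sample.
# Comment lines such as "# HELP ..." are skipped because their first field is "#".
health_status_from_metrics() {
  awk '$1 == "ceph_health_status" { print $2; exit }'
}
```

It would be used as `curl -s 192.168.178.4:9283/metrics | health_status_from_metrics`, which prints 0.0 while the module reports healthy and 1.0 once it reports warn.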
Updated by Jan Fajerski over 6 years ago
mon/MgrMonitor::send_digests() stops the periodic digests if PaxosService goes inactive for a time (say when a MON goes down). Proposed fix coming up.
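The failure mode described here can be illustrated with a toy model of a self-rescheduling periodic task. This is not the Ceph code: the function simulate, its arguments and the variables armed and sent are all hypothetical, and the model only assumes that a tick runs when the previous tick re-armed the timer, and that the task returns early while the service is inactive.

```shell
# Toy model (hypothetical, not the MgrMonitor implementation).
# $1: 1 if the task re-arms its timer even while inactive, 0 otherwise.
# Remaining arguments: service state at each tick (1 = active, 0 = inactive).
# Prints the number of digests sent. Variables are global for POSIX sh portability.
simulate() {
  rearm_when_inactive=$1; shift
  armed=1   # the timer starts out scheduled
  sent=0
  for active in "$@"; do
    [ "$armed" -eq 1 ] || continue   # nothing scheduled, so this tick never fires
    armed=0                          # the pending event is consumed
    if [ "$active" -eq 1 ]; then
      sent=$((sent + 1))             # send a digest
      armed=1                        # and schedule the next one
    elif [ "$rearm_when_inactive" -eq 1 ]; then
      armed=1                        # skip the digest but keep the timer alive
    fi
  done
  echo "$sent"
}

simulate 0 1 0 1 1   # early return without re-arming: prints 1
simulate 1 1 0 1 1   # re-arming while inactive: prints 3
```

In the first call a single inactive tick ends the series for good, which matches the symptom of the mgr modules continuing to show the last state they received. The second call shows the general shape a fix could take, without claiming that it is the change actually proposed.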
Updated by Jan Fajerski over 6 years ago
- Project changed from mgr to RADOS
- Status changed from New to In Progress
- Assignee set to Jan Fajerski
Updated by Jan Fajerski over 6 years ago
- Subject changed from "mgr doesn't get health status change to warn when mon goes down" to "mon doesn't send health status after paxos service is inactive temporarily"
Updated by Jan Fajerski over 6 years ago
- Status changed from In Progress to Fix Under Review
Updated by Kefu Chai over 6 years ago
- Status changed from Fix Under Review to Pending Backport
- Component(RADOS) Monitor added
Updated by Jan Fajerski over 6 years ago
- Copied to Backport #22421: mon doesn't send health status after paxos service is inactive temporarily added
Updated by John Spray over 6 years ago
- Has duplicate Bug #22511: Dashboard showing stale health data added
Updated by John Spray about 6 years ago
- Status changed from Pending Backport to Resolved