Backport #22421: mon doesn't send health status after paxos service is inactive temporarily - RADOS - Ceph

Backport #22421

Updated by Jan Fajerski over 6 years ago

To reproduce: 

 <pre><code class="text"> 
 # start a vstart cluster 
 ../src/vstart.sh -n -s -d 
 # enable the prometheus mgr module to expose the health status; the dashboard shows the same info 
 bin/ceph mgr module enable prometheus 
 # confirm healthy cluster state 
 curl 192.168.178.4:9283/metrics | grep "ceph_health_status 0.0" 
 # kill a mon and wait a bit for status to change 
 kill `cat out/mon.a.pid` 
 sleep 1m 
 # check for health warn 
 bin/ceph -s 
 # mgr modules still report healthy: the first grep matches nothing, the second still matches 
 curl 192.168.178.4:9283/metrics | grep "ceph_health_status 1.0" # 1.0 is warn 
 curl 192.168.178.4:9283/metrics | grep "ceph_health_status 0.0" # 0.0 is healthy 
 </code></pre> 

 Alternatively, check the mgr dashboard (see screenshot). 

 The failure appears to be intermittent: sometimes the status propagates correctly, i.e. the dashboard and the prometheus module show the WARN state. 
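 Since the behaviour is not consistent, it may help to compare the mon's view and the mgr's view mechanically rather than by eye. The following is only a rough sketch: `health_value` and `health_name` are hypothetical helper names, and the mapping of 2.0 to HEALTH_ERR is an assumption (the steps above only establish 0.0 as healthy and 1.0 as warn).

```shell
#!/bin/sh
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print the value of the first ceph_health_status sample.
# "# HELP" / "# TYPE" lines are skipped because their first field is "#".
health_value() {
    awk '$1 == "ceph_health_status" { print $2; exit }'
}

# Hypothetical helper: map the numeric metric value to a health name.
# 0 and 1 follow the comments in the reproduction steps; 2 is assumed.
health_name() {
    case "$1" in
        0|0.0) echo HEALTH_OK ;;
        1|1.0) echo HEALTH_WARN ;;
        2|2.0) echo HEALTH_ERR ;;
        *)     echo UNKNOWN ;;
    esac
}

# Usage against the vstart cluster from the steps above (not run here):
#   mgr=$(health_name "$(curl -s 192.168.178.4:9283/metrics | health_value)")
#   mon=$(bin/ceph health | awk '{ print $1 }')
#   [ "$mgr" = "$mon" ] || echo "mismatch: mgr=$mgr mon=$mon"
```

 Running the commented usage lines in a loop after killing the mon should print a mismatch whenever the mgr keeps reporting the stale status.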

 PR: https://github.com/ceph/ceph/pull/19481
