Bug #22142

closed

mon doesn't send health status after paxos service is inactive temporarily

Added by Jan Fajerski over 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

To reproduce:

# start a vstart cluster
../src/vstart.sh -n -s -d
# enable the prometheus module to expose the health status (the dashboard shows the same info)
bin/ceph mgr module enable prometheus
# confirm healthy cluster state
curl 192.168.178.4:9283/metrics | grep "ceph_health_status 0.0"
# kill a mon and wait a bit for the status to change
kill `cat out/mon.a.pid`
sleep 1m
# check for HEALTH_WARN
bin/ceph -s
# mgr modules still show healthy
curl 192.168.178.4:9283/metrics | grep "ceph_health_status 1.0" # 1.0 is warn; no match
curl 192.168.178.4:9283/metrics | grep "ceph_health_status 0.0" # 0.0 is healthy; still matches

Alternatively, check the mgr dashboard (see screenshot).

The failure is intermittent: sometimes the status propagates correctly, i.e. the dashboard and the prometheus module both show the WARN state.
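Because the status only sometimes fails to propagate, a single curl after the sleep can be misleading. The following is a minimal sketch, not part of the original report, that samples the metric repeatedly so a stale value is easier to spot. The helper names health_value and poll_health and the METRICS_URL variable are hypothetical; the endpoint is the one used in the steps above, and the sketch assumes the metric is exported without labels, as the grep patterns above suggest.

```shell
#!/bin/sh
# Hypothetical helpers for watching ceph_health_status over time.
# The default URL is the endpoint from the reproduction steps.
METRICS_URL="${METRICS_URL:-http://192.168.178.4:9283/metrics}"

# Read Prometheus text format on stdin and print the value of the
# ceph_health_status sample (comment lines start with "#" and are skipped
# because their first field is not the metric name).
health_value() {
    awk '$1 == "ceph_health_status" { print $2; exit }'
}

# Sample the metric COUNT times, INTERVAL seconds apart, with a timestamp.
poll_health() {
    count="$1"
    interval="$2"
    i=0
    while [ "$i" -lt "$count" ]; do
        value="$(curl -s "$METRICS_URL" | health_value)"
        echo "$(date +%T) ceph_health_status=${value:-unavailable}"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, running poll_health 12 5 right after killing the mon would print one sample every five seconds for a minute; comparing those samples with the output of bin/ceph -s shows whether the mgr ever picks up the WARN state.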


Files

2017-11-16-132808_1897x943.png (141 KB): mgr dashboard with incorrect health status. Jan Fajerski, 11/16/2017 12:28 PM

Related issues 2 (0 open, 2 closed)

Has duplicate mgr - Bug #22511: Dashboard showing stale health data (Duplicate, 12/20/2017)
Copied to RADOS - Backport #22421: mon doesn't send health status after paxos service is inactive temporarily (Resolved, Jan Fajerski)
#1

Updated by Jan Fajerski over 6 years ago

  • Description updated (diff)
#2

Updated by Jan Fajerski over 6 years ago

  • Description updated (diff)
#3

Updated by Jan Fajerski over 6 years ago

mon/MgrMonitor::send_digests() stops sending the periodic digests if PaxosService goes inactive for a time (say, when a MON goes down), and does not resume them once the service is active again. Proposed fix coming up.

#4

Updated by Jan Fajerski over 6 years ago

  • Project changed from mgr to RADOS
  • Status changed from New to In Progress
  • Assignee set to Jan Fajerski
#5

Updated by Jan Fajerski over 6 years ago

  • Subject changed from "mgr doesn't get health status change to warn when mon goes down" to "mon doesn't send health status after paxos service is inactive temporarily"
#6

Updated by Jan Fajerski over 6 years ago

  • Status changed from In Progress to Fix Under Review
#7

Updated by Sage Weil over 6 years ago

  • Backport set to luminous
#8

Updated by Kefu Chai over 6 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Component(RADOS) Monitor added
#9

Updated by Jan Fajerski over 6 years ago

  • Copied to Backport #22421 ("mon doesn't send health status after paxos service is inactive temporarily") added
#10

Updated by John Spray over 6 years ago

  • Has duplicate Bug #22511 ("Dashboard showing stale health data") added
#11

Updated by John Spray about 6 years ago

  • Status changed from Pending Backport to Resolved