Feature #9440

mon: log all changes to health in the central log

Added by Greg Farnum over 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Normal
Category: Monitor
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

It would be awesome if the central log recorded every time the HEALTH status of the cluster changed.

Associated revisions

Revision 7ce770d9 (diff)
Added by Joao Eduardo Luis about 8 years ago

mon: Monitor: health summary to clog on get_health()

Output health summary to clog on Monitor::get_health() (called by,
e.g., 'ceph -s', 'ceph health' and similar commands) if
'mon_health_to_clog' is true (default: false) and the last update is at
least 'mon_health_to_clog_interval' seconds old (default: 60.0).

This patch is far from optimal for several reasons though:

1. health summary is still generated on-the-fly by the monitor each time
Monitor::get_health() is called.

2. health summary will only be written to clog IF and WHEN
Monitor::get_health() is called.

3. patch does not account for duplicate summaries. We may write the same
string every time Monitor::get_health() is called (as long as enough
time has passed since we last wrote to clog).

4. each monitor will output to clog independently of the other
monitors. This means that running 'ceph -s' 3 times in a row, on a
cluster with at least 3 monitors, may result in writing the same string
3 times.

5. We reduce the number of writes to clog by caching the last overall
health status. We only write to clog if the overall status is different
from the cached value OR enough time has passed since we last wrote to
clog. This may result in ignoring new contributing factors to overall
cluster health that by themselves do not change the overall status. Even
though we will pick up on them once enough time has passed, we may end
up losing intermediate states (which may be good if they're transient,
but not as awesome if they reflect some kind of instability).

Fixes: #9440 (even if in a poor manner)

Signed-off-by: Joao Eduardo Luis <>
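
The write rule described in the commit message above (log only when the feature is enabled and either the overall status changed or the interval has elapsed) can be sketched as follows. This is a minimal Python model for illustration, not the Ceph C++ code: the class HealthClogThrottle, its attributes, the injected clock, and the sample summary strings are all hypothetical. Only the option names 'mon_health_to_clog' and 'mon_health_to_clog_interval' and their defaults come from the commit message.

```python
import time
from typing import Callable, List, Optional


class HealthClogThrottle:
    """Hypothetical model of the clog write rule from the commit message."""

    def __init__(self, enabled: bool = False, interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.enabled = enabled      # stands in for mon_health_to_clog
        self.interval = interval    # stands in for mon_health_to_clog_interval
        self.clock = clock          # injectable so the demo needs no sleeping
        self.last_status: Optional[str] = None   # cached overall status
        self.last_write: Optional[float] = None  # time of last clog write
        self.clog: List[str] = []   # stand-in for the central log

    def get_health(self, status: str, summary: str) -> str:
        """Return the summary; also log it if the status changed or the
        interval has elapsed since the last write."""
        if self.enabled:
            now = self.clock()
            changed = status != self.last_status
            stale = (self.last_write is None
                     or now - self.last_write >= self.interval)
            if changed or stale:
                self.clog.append(summary)
                self.last_status = status
                self.last_write = now
        return summary


# Demo with a fake clock (all values are made up for illustration).
fake_now = [0.0]
mon = HealthClogThrottle(enabled=True, interval=60.0,
                         clock=lambda: fake_now[0])
mon.get_health("HEALTH_OK", "HEALTH_OK")                  # first call: logged
fake_now[0] = 10.0
mon.get_health("HEALTH_OK", "HEALTH_OK")                  # unchanged, too soon: skipped
mon.get_health("HEALTH_WARN", "HEALTH_WARN 1 osds down")  # status changed: logged
fake_now[0] = 20.0
mon.get_health("HEALTH_WARN", "HEALTH_WARN 2 osds down")  # new detail, same status: skipped
fake_now[0] = 70.0
mon.get_health("HEALTH_WARN", "HEALTH_WARN 2 osds down")  # interval elapsed: logged
```

The call at time 20 illustrates caveat 5: a new contributing factor that leaves the overall status unchanged is not logged until the interval has elapsed. Nothing is logged at all unless get_health() is called, which is caveat 2.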

History

#1 Updated by Joao Eduardo Luis over 8 years ago

  • Target version set to 0.89

#2 Updated by Samuel Just over 8 years ago

  • Target version changed from 0.89 to v.91

#3 Updated by Samuel Just over 8 years ago

  • Target version changed from v.91 to v.actually90

#4 Updated by Samuel Just over 8 years ago

  • Target version changed from v.actually90 to v.actually91

#5 Updated by Joao Eduardo Luis over 8 years ago

  • Status changed from New to In Progress
  • Assignee set to Joao Eduardo Luis

#6 Updated by Sage Weil over 8 years ago

  • Target version changed from v.actually91 to v0.92

#7 Updated by Sage Weil about 8 years ago

  • Status changed from In Progress to Resolved
