Bug #22992

mon: add RAM usage (including avail) to HealthMonitor::check_member_health?

Added by Patrick Donnelly about 6 years ago. Updated about 6 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm looking into several MON_DOWN failures from

http://pulpito.ceph.com/pdonnell-2018-02-13_17:49:41-kcephfs-wip-pdonnell-testing-20180210.023235-testing-basic-smithi/

It's been suggested before that this may be due to memory pressure on the machine since so many daemons are hosted alongside the mons.

It'd be useful to periodically log the memory usage of the mon and the available memory on the system, both to verify this and to detect low memory in deployments. I think the natural place to do this is in HealthMonitor::check_member_health, where we already check disk space:

2018-02-13 04:53:40.993 7f982eb8c700 10 mon.c@2(peon).health check_member_health avail 99% total 15250 MB, used 110 MB, avail 15139 MB

Thoughts?
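
To make the proposal concrete, here is a rough sketch of how the two numbers could be collected on Linux. It is not Ceph code: `read_kb_field` and `format_mem_line` are hypothetical helpers, the output format only imitates the disk-space line above, and the sketch assumes the kernel exposes `MemTotal` and `MemAvailable` in /proc/meminfo and `VmRSS` in /proc/self/status.

```cpp
#include <cstdint>
#include <fstream>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper (not a Ceph API): scan /proc/meminfo- or
// /proc/<pid>/status-style text for "Key:   <value> kB" and return the
// value in kB, or nullopt if the key is missing or has no number.
std::optional<std::uint64_t> read_kb_field(std::istream& in,
                                           const std::string& key) {
  const std::string prefix = key + ":";
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, prefix.size(), prefix) != 0)
      continue;  // not the field we want
    std::istringstream fields(line.substr(prefix.size()));
    std::uint64_t kb = 0;
    if (fields >> kb)
      return kb;
    return std::nullopt;
  }
  return std::nullopt;
}

// Hypothetical formatter: builds a line in the spirit of the existing
// disk-space message. All inputs are in kB; output is in MB.
std::string format_mem_line(std::uint64_t rss_kb, std::uint64_t avail_kb,
                            std::uint64_t total_kb) {
  const std::uint64_t pct = total_kb ? (avail_kb * 100) / total_kb : 0;
  std::ostringstream out;
  out << "mem avail " << pct << "% total " << (total_kb / 1024)
      << " MB, avail " << (avail_kb / 1024) << " MB, mon rss "
      << (rss_kb / 1024) << " MB";
  return out.str();
}

// Samples the running process and the host; returns nullopt if any field
// cannot be read (for example on a non-Linux system).
std::optional<std::string> sample_memory_line() {
  std::ifstream meminfo_total("/proc/meminfo");
  std::ifstream meminfo_avail("/proc/meminfo");
  std::ifstream self_status("/proc/self/status");
  auto total = read_kb_field(meminfo_total, "MemTotal");
  auto avail = read_kb_field(meminfo_avail, "MemAvailable");
  auto rss = read_kb_field(self_status, "VmRSS");
  if (!total || !avail || !rss)
    return std::nullopt;
  return format_mem_line(*rss, *avail, *total);
}
```

A periodic health check could call `sample_memory_line()` and log the result next to the disk-space line. Whether to raise a health warning below some threshold is a separate decision that this sketch does not address.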

History

#1 Updated by Patrick Donnelly about 6 years ago

Turned out it was just the monitor being thrashed (didn't realize we were doing that in kcephfs!): #22993

Still, memory usage checking may be useful!
