Feature #48430

Add memory consumption of nodes to health checks

Added by Gunther Heinrich about 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

During some tests on a (very small) virtual cluster I noticed that Ceph doesn't seem to 'notice' when a node runs out of available memory (including swap). The virtual node where this happened was an OSD node, so the result was a large number of slow ops and stalled operations.
At least on Ubuntu it's possible to get the current memory consumption of a system, including swap, with "free -m", which seems to report a fairly accurate reading. The command reports the same values when used inside a container. My idea is that Ceph monitors the current system memory load of all nodes at regular intervals. If the amount of free memory on a node falls below a user-defined threshold or swap usage grows too large (for whatever reason, which could also be a different process), the cluster health changes to a warning state. If a Health Warn is too much, the cluster could alternatively just log the problem.
In normal clusters with a very large amount of RAM available per node this check might seem a little unnecessary, but another data point might be helpful.
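
To make the idea a bit more concrete, here is a rough sketch of what such a per-node check could look like. It is not how Ceph does (or would do) it. It reads /proc/meminfo, the Linux kernel file that "free" also draws its numbers from, and compares the values against two thresholds. The function names (parse_meminfo, check_memory) and the default threshold values are made up for illustration, and the sketch assumes a Linux node.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most entries are reported in kB; entries without a unit (such as
    HugePages_Total) are stored as plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def check_memory(info, min_available_mib=512, max_swap_used_mib=1024):
    """Return a list of warning strings; an empty list means no problem found.

    Both thresholds are hypothetical defaults and would be user-configurable.
    """
    warnings = []
    # MemAvailable is missing on very old kernels, so fall back to MemFree.
    available_mib = info.get("MemAvailable", info.get("MemFree", 0)) // 1024
    swap_used_mib = (info.get("SwapTotal", 0) - info.get("SwapFree", 0)) // 1024

    if available_mib < min_available_mib:
        warnings.append(
            f"available memory low: {available_mib} MiB < {min_available_mib} MiB"
        )
    if swap_used_mib > max_swap_used_mib:
        warnings.append(
            f"swap usage high: {swap_used_mib} MiB > {max_swap_used_mib} MiB"
        )
    return warnings


if __name__ == "__main__":
    # One-shot check of the local node; a real monitor would repeat this
    # at a regular interval and report the result to the cluster.
    with open("/proc/meminfo") as f:
        for message in check_memory(parse_meminfo(f.read())):
            print("WARN:", message)
```

Whether a non-empty result should raise a Health Warn or only produce a log entry is left open here, as in the description above.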
