Feature #48430

open

Add memory consumption of nodes to health checks

Added by Gunther Heinrich over 3 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:

Description

During some tests on a (very small) virtual cluster I noticed that Ceph doesn't seem to notice when a node runs out of available memory (including swap). The virtual node where this happened was running an OSD, so the result was a large number of slow ops and stalled operations.
At least on Ubuntu, the current memory consumption of a system, including swap, can be read with "free -m", which seems to report a fairly accurate reading. The command reports the same values when used inside a container. My idea is that Ceph monitors the current system memory load of all nodes at regular intervals. If the amount of free memory on a node falls below a user-defined threshold or swap usage grows too large (for whatever reason, which could also be a different process), the cluster health changes to a warning state. If a health warning is too much, the cluster could alternatively log the problem.
In normal clusters with very large amounts of RAM available per node this check might seem a little unnecessary, but another data point might be helpful.
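
To make the idea concrete, here is a rough sketch of the per-node check described above. It is not Ceph code and does not use any Ceph API. It assumes a Linux node and reads the same data that "free -m" reports, namely the MemAvailable, MemFree, SwapTotal and SwapFree fields of /proc/meminfo. The function names (parse_meminfo, memory_warnings) and the default thresholds are hypothetical and only serve as illustration.

```python
from typing import Dict, List


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_warnings(
    meminfo: Dict[str, int],
    min_available_mb: int = 512,   # hypothetical user-defined threshold
    max_swap_used_mb: int = 1024,  # hypothetical user-defined threshold
) -> List[str]:
    """Return one message per threshold that the node currently violates."""
    warnings: List[str] = []

    # Prefer MemAvailable; fall back to MemFree if the kernel lacks it.
    available_kb = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    available_mb = available_kb // 1024

    swap_used_kb = meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)
    swap_used_mb = swap_used_kb // 1024

    if available_mb < min_available_mb:
        warnings.append(
            f"available memory {available_mb} MiB is below {min_available_mb} MiB"
        )
    if swap_used_mb > max_swap_used_mb:
        warnings.append(
            f"swap usage {swap_used_mb} MiB exceeds {max_swap_used_mb} MiB"
        )
    return warnings
```

On a node, the check could be run periodically by passing the contents of /proc/meminfo to parse_meminfo() and reporting any returned messages either as a health warning or as a log entry, depending on how severe the cluster operator wants this to be.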

Actions #1

Updated by Greg Farnum almost 3 years ago

  • Project changed from Ceph to RADOS
Actions #2

Updated by Laura Flores almost 2 years ago

  • Tags set to low-hanging-fruit
Actions #3

Updated by Laura Flores almost 2 years ago

  • Tag list set to low-hanging-fruit
  • Tags deleted (low-hanging-fruit)