Add memory consumption of nodes to health checks
During some tests using a (very small) virtual cluster I noticed that Ceph doesn't seem to 'notice' when a node runs out of available memory (including swap). The virtual node where this happened was an OSD, so the result was a large number of slow ops and stalled operations.
At least on Ubuntu it's possible to get the current memory consumption of a system, including swap, with "free -m", which seems to report a fairly accurate reading. The command reports the same values when used inside a container. My idea is that Ceph monitors the current memory load of all nodes at regular intervals. If the amount of free memory on a node falls below a (user-)defined threshold, or swap usage grows too large - for whatever reason, which could also be a different process - the cluster health changes to a warning state. If a HEALTH_WARN is too much, the cluster could alternatively just log the problem.
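A rough sketch of what such a per-node check might look like, as a standalone Python snippet rather than actual Ceph code. It parses /proc/meminfo-style output (the same data "free -m" reads) and compares against thresholds; the threshold names and values are made-up illustrations:

```python
# Hypothetical per-node memory health check - NOT actual Ceph code.
# MIN_FREE_MB and MAX_SWAP_USED_MB are illustrative, user-defined thresholds.

MIN_FREE_MB = 512        # warn if available memory drops below this
MAX_SWAP_USED_MB = 1024  # warn if swap usage grows beyond this

def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key: value kB' lines into a dict of MB."""
    values = {}
    for line in text.splitlines():
        if ':' not in line:
            continue
        key, rest = line.split(':', 1)
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0]) // 1024  # kB -> MB
    return values

def memory_health(meminfo_text):
    """Return 'OK' or a warning string based on free memory and swap usage."""
    mem = parse_meminfo(meminfo_text)
    available = mem.get('MemAvailable', 0)
    swap_used = mem.get('SwapTotal', 0) - mem.get('SwapFree', 0)
    if available < MIN_FREE_MB:
        return f'WARN: only {available} MB of memory available'
    if swap_used > MAX_SWAP_USED_MB:
        return f'WARN: {swap_used} MB of swap in use'
    return 'OK'

# Example with sampled values: 256 MB available trips the warning.
sample = "MemAvailable: 262144 kB\nSwapTotal: 1048576 kB\nSwapFree: 1048576 kB"
print(memory_health(sample))  # -> WARN: only 256 MB of memory available
```

In a real integration this would presumably run in the existing per-node agent (e.g. whatever already reports OSD metrics) rather than as a separate poller, so no extra daemon would be needed.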
In normal clusters with very large amounts of RAM available per node this check might seem a little unnecessary, but another data point might be helpful.