Bug #47929

open

Huge RAM Usage on OSD recovery

Added by Luis Felipe Domínguez Vega over 3 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi, today my infrastructure provider had a blackout. Ceph then tried to
recover, but the cluster is in an inconsistent state because many OSDs
cannot recover: the kernel kills them with the OOM killer. Even one OSD
that had been OK has now gone down, killed by OOM.

Even on a server with 32GB of RAM, the OSD uses ALL of it and never recovers.
I think this may be a memory leak. Ceph version: Octopus 15.2.3.

In: https://pastebin.pl/view/59089adc
you can see that buffer_anon reaches 32GB, but why? My whole cluster is down
because of this.

This dump (https://pastebin.pl/view/59089adc) was taken when the OSD was about to be killed by OOM.
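
For anyone trying to reproduce this, here is a minimal sketch of how the largest memory pools could be ranked, assuming the dump comes from `ceph daemon osd.<id> dump_mempools` and has the usual `mempool` / `by_pool` JSON layout. The function name `rank_mempools` and the sample numbers are hypothetical and only for illustration; they are not taken from the pastes above.

```python
import json


def rank_mempools(dump: dict) -> list[tuple[str, int]]:
    """Return (pool_name, bytes) pairs sorted by bytes, largest first.

    Assumes the layout {"mempool": {"by_pool": {name: {"items", "bytes"}}}}.
    """
    pools = dump["mempool"]["by_pool"]
    return sorted(
        ((name, stats["bytes"]) for name, stats in pools.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Illustrative sample only (made-up values, not from this report).
SAMPLE = json.loads("""
{
  "mempool": {
    "by_pool": {
      "bluestore_cache_data": {"items": 100, "bytes": 104857600},
      "buffer_anon": {"items": 5000, "bytes": 3221225472},
      "osd_pglog": {"items": 20000, "bytes": 524288000}
    },
    "total": {"items": 25100, "bytes": 3850371072}
  }
}
""")

ranked = rank_mempools(SAMPLE)
for name, nbytes in ranked:
    # Print each pool with its size in MiB.
    print(f"{name:22s} {nbytes / 2**20:10.1f} MiB")
```

Running this against a real dump saved to a file (for example with `json.load(open(path))`) should show at a glance whether buffer_anon dominates, as it appears to here.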

The pglog was trimmed OK, but the behavior is the same. Log:
https://pastebin.ubuntu.com/p/dwbXtX7wTP/
