Bug #37388

rgw: memory leak with multisite sync

Added by Dieter Roels over 5 years ago. Updated over 5 years ago.

Status:
In Progress
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ever since we started to use ceph with multisite sync (around the 12.2.2 time, I think) we noticed that the rgw memory footprint keeps slowly growing. This resulted in an OOM kill every few days/weeks. We tested with more memory in the VMs, but even with 32GB the OOMs occurred, so we settled for VMs with 8GB memory and had OOMs about once a week. Clients did not really notice because of the load balancers.

Recently we tested the mimic release and noticed the leak is substantially worse in mimic. Our current test environment is running 13.2.2 and has no objects, so there are no client connections other than the multisite sync. The rgws get OOM kills about once a day on 8GB VMs. We tested with 32GB VMs and they show the same memory growth, but they last for a few days before OOM.

So, my question is: how can we test this memory leak? I did run it with valgrind once, but it throws lots of errors and does not seem to be compatible with jemalloc. And it seems rgw does not keep memory statistics the way the other daemons do?
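One avenue that is sometimes suggested for this situation is valgrind's massif heap profiler instead of the default memcheck tool, since massif only tracks allocation sizes and tends to produce less noise; it may still misbehave with jemalloc, in which case a binary linked against a different allocator would be needed. A hypothetical invocation sketch (the instance name client.rgw.test1 and output file name are placeholders, not from this report):

```shell
# Stop the managed gateway so the ports are free, then run radosgw in
# the foreground under massif. Reproduce the memory growth, then stop it.
systemctl stop ceph-radosgw@rgw.test1

valgrind --tool=massif --massif-out-file=rgw.massif \
    radosgw -f --name client.rgw.test1

# Summarize the recorded heap snapshots to see which call stacks grow:
ms_print rgw.massif | less
```

This only profiles heap allocations valgrind can intercept, so results under jemalloc should be treated with caution.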

Probably useful info: we run civetweb with SSL on RHEL.


Related issues 1 (1 open, 0 closed)

Related to rgw - Bug #23375: Memory leak in RGW when libcurl is configured with --with-nss and performing https requests to keystone (In Progress, Mark Kogan, 03/15/2018)

