Bug #50681

open

memstore: apparent memory leak when removing objects

Added by Sven Anderson almost 3 years ago. Updated almost 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Performance/Resource Usage
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When I create and unlink big files, like in this [1] little program, in my development environment, the OSD daemon keeps claiming more and more memory (using the memstore backend), eventually resulting in an OOM kill. If I limit the memory with "osd memory target" and disable the cache, it just blocks once the memory is used up. If I switch to the filestore backend, the memory leak is gone. Although memstore is not meant for production use, this is still an issue when it is used for benchmarking other Ceph-related code.

This is my ceph.conf:

[global]
fsid = $(uuidgen)
osd crush chooseleaf type = 0
run dir = ${DIR}/run
auth cluster required = none
auth service required = none
auth client required = none
osd pool default size = 1
mon host = ${HOSTNAME}

[mds.${MDS_NAME}]
host = ${HOSTNAME}

[mon.${MON_NAME}]
log file = ${LOG_DIR}/mon.log
chdir = "" 
mon cluster log file = ${LOG_DIR}/mon-cluster.log
mon data = ${MON_DATA}
mon data avail crit = 0
mon addr = ${HOSTNAME}
mon allow pool delete = true

[osd.0]
log file = ${LOG_DIR}/osd.log
chdir = "" 
osd data = ${OSD_DATA}
osd journal = ${OSD_DATA}.journal
osd journal size = 100
osd objectstore = memstore
osd class load list = *
osd class default list = *
osd_max_object_name_len = 256

[1] https://paste.ee/p/fUmYX
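
In case the paste link ever goes away: a rough sketch of what the linked program does. The mount point, file size and round count here are placeholders for illustration, not the exact values from the paste.

import os

# Hypothetical stand-in for the linked repro program: repeatedly create a
# large file on a CephFS mount, then unlink it again, and watch the OSD's RSS.
# The mount point and sizes below are placeholders, not the original values.
MOUNT = "/mnt/cephfs"
FILE_SIZE = 1 << 30      # 1 GiB per file
CHUNK = 4 << 20          # write in 4 MiB chunks
ROUNDS = 10

def create_and_unlink(round_no):
    path = os.path.join(MOUNT, "bigfile-%d" % round_no)
    with open(path, "wb") as f:
        written = 0
        while written < FILE_SIZE:
            f.write(b"\0" * CHUNK)
            written += CHUNK
    os.unlink(path)      # the space should eventually be reclaimed again

if __name__ == "__main__":
    for i in range(ROUNDS):
        create_and_unlink(i)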


Files

ceph.tar.bz2 (224 KB) - ceph test cluster data - Sven Anderson, 05/21/2021 03:07 PM
Actions #1

Updated by Sven Anderson almost 3 years ago

The title should say "osd objectstore = memstore"

Actions #2

Updated by Greg Farnum almost 3 years ago

  • Subject changed from Memory leak when creating and unlinking files with osd objectstore = filestore to Memory leak when creating and unlinking files with osd objectstore = memstore

I’m not totally clear on what you’re doing here and what you think the erroneous behavior is. Memstore only stores data in memory, so of course storing more uses up the memory.

Files deleted are not processed instantaneously, but neither are files being automatically snapshotted. The MDS has to do background deletes of the relevant objects when a client performs an unlink, but it can't do that until the client drops all the capabilities for the file in question.

My guess is that you have a mount which is maintaining caps on the files because you’re not generating enough files to push them out of its LRU list, and not waiting for it to decide you’ve lost interest in the files in question.
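
One quick way to check whether those deferred deletes are still pending would be to look at the MDS stray counters, something like the sketch below. The daemon name ("mds.a") and the exact counter names are from memory, so treat them as assumptions to adapt to your setup.

import json
import subprocess

# Sketch: read the MDS perf counters over the admin socket and print the
# number of stray (unlinked but not yet purged) inodes. Daemon name and
# counter names ("mds_cache", "num_strays", ...) are assumptions here.
out = subprocess.check_output(["ceph", "daemon", "mds.a", "perf", "dump"])
counters = json.loads(out)
mds_cache = counters.get("mds_cache", {})
print("num_strays:", mds_cache.get("num_strays"),
      "num_strays_delayed:", mds_cache.get("num_strays_delayed"))

If the stray count stays high after the unlinks, the MDS hasn't purged the objects yet; if it drops to zero while the OSD's RSS stays up, the memory is being held somewhere else.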

Actions #3

Updated by Sven Anderson almost 3 years ago

Thanks, Greg, for your answer. My expectation was that, at least when there is memory pressure or I unmount the CephFS, the memory from the unlinked files is either returned to the system or reused for the next run of the benchmark. Did you notice the code snippet that I linked here: https://paste.ee/p/fUmYX ? That's all I am running. After each run, the RSS of the OSD daemon is 2.5 GB larger. Since I'm unmounting, I assume all caps are dropped as well. Can I manually trigger the GC in the MDS to check whether that would solve the issue?

Actions #4

Updated by Patrick Donnelly almost 3 years ago

  • Project changed from CephFS to RADOS
  • Category changed from Performance/Resource Usage to Performance/Resource Usage
Actions #5

Updated by Greg Farnum almost 3 years ago

  • Project changed from RADOS to CephFS
  • Category changed from Performance/Resource Usage to Performance/Resource Usage

How long did you wait to see if memory usage dropped? Did you look at any logs or dump any pool object info?

I really think you're just seeing the impact of the background file deletion from the MDS. Not sure how to manually trigger it; I think it just runs at what it considers an appropriate rate.

Also, it's memstore: there may be tunings that don't work well on OSDs of this size which we aren't going to fuss over.

Actions #6

Updated by Loïc Dachary almost 3 years ago

  • Target version deleted (v15.2.11)
Actions #7

Updated by Sven Anderson almost 3 years ago

Greg Farnum wrote:

> How long did you wait to see if memory usage dropped? Did you look at any logs or dump any pool object info?

For hours. I did look at logs, but I guess I can't interpret if there is something unusual. Please check out the attached files. I also added some command dumps in the out/ subdirectory.

> I really think you're just seeing the impact of the background file deletion from the MDS. Not sure how to manually trigger it; I think it just runs at what it considers an appropriate rate.
>
> Also, it's memstore: there may be tunings that don't work well on OSDs of this size which we aren't going to fuss over.

I also tried 4 MB files. Same effect.

Actions #8

Updated by Sven Anderson almost 3 years ago

The ceph-osd had a RES memory footprint of 2.6 GB while I created the above files.

Actions #9

Updated by Greg Farnum almost 3 years ago

  • Project changed from CephFS to RADOS
  • Subject changed from Memory leak when creating and unlinking files with osd objectstore = memstore to memstore: apparent memory leak when removing objects
  • Category changed from Performance/Resource Usage to Performance/Resource Usage
  • Component(RADOS) OSD added

Sven Anderson wrote:

> Greg Farnum wrote:
>
> > How long did you wait to see if memory usage dropped? Did you look at any logs or dump any pool object info?
>
> For hours. I did look at logs, but I guess I can't interpret if there is something unusual. Please check out the attached files. I also added some command dumps in the out/ subdirectory.

Okay, well, the pg dump does say there are only like 6 MB of data in RADOS, so that's pretty good evidence it's an issue in memstore.

Thanks for the report and the logs!
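
If someone wants to chase this further, it can probably be reproduced without CephFS in the picture at all by writing and then removing objects directly in the pool. A rough sketch with the python-rados bindings; the pool name, object size and count are made up for illustration:

import rados

# Sketch: exercise memstore directly by writing and then removing RADOS
# objects, bypassing CephFS/MDS entirely. Pool name, object size and count
# are arbitrary illustration values, not taken from this report.
POOL = "testpool"
OBJ_SIZE = 4 * 1024 * 1024   # 4 MiB per object
COUNT = 1000

cluster = rados.Rados(conffile="ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx(POOL)
    try:
        payload = b"\0" * OBJ_SIZE
        for i in range(COUNT):
            ioctx.write_full("leaktest-%d" % i, payload)
        for i in range(COUNT):
            ioctx.remove_object("leaktest-%d" % i)
        # After the removes, memstore would be expected to hand the memory
        # back (or at least reuse it on the next pass); watch ceph-osd's RSS.
    finally:
        ioctx.close()
finally:
    cluster.shutdown()

If the OSD's RSS keeps growing across repeated runs of this loop, that points at memstore itself rather than anything in the CephFS layer.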
