Bug #181

closed

monitor eats 8G of memory before being OOM-killed

Added by ar Fred almost 14 years ago. Updated over 13 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Monitor
Target version: -
% Done: 0%


Description

Hi, I installed the latest ceph (0c38b3d63dd24fb8b86283de5e00f260a03d4024) and the latest qemu-rbd (e6d8dbce416bfdba88056e5fd53f295e6b5aadf6).

I did a full restart of the whole cluster and cleaned all rados objects using rados -p rbd rm * in a loop.
(By the way, rbdtool was in a bad mood: rbdtool --delete resulted in "terminate called after throwing an instance of 'ceph::buffer::end_of_buffer*'\nAborted".)
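For reference, the cleanup loop mentioned above could look roughly like the sketch below. It assumes that `rados -p <pool> ls` prints one object name per line and that `rados -p <pool> rm <object>` removes a single object; `purge_pool` is a hypothetical helper name, not something shipped with Ceph, and this is not necessarily the exact loop the reporter ran.

```shell
# Sketch only: remove every object in a pool, one object at a time.
# purge_pool is a hypothetical helper name, not part of Ceph.
purge_pool() {
    pool="$1"
    # List object names (assumed one per line) and delete each in turn.
    # IFS= and -r keep leading spaces and backslashes in names intact.
    rados -p "$pool" ls | while IFS= read -r obj; do
        # </dev/null stops the rm call from consuming the object list on stdin.
        rados -p "$pool" rm "$obj" </dev/null
    done
}
```

Calling `purge_pool rbd` would then empty the rbd pool used in this report.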

Then I started converting a qemu image to rados using qemu-img convert -O rbd disk0.qcow2 rbd:rbd/testqcow

Memory usage of mon0 grew to 8G (4G RAM + 4G swap), and it got killed. It seems mon1 then took over and the qemu-img command finished. Now that the ceph cluster is idle, the mem usage of mds1 is 232.4M resident, and mon2 is using 4340K.

Please find the log of mon0 attached.

I did not restart mon0, and mon1 and mon2 are left untouched in case it helps.


Files

mon0.log.gz (771 KB) ar Fred, 06/06/2010 05:08 AM
Actions #1

Updated by Sage Weil almost 14 years ago

  • Status changed from New to Resolved

Fixed two leaks, 21a97d1e7ce329fac07b5e69362d27bb7edb31f5 and d57b629699158abacdcc3880d43111291a6fdf77 (though the second only bites you if you have the authentication stuff enabled).

Actions #2

Updated by ar Fred almost 14 years ago

Hi, thanks for the fixes.

I just finished testing the new version, and my monitor survived (eating 6.8G of memory, with more than 1G of swap still available), but the leak is still there.

After the qemu-img command finished, mon0 kept eating more and more memory. After 10 minutes only 500M of swap was available, although the cluster is idle.

What kind of info do you need?

Only 200M of swap available now...
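Growth like this can be recorded over time rather than read off by hand. The following is a minimal sketch, assuming a Linux host where `/proc/<pid>/status` exposes a `VmRSS` line; `rss_kb` and `log_rss` are hypothetical helper names. Note that resident size excludes pages already pushed out to swap, so it understates the total footprint once the machine is swapping.

```shell
# rss_kb (hypothetical helper): print the resident set size of a PID in kB,
# taken from the VmRSS line of /proc/<pid>/status (Linux only).
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# log_rss (hypothetical helper): print "<epoch seconds> <kB>" for a PID
# every <interval> seconds (default 60) until the process goes away.
log_rss() {
    pid="$1"
    interval="${2:-60}"
    while [ -r "/proc/$pid/status" ]; do
        echo "$(date +%s) $(rss_kb "$pid")"
        sleep "$interval"
    done
}
```

For example, `log_rss "$(pidof cmon)" 60 >> mon-rss.log` would sample once a minute, assuming a single cmon process runs on that host.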

Actions #3

Updated by Greg Farnum almost 14 years ago

  • Status changed from Resolved to In Progress
  • Assignee set to Greg Farnum

Guess I'll look at this a bit more.

Actions #4

Updated by ar Fred almost 14 years ago

Ok, here is an update 1 day after posting comment #2:

mon0 is dead and mon1 is also dead, both OOM-killed I guess (no core files). mon2 now uses 696.7M of resident memory, although the ceph cluster has been inactive since my initial qemu-img convert.

Greg, in one of my mails about "OSD memory usage" I mentioned "quite a lot of disk activity when the system is idle (by processes jbd2/sda1-8, cmon)"; it may be related to the current issue.

thanks, Fred

Actions #5

Updated by Greg Farnum almost 14 years ago

  • Status changed from In Progress to Resolved

Found and fixed many more monitor memory leaks in 7c85646240a02a3e82a727045de6e4432cc2ed9e. Valgrind is a lot happier with it now (went from losing >50MB to <10k in very short instances). You should be good now.
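To compare leak totals before and after a fix in the way described above, the "definitely lost" figure can be pulled out of a Valgrind log. This is only a sketch: it assumes memcheck's usual leak summary line of the form `==PID==    definitely lost: N bytes in M blocks`, and `lost_bytes` is a hypothetical helper name. It is not necessarily how the numbers in this comment were obtained.

```shell
# lost_bytes (hypothetical helper): read a Valgrind memcheck log on stdin and
# print the "definitely lost" byte count with thousands separators removed.
lost_bytes() {
    # Fields on the summary line: ==PID==, "definitely", "lost:", <bytes>, ...
    awk '/definitely lost:/ { gsub(",", "", $4); print $4 }'
}
```

A log can be produced with `valgrind --leak-check=full --log-file=mon.vg.log <monitor command>`, where the monitor command line is left as a placeholder here, and then summarized with `lost_bytes < mon.vg.log`.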
