Bug #489

Memory leak when doing a lot of I/O

Added by Wido den Hollander almost 9 years ago. Updated about 7 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
Start date: 10/14/2010
Due date:
% Done: 0%
Spent time:
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

I have a virtual machine with the following configuration:

<memory>4194304</memory>
  <currentMemory>4194304</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-0.12'>hvm</type>
    <boot dev='hd'/>
    <boot dev='cdrom'/>
  </os>
   .....
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='virtual' device='disk'>
      <driver name='qemu' type='rbd' cache='writeback' aio='native'/>
      <source path='rbd:rbd/alpha'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='virtual' device='disk'>
      <driver name='qemu' type='rbd' cache='writeback' aio='native'/>
      <source path='rbd:rbd/alpha-second'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    <disk type='virtual' device='disk'>
      <driver name='qemu' type='rbd' cache='writeback' aio='native'/>
      <source path='rbd:rbd/alpha-third'/>
      <target dev='vdc' bus='virtio'/>
    </disk>
    <disk type='virtual' device='disk'>
      <driver name='qemu' type='rbd' cache='writeback' aio='native'/>
      <source path='rbd:rbd/alpha-fourth'/>
      <target dev='vdd' bus='virtio'/>
    </disk>
    <disk type='virtual' device='disk'>
      <driver name='qemu' type='rbd' cache='writeback' aio='native'/>
      <source path='rbd:rbd/alpha-fifth'/>
      <target dev='vde' bus='virtio'/>
    </disk>
    <disk type='virtual' device='disk'>
      <driver name='qemu' type='rbd' cache='writeback' aio='native'/>
      <source path='rbd:rbd/alpha-sixth'/>
      <target dev='vdf' bus='virtio'/>
    </disk>
    <disk type='virtual' device='disk'>
      <driver name='qemu' type='rbd' cache='writeback' aio='native'/>
      <source path='rbd:rbd/alpha-seventh'/>
      <target dev='vdg' bus='virtio'/>
    </disk>
    <disk type='virtual' device='disk'>
      <driver name='qemu' type='rbd' cache='writeback' aio='native'/>
      <source path='rbd:rbd/alpha-eigth'/>
      <target dev='vdh' bus='virtio'/>
    </disk>
    <disk type='virtual' device='disk'>
      <driver name='qemu' type='rbd' cache='writeback' aio='native'/>
      <source path='rbd:rbd/alpha-ninth'/>
      <target dev='vdi' bus='virtio'/>
    </disk>
    <disk type='virtual' device='disk'>
      <driver name='qemu' type='rbd' cache='writeback' aio='native'/>
      <source path='rbd:rbd/alpha-tenth'/>
      <target dev='vdj' bus='virtio'/>
    </disk>

These are all 64GB disks; inside the VM I use LVM to group them into one VG with several LVs.

root@client01:~# rbd info alpha-second
rbd image 'alpha-second':
    size 65536 MB in 16384 objects
    order 22 (4096 KB objects)
root@client01:~# rbd info alpha-tenth
rbd image 'alpha-tenth':
    size 65536 MB in 16384 objects
    order 22 (4096 KB objects)
root@client01:~#
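As a quick sanity check of the `rbd info` numbers above: order 22 means each object is 2^22 bytes (4096 KB), so a 65536 MB image should split into 16384 objects. A minimal sketch of that arithmetic follows; the function name `rbd_object_count` is made up for illustration and is not part of any rbd tool or library.

```python
def rbd_object_count(size_mb: int, order: int) -> int:
    """Number of objects an image of size_mb MB is split into at the given order."""
    object_bytes = 1 << order              # order 22 -> 4194304 bytes (4096 KB)
    image_bytes = size_mb * 1024 * 1024    # image size as reported in MB
    # Round up so a partially filled last object is still counted.
    return (image_bytes + object_bytes - 1) // object_bytes


print(rbd_object_count(65536, 22))  # 16384, matching the rbd info output
```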

Inside the VM I mirrored Ubuntu's ISOs, about 40GB of data, without any problem. At that point I had 5 disks attached.

Then I expanded to ten disks and tried to mirror Debian's cd-images from rsync://mirrors.nl.kernel.org/debian-cd/

While doing so, the memory usage of the qemu process keeps growing up to 70%, then the OOM killer kicks in and kills the process.

[115005.005783] qemu-system-x86 invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[115005.005790] qemu-system-x86 cpuset=/ mems_allowed=0
[115005.005795] Pid: 21522, comm: qemu-system-x86 Not tainted 2.6.36-rc7-rbd-20410-g47d3df7 #4

As you can see, I'm running 2.6.36-rc7 (master branch) with:

- qemu-kvm-0.12.3
- Latest RBD code (Backported) ( http://zooi.widodh.nl/ceph/qemu-kvm/qemu-kvm_0.12.3+noroms/rbd-support.patch )

I also ran the test with an extra attached 500GB disk. I thought the growth might be due to the I/O being spread out over the disks, but even with one disk I saw the memory growth.

root@alpha:~# df -h /srv/mirror/debian-cd
Filesystem            Size  Used Avail Use% Mounted on
/dev/vdk              493G  3.4G  464G   1% /srv/mirror/debian-cd
root@alpha:~#

Currently the rsync to that disk is still running, but the memory growth is also still present; it won't take long before the OOM killer gets invoked.
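For anyone trying to reproduce this, one way to record the growth of the qemu process over time is to sample its resident set size from `/proc/<pid>/status` on the host. The sketch below is only an illustration under that assumption (a Linux host with procfs); the helper names `parse_vmrss_kb`, `read_rss_kb` and `watch` are hypothetical and not part of qemu or Ceph.

```python
import time


def parse_vmrss_kb(status_text: str) -> int:
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    123456 kB".
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def read_rss_kb(pid: int) -> int:
    """Read the current resident set size of a process, in kB (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())


def watch(pid: int, interval_s: float = 10.0, samples: int = 60) -> list:
    """Print and return `samples` RSS readings taken `interval_s` seconds apart."""
    readings = []
    for _ in range(samples):
        rss = read_rss_kb(pid)
        readings.append(rss)
        print(time.strftime("%H:%M:%S"), rss, "kB", flush=True)
        time.sleep(interval_s)
    return readings


# Example (pid taken from the OOM log above; substitute the live qemu pid):
# watch(21522, interval_s=10, samples=360)
```

A steadily increasing series under constant I/O load would be consistent with the leak described here, while a series that levels off would match the stable behaviour reported later in the history.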

History

#1 Updated by Yehuda Sadeh almost 9 years ago

This patch doesn't compile (at least on my system). Are you sure you got the latest version running?

#2 Updated by Yehuda Sadeh almost 9 years ago

I do see the memory going up when running on 0.12.3, but not when running with the original version (the one on the rbd branch in our git tree). It could be either a qemu bug or some interaction with an API that has changed.

#3 Updated by Wido den Hollander almost 9 years ago

I'm positive I used the latest version. I have now backported qemu-kvm from Ubuntu 10.10 (Maverick), which is qemu-kvm version 0.12.5; that seems to be the same version the repo is at.

Patch is at: http://zooi.widodh.nl/ceph/qemu-kvm/qemu-kvm_0.12.5+noroms/rbd-support.patch

Right now I'm running the same tests again and they seem to go fine: memory usage is stable at around 51%, which seems reasonable (100MB more than I allocated to the VM).

The weird thing is, I had to modify the RBD code to not use qemu-error.h and change all error_report( calls to printf("%s",

I'm not sure why, but it might be that Ubuntu is backporting/hacking some things.

root@client01:~# qemu-system-x86_64 -version
QEMU PC emulator version 0.12.5 (qemu-kvm-0.12.5), Copyright (c) 2003-2008 Fabrice Bellard
root@client01:~#

As you can see, I'm at 0.12.5 right now and for now it is working fine.

I've uploaded the backported package to:

deb http://pcx.apt-get.eu/ubuntu lucid-backport unofficial

#4 Updated by Wido den Hollander almost 9 years ago

  • Status changed from New to Closed

I've got two rsyncs running right now (Debian CD and kernel.org pub) without any problems at all. Memory usage is stable at 52%, which seems fine for now.

Closing this issue; it must have been due to my older Qemu version.

#5 Updated by Sage Weil almost 9 years ago

  • Project changed from 3 to qemu-rbd
  • Category deleted (9)

#6 Updated by Sage Weil about 7 years ago

  • Project changed from qemu-rbd to rbd
