Bug #8912
librbd segfaults when creating new image (rbd-ephemeral-clone-stable-icehouse)
Description
Background:
Installed OpenStack 2014.1.1 on Ubuntu 14.04 (after apt-get update and upgrade)
- glance with ceph
- nova with ceph
- installed without cinder
I can upload images to glance, and I can boot new instances from ceph.
Then I deleted /usr/lib/python2.7/dist-packages/nova and replaced it with the nova folder from the angdraug git repo (branch rbd-ephemeral-clone-stable-icehouse).
Rebooted the hypervisor.
- Existing instances (backed by ceph) start, but new instances cannot be launched
nova-compute dies with:
nova-compute[18324]: segfault at 125 ip 00007fc1502c27d1 sp 00007fc13effcd58 error 4 in librbd.so.1.0.0[7fc150290000+e8000]
Backtrace:
http://paste.openstack.org/show/86443/
Ceph versions:
- default Ubuntu 14.04 ceph version
- 0.80.2
- 0.80.4
Debug:
I have reproduced the issue outside nova-compute here: https://github.com/Lupul/nova-ceph-ephemeral-clone-test
(the fault occurs when opening a non-existent rbd image; with an existing image there is no problem)
- branch: test-ok never faults
- branch: test-fault always faults
librbd fails here:
/usr/lib/python2.7/dist-packages/rbd.py line 364:
ret = self.librbd.rbd_open_read_only(ioctx.io, c_char_p(name), byref(self.image), c_char_p(snapshot))
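For readers without access to the linked repository, the failing call can be approximated with the Python bindings directly. This is a minimal sketch, not the reporter's reproducer: the pool, image, and snapshot names and the config path are placeholder assumptions, and per the fix description the crash would only be expected with rbd caching enabled on an affected librbd. On a fixed librbd the open simply raises rbd.ImageNotFound.

```python
def open_missing_image(pool="rbd", name="no-such-image", snapshot="snap",
                       conffile="/etc/ceph/ceph.conf"):
    """Open a (presumably missing) image read-only, then close the IoCtx.

    All default argument values are placeholders chosen for illustration.
    """
    # Imported here so the function can be defined on hosts without Ceph.
    import rados
    import rbd

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            # With read_only=True this reaches rbd_open_read_only in rbd.py.
            image = rbd.Image(ioctx, name, snapshot=snapshot, read_only=True)
        except rbd.ImageNotFound:
            return "not found"
        else:
            image.close()
            return "opened"
        finally:
            # Closing the IoCtx after a failed open is the step that matters
            # for this bug on affected versions.
            ioctx.close()
    finally:
        cluster.shutdown()
```

Calling open_missing_image() against a test cluster should return "not found" once the fix is installed.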
Associated revisions
librbd: fix error path cleanup for opening an image
If the image doesn't exist and caching is enabled, the ObjectCacher
was not being shutdown, and the ImageCtx was leaked. The IoCtx could
later be closed while the ObjectCacher was still running, resulting in
a segfault. Simply use the usual cleanup path in open_image(), which
works fine here.
Fixes: #8912
Backport: dumpling, firefly
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
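The pattern described in this commit message can be illustrated with a small pure-Python model. This is only an analogy under stated assumptions, not librbd code: Flusher, ImageCtx, and open_image here are hypothetical stand-ins for the ObjectCacher flusher thread, the image context, and the open routine. The point it shows is that the failing open goes through the same cleanup as a normal close, so the background thread is stopped before the caller can release anything it depends on.

```python
import threading


class ImageNotFound(Exception):
    pass


class Flusher(object):
    """Hypothetical stand-in for a cache with a background flusher thread."""

    def __init__(self):
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True
        self._thread.start()

    def _run(self):
        # Wake up periodically until shutdown() is requested.
        while not self._stop_event.wait(0.01):
            pass

    def shutdown(self):
        self._stop_event.set()
        self._thread.join()

    def is_running(self):
        return self._thread.is_alive()


class ImageCtx(object):
    """Hypothetical image context that owns a flusher when caching is on."""

    def __init__(self, exists, cache_enabled=True):
        self.exists = exists
        self.flusher = Flusher() if cache_enabled else None

    def close(self):
        # The usual cleanup path: stop the background thread first.
        if self.flusher is not None:
            self.flusher.shutdown()


def open_image(ctx):
    if not ctx.exists:
        # Error path reuses the normal cleanup instead of returning early
        # and leaving the flusher thread running.
        ctx.close()
        raise ImageNotFound("image does not exist")
    return ctx
```

In this model, skipping ctx.close() on the error path would leave the flusher thread alive after the failed open, which mirrors the situation the commit message describes.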
librbd: fix error path cleanup for opening an image
If the image doesn't exist and caching is enabled, the ObjectCacher
was not being shutdown, and the ImageCtx was leaked. The IoCtx could
later be closed while the ObjectCacher was still running, resulting in
a segfault. Simply use the usual cleanup path in open_image(), which
works fine here.
Fixes: #8912
Backport: dumpling, firefly
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 3dfa72d5b9a1f54934dc8289592556d30430959d)
History
#1 Updated by Josh Durgin over 9 years ago
- Priority changed from Normal to Urgent
#2 Updated by Josh Durgin over 9 years ago
- Status changed from New to In Progress
Excellent report, your reproducer causes the same crash for me.
#3 Updated by Josh Durgin over 9 years ago
- Assignee set to Josh Durgin
#4 Updated by Josh Durgin over 9 years ago
- Backport set to dumpling, firefly
Looks like it was a race condition in a previously little-used error path.
#5 Updated by Josh Durgin over 9 years ago
- Status changed from In Progress to Fix Under Review
#6 Updated by Sage Weil over 9 years ago
- Status changed from Fix Under Review to Pending Backport
- Source changed from other to Community (user)
#7 Updated by Sage Weil over 9 years ago
- Status changed from Pending Backport to Resolved
#8 Updated by Josh Durgin over 9 years ago
For better searchability, the backtrace for this crash is:
#0 ceph::log::SubsystemMap::should_gather (this=0x4ad5170, sub=<optimized out>, level=11) at ./log/SubsystemMap.h:63
#1 0x00007f985659b2db in ObjectCacher::flusher_entry (this=0x65a3da0) at osdc/ObjectCacher.cc:1431
#2 0x00007f98565ab4cd in ObjectCacher::FlusherThread::entry (this=<optimized out>) at osdc/ObjectCacher.h:358
#3 0x00007f98d45fa182 in start_thread (arg=0x7f9855d0f700) at pthread_create.c:312
#4 0x00007f98d432730d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111