Bug #8912

librbd segfaults when creating new image (rbd-ephemeral-clone-stable-icehouse)

Added by Zollner Robert over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: dumpling, firefly
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Background:
Installed OpenStack 2014.1.1 on Ubuntu 14.04 (after apt-get update and upgrade):
- glance with ceph
- nova with ceph
- installed without cinder
I can upload images to Glance, and I can also boot new instances from Ceph.

Then I deleted /usr/lib/python2.7/dist-packages/nova and replaced it with the nova folder from the angdraug git repo (branch rbd-ephemeral-clone-stable-icehouse).
I rebooted the hypervisor.
- Existing instances (backed by Ceph) start, but new instances cannot be launched

nova-compute dies with:
nova-compute18324: segfault at 125 ip 00007fc1502c27d1 sp 00007fc13effcd58 error 4 in librbd.so.1.0.0[7fc150290000+e8000]

Backtrace:
http://paste.openstack.org/show/86443/

Ceph versions:
- default Ubuntu 14.04 ceph version
- 0.80.2
- 0.80.4

Debug:
I have reproduced the issue outside nova-compute here: https://github.com/Lupul/nova-ceph-ephemeral-clone-test
(The crash occurs when using a non-existent rbd image; with an existing image there is no problem.)
- branch test-ok: never faults
- branch test-fault: always faults

librbd fails here:
/usr/lib/python2.7/dist-packages/rbd.py line 364:
ret = self.librbd.rbd_open_read_only(ioctx.io, c_char_p(name), byref(self.image), c_char_p(snapshot))

Associated revisions

Revision 3dfa72d5 (diff)
Added by Josh Durgin over 9 years ago

librbd: fix error path cleanup for opening an image

If the image doesn't exist and caching is enabled, the ObjectCacher
was not being shutdown, and the ImageCtx was leaked. The IoCtx could
later be closed while the ObjectCacher was still running, resulting in
a segfault. Simply use the usual cleanup path in open_image(), which
works fine here.

Fixes: #8912
Backport: dumpling, firefly
Signed-off-by: Josh Durgin <>

Revision e767254c (diff)
Added by Josh Durgin over 9 years ago

librbd: fix error path cleanup for opening an image

If the image doesn't exist and caching is enabled, the ObjectCacher
was not being shutdown, and the ImageCtx was leaked. The IoCtx could
later be closed while the ObjectCacher was still running, resulting in
a segfault. Simply use the usual cleanup path in open_image(), which
works fine here.

Fixes: #8912
Backport: dumpling, firefly
Signed-off-by: Josh Durgin <>
(cherry picked from commit 3dfa72d5b9a1f54934dc8289592556d30430959d)

Revision cbc9218e (diff)
Added by Josh Durgin over 9 years ago

librbd: fix error path cleanup for opening an image

If the image doesn't exist and caching is enabled, the ObjectCacher
was not being shutdown, and the ImageCtx was leaked. The IoCtx could
later be closed while the ObjectCacher was still running, resulting in
a segfault. Simply use the usual cleanup path in open_image(), which
works fine here.

Fixes: #8912
Backport: dumpling, firefly
Signed-off-by: Josh Durgin <>
(cherry picked from commit 3dfa72d5b9a1f54934dc8289592556d30430959d)

History

#1 Updated by Josh Durgin over 9 years ago

  • Priority changed from Normal to Urgent

#2 Updated by Josh Durgin over 9 years ago

  • Status changed from New to In Progress

Excellent report, your reproducer causes the same crash for me.

#3 Updated by Josh Durgin over 9 years ago

  • Assignee set to Josh Durgin

#4 Updated by Josh Durgin over 9 years ago

  • Backport set to dumpling, firefly

Looks like it was a race condition in a previously little-used error path.

#5 Updated by Josh Durgin over 9 years ago

  • Status changed from In Progress to Fix Under Review

#6 Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Source changed from other to Community (user)

#7 Updated by Sage Weil over 9 years ago

  • Status changed from Pending Backport to Resolved

#8 Updated by Josh Durgin over 9 years ago

For better searchability, the backtrace for this crash is:

#0  ceph::log::SubsystemMap::should_gather (this=0x4ad5170, sub=<optimized out>, level=11) at ./log/SubsystemMap.h:63
#1  0x00007f985659b2db in ObjectCacher::flusher_entry (this=0x65a3da0) at osdc/ObjectCacher.cc:1431
#2  0x00007f98565ab4cd in ObjectCacher::FlusherThread::entry (this=<optimized out>) at osdc/ObjectCacher.h:358
#3  0x00007f98d45fa182 in start_thread (arg=0x7f9855d0f700) at pthread_create.c:312
#4  0x00007f98d432730d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
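
The failure mode described in the fix and visible in this backtrace (a flusher thread still running after the context it depends on has gone away) can be modeled in a few lines of Python. This is only a toy sketch, not librbd code: `FakeIoCtx`, `FakeImageCtx`, `open_image`, and `simulate` are hypothetical names, and a boolean flag stands in for what would be an invalid memory access in C++.

```python
import errno
import threading


class FakeIoCtx:
    """Hypothetical stand-in for the IoCtx that the cache flusher relies on."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class FakeImageCtx:
    """Hypothetical stand-in for an ImageCtx with caching enabled.

    Creating it starts a background flusher thread, loosely mirroring
    how the ObjectCacher runs its own flusher.
    """

    def __init__(self, ioctx):
        self.ioctx = ioctx
        self.saw_closed_ioctx = False
        self._stop = threading.Event()
        self._flusher = threading.Thread(target=self._flusher_entry, daemon=True)
        self._flusher.start()

    def _flusher_entry(self):
        # Poll until asked to stop. Touching a closed IoCtx is recorded
        # in a flag here; in the real library it was a segfault.
        while not self._stop.wait(0.01):
            if self.ioctx.closed:
                self.saw_closed_ioctx = True
                return

    def shutdown(self):
        # The "usual cleanup path": stop the flusher and wait for it.
        self._stop.set()
        self._flusher.join()


def open_image(ioctx, exists, use_cleanup_path):
    """Model of opening an image; returns (ictx, ret)."""
    ictx = FakeImageCtx(ioctx)
    if not exists:
        if use_cleanup_path:
            ictx.shutdown()  # fixed behavior: flusher is stopped on error
        # Otherwise the context is leaked with its flusher still running.
        return ictx, -errno.ENOENT
    return ictx, 0


def simulate(use_cleanup_path):
    """Open a missing image, then close the IoCtx, as the reproducer does."""
    ioctx = FakeIoCtx()
    ictx, ret = open_image(ioctx, exists=False, use_cleanup_path=use_cleanup_path)
    ioctx.close()
    ictx._flusher.join(timeout=2.0)  # give a leaked flusher time to notice
    return ret, ictx.saw_closed_ioctx
```

Here `simulate(False)` models the leaked-context error path, in which the flusher later observes a closed IoCtx, while `simulate(True)` models the fix, in which the normal cleanup path stops the flusher before the IoCtx is closed.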
