Bug #8912

closed

librbd segfaults when creating new image (rbd-ephemeral-clone-stable-icehouse)

Added by Zollner Robert over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: dumpling, firefly
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Background:
Installed OpenStack 2014.1.1 on Ubuntu 14.04 (packages updated with apt-get update && apt-get upgrade):
- Glance with Ceph
- Nova with Ceph
- installed without Cinder
I can upload images to Glance, and I can also boot new instances from Ceph.

Then I deleted /usr/lib/python2.7/dist-packages/nova and replaced it with the nova folder from angdraug's git repo (branch rbd-ephemeral-clone-stable-icehouse).
Rebooted the hypervisor.
- Existing instances (backed by Ceph) start, but new instances cannot be launched.

nova-compute dies with:
nova-compute[18324]: segfault at 125 ip 00007fc1502c27d1 sp 00007fc13effcd58 error 4 in librbd.so.1.0.0[7fc150290000+e8000]
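The kernel line alone is enough to locate the fault inside the library. `segfault at 125` is the faulting address 0x125 (a near-NULL dereference), `error 4` is the x86 page-fault code for a user-mode read of a page that is not present, and `ip` minus the mapping base gives an offset into `librbd.so.1.0.0` that a symbolizer can usually resolve against a build with debug symbols. A minimal Python sketch, assuming the standard `name[pid]: segfault at ...` layout; the helper `parse_segfault` is hypothetical:

```python
import re

# Usual x86 kernel message layout:
#   name[pid]: segfault at ADDR ip IP sp SP error CODE in LIB[BASE+SIZE]
SEGFAULT_RE = re.compile(
    r"(?P<proc>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the fields of a kernel segfault line plus derived values."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    err = int(m["err"], 16)
    offset = ip - base  # position of the faulting instruction in the library
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    return {
        "process": m["proc"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "library": m["lib"],
        "offset": offset,
        # x86 page-fault error code bits:
        # bit 0 = protection violation (clear: page not present),
        # bit 1 = write access (clear: read), bit 2 = user mode (clear: kernel).
        "protection_violation": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


line = ("nova-compute[18324]: segfault at 125 ip 00007fc1502c27d1 "
        "sp 00007fc13effcd58 error 4 in librbd.so.1.0.0[7fc150290000+e8000]")
info = parse_segfault(line)
print(info["library"], hex(info["offset"]))  # librbd.so.1.0.0 0x327d1
```

With debug symbols installed, something like `addr2line -f -C -e <path to librbd.so.1.0.0> 0x327d1` would then name the function; the library path depends on the installation.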

Backtrace:
http://paste.openstack.org/show/86443/

Ceph versions:
- default Ubuntu 14.04 ceph version
- 0.80.2
- 0.80.4

Debug:
I have reproduced the issue outside nova-compute here: https://github.com/Lupul/nova-ceph-ephemeral-clone-test
(the crash happens when using a non-existent rbd image; with an existing image there is no problem)
- branch: test-ok never faults
- branch: test-fault always faults

librbd fails here:
/usr/lib/python2.7/dist-packages/rbd.py line 364:
ret = self.librbd.rbd_open_read_only(ioctx.io, c_char_p(name), byref(self.image), c_char_p(snapshot))
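Per the report, the crash is reached by opening an image that does not exist, read-only. A caller that checks for an image before creating it might do so by attempting exactly that open and treating the not-found error as "absent". The sketch below assumes that pattern; it is not taken from the nova branch. `image_exists` is hypothetical, and the injected `open_image` callable and the local `ImageNotFound` stand in for `rbd.Image(ioctx, name, read_only=True)` and `rbd.ImageNotFound` so the sketch runs without a cluster:

```python
class ImageNotFound(Exception):
    """Stand-in for rbd.ImageNotFound so the sketch runs without Ceph."""


def image_exists(open_image, name):
    """Probe for an image by opening it read-only (hypothetical helper).

    open_image(name) is assumed to behave like
    rbd.Image(ioctx, name, read_only=True): it returns an object with a
    close() method, or raises ImageNotFound. When the image is missing,
    this exercises the failed-open path the report reproduces.
    """
    try:
        image = open_image(name)
    except ImageNotFound:
        return False
    image.close()
    return True


class FakeImage:
    """Fake image handle that records whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


# Fake pool holding a single image.
POOL = {"existing-disk": FakeImage()}


def fake_open(name):
    if name not in POOL:
        raise ImageNotFound(name)
    return POOL[name]


print(image_exists(fake_open, "existing-disk"))      # True
print(image_exists(fake_open, "new-instance-disk"))  # False
```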

Actions #1

Updated by Josh Durgin over 9 years ago

  • Priority changed from Normal to Urgent
Actions #2

Updated by Josh Durgin over 9 years ago

  • Status changed from New to In Progress

Excellent report; your reproducer causes the same crash for me.

Actions #3

Updated by Josh Durgin over 9 years ago

  • Assignee set to Josh Durgin
Actions #4

Updated by Josh Durgin over 9 years ago

  • Backport set to dumpling, firefly

Looks like it was a race condition in a previously little-used error path.
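The backtrace in note #8 shows the crash on the `ObjectCacher` flusher thread (`flusher_entry`), not on the thread that called `rbd_open_read_only`. One reading consistent with "a race condition in a previously little-used error path" is that a failed open released state while a background flusher could still touch it. The Python sketch below models only that ordering hazard and a safe teardown (signal the flusher, join it, then release state). It is a toy with hypothetical names (`ToyImage`, `fail_open`), not librbd's code or the actual fix:

```python
import threading


class ToyImage:
    """Toy image handle with a background flusher thread.

    Hypothetical: illustrates teardown ordering only, not librbd internals.
    """

    def __init__(self):
        self.state = {"dirty": 0}  # shared with the flusher thread
        self.lock = threading.Lock()
        self.stop = threading.Event()
        self.flusher_saw_missing_state = False
        self.flusher = threading.Thread(target=self._flusher_entry)
        self.flusher.start()

    def _flusher_entry(self):
        # Wake periodically until told to stop; stop.wait() returns True
        # as soon as the event is set.
        while not self.stop.wait(0.001):
            with self.lock:
                if self.state is None:
                    # In C++ this would be a use-after-free, not a flag.
                    self.flusher_saw_missing_state = True
                    return
                self.state["dirty"] = 0

    def fail_open(self):
        """Error path for an open that failed (e.g. image not found)."""
        # Safe order: stop and join the flusher *before* releasing anything
        # it reads. Releasing state first is the race this toy illustrates.
        self.stop.set()
        self.flusher.join()
        with self.lock:
            self.state = None


img = ToyImage()
img.fail_open()
print(img.flusher.is_alive(), img.flusher_saw_missing_state)  # False False
```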

Actions #5

Updated by Josh Durgin over 9 years ago

  • Status changed from In Progress to Fix Under Review
Actions #6

Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Source changed from other to Community (user)
Actions #7

Updated by Sage Weil over 9 years ago

  • Status changed from Pending Backport to Resolved
Actions #8

Updated by Josh Durgin over 9 years ago

For better searchability, the backtrace for this crash is:

#0  ceph::log::SubsystemMap::should_gather (this=0x4ad5170, sub=<optimized out>, level=11) at ./log/SubsystemMap.h:63
#1  0x00007f985659b2db in ObjectCacher::flusher_entry (this=0x65a3da0) at osdc/ObjectCacher.cc:1431
#2  0x00007f98565ab4cd in ObjectCacher::FlusherThread::entry (this=<optimized out>) at osdc/ObjectCacher.h:358
#3  0x00007f98d45fa182 in start_thread (arg=0x7f9855d0f700) at pthread_create.c:312
#4  0x00007f98d432730d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111