Bug #41076

open

librbd only close rbd image after snapshot iteration is finished

Added by Sofia Enriquez over 4 years ago. Updated over 4 years ago.

Status: Need More Info
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: Community (user)
Tags: librbd
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rbd
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On Ceph Mimic, iterating over the snapshots of a closed image triggers the following assertion:

cinder-backup29383: /build/ceph-13.2.2/src/common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7f35a4812700 time 2018-12-14 10:10:49.843120
cinder-backup29383: /build/ceph-13.2.2/src/common/Mutex.cc: 110: FAILED assert(r == 0)
cinder-backup29383: ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
cinder-backup29383: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f35903440f2]
cinder-backup29383: 2: (()+0x3162b7) [0x7f35903442b7]
cinder-backup29383: 3: (Mutex::Lock(bool)+0x1de) [0x7f359031901e]
cinder-backup29383: 4: (()+0x8f452) [0x7f358dafc452]
cinder-backup29383: 5: (()+0x11280d) [0x7f358db7f80d]
cinder-backup29383: 6: (rbd_snap_get_namespace_type()+0x29) [0x7f358dacf549]

This new librbd behavior was encountered during incremental backup creation on OpenStack. Is it expected?

How to reproduce:

import rbd
import rados
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('backups')
image = rbd.Image(ioctx, 'volume-2d2c563d-c567-431a-9b25-ea06af83c598.backup.base')
snaps = image.list_snaps()  # returns an iterator, not a list
image.close()
list(snaps)  # iterating after close() triggers the abort

Aborted (core dumped)

Expected output: snapshots list

list(snaps)
[{'id': 4, 'size': 1073741824, 'name': 'backup.0dbaab38-595e-4e8a-8ecd-7ab60a879903.snap.1563912753.3520913', 'namespace': 0}, {'id': 5, 'size': 1073741824, 'name': 'backup.c9aaaf88-9ce1-4502-aa23-1dddfb1d45bb.snap.1564609930.8021717', 'namespace': 0}]

Actual output: core dumped

Upstream OpenStack bug: https://bugs.launchpad.net/cinder/+bug/1838691

Thanks

Actions #1

Updated by Jason Dillaman over 4 years ago

  • Status changed from New to Need More Info

This sounds like a case of you purposely shooting yourself in the foot (closing the image and then attempting to reference it again). I really don't see the need to add protection against this type of case. Would we also need to protect against you running "image.close()" and then executing any other method of the "Image" class? If you want to keep the result around until later, turn the iterable object into a list by wrapping it: "snaps = list(image.list_snaps())"
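
The suggested workaround can be sketched as follows. This is a minimal illustration, not code from librbd or Cinder: `FakeImage` is a hypothetical stand-in for `rbd.Image` so the example runs without a cluster, and it raises a Python exception where the real library aborts the process. The helper name `snapshots_of` is also an assumption.

```python
from contextlib import closing


class FakeImage:
    """Hypothetical stand-in for rbd.Image (assumption, not the real class)."""

    def __init__(self, snaps):
        self._snaps = snaps
        self.closed = False

    def list_snaps(self):
        # Lazy generator: every item touches the image, so the image
        # must still be open while the caller iterates.
        for snap in self._snaps:
            if self.closed:
                raise RuntimeError("image is closed")
            yield dict(snap)

    def close(self):
        self.closed = True


def snapshots_of(image):
    """Materialize the snapshot list while the image is open, then close it."""
    with closing(image):
        return list(image.list_snaps())
```

With the real bindings the same pattern would be `snaps = list(image.list_snaps())` followed by `image.close()`, so the list is fully built before the handle goes away.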

Actions #2

Updated by Sofia Enriquez over 4 years ago

Hi Jason Dillaman, thank you for your quick reply. I wonder why this worked before upgrading to mimic.

Actions #3

Updated by Mykola Golub over 4 years ago

I wonder why this worked before upgrading to mimic.

`image.list_snaps()` returns an iterator. Internally, it reads the image's snapshots only once, when the iterator is initialized, by calling the `rbd_snap_list` API function, and keeps the list of snapshots in memory. Before Mimic, each step of the iterator just returned the next item from this in-memory list. In Mimic, snapshot namespace support was added, so each step of the iterator now makes an additional call on `image` to get the namespace info for the snapshot before returning the result.

So it could be made to work as before by preparing all the necessary data when the iterator is initialized, but I agree with Jason that this looks more like an OpenStack issue, and it previously worked just by accident. Note that in the future we might want to optimize the iterator to read the snapshots in chunks rather than all at once.
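
The difference between the two behaviors can be sketched like this. It is only an illustration under stated assumptions: `StubImage`, `LazyNamespaceIter`, and `EagerNamespaceIter` are hypothetical names, not the actual binding code, and the stub raises a Python exception where librbd hits the failed assertion.

```python
class StubImage:
    """Hypothetical image handle (assumption, not the real rbd.Image)."""

    def __init__(self, snaps):
        self._snaps = snaps  # tuples of (id, size, name, namespace)
        self.is_open = True

    def _require_open(self):
        if not self.is_open:
            raise RuntimeError("image is closed")

    def snap_list(self):
        # Plays the role of the one-time snapshot listing call.
        self._require_open()
        return [{'id': i, 'size': s, 'name': n} for i, s, n, _ in self._snaps]

    def snap_namespace(self, snap_id):
        # Plays the role of the per-snapshot namespace lookup.
        self._require_open()
        return next(ns for i, _, _, ns in self._snaps if i == snap_id)

    def close(self):
        self.is_open = False


class LazyNamespaceIter:
    """Mimic-style: list read at init, namespace fetched on each step."""

    def __init__(self, image):
        self.image = image
        self.snaps = image.snap_list()

    def __iter__(self):
        for snap in self.snaps:
            # Needs the image to still be open at iteration time.
            yield dict(snap, namespace=self.image.snap_namespace(snap['id']))


class EagerNamespaceIter:
    """Alternative: gather everything at init, iterate from memory only."""

    def __init__(self, image):
        self.snaps = [dict(s, namespace=image.snap_namespace(s['id']))
                      for s in image.snap_list()]

    def __iter__(self):
        return iter(self.snaps)
```

In this sketch, closing the stub image after both iterators are constructed leaves the eager one usable, while the lazy one fails on its first step. The eager variant would not combine well with the chunked reading mentioned above, which is one reason to treat the old behavior as accidental.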

Actions #4

Updated by Sofia Enriquez over 4 years ago

Makes sense, thanks Jason and Mykola.
