Bug #41076
openlibrbd only close rbd image after snapshot iteration is finished
Description
On Ceph mimic, iterating over the snapshots of a closed image triggers the following assertion:
cinder-backup29383: /build/ceph-13.2.2/src/common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7f35a4812700 time 2018-12-14 10:10:49.843120
cinder-backup29383: /build/ceph-13.2.2/src/common/Mutex.cc: 110: FAILED assert(r == 0)
cinder-backup29383: ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
cinder-backup29383: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f35903440f2]
cinder-backup29383: 2: (()+0x3162b7) [0x7f35903442b7]
cinder-backup29383: 3: (Mutex::Lock(bool)+0x1de) [0x7f359031901e]
cinder-backup29383: 4: (()+0x8f452) [0x7f358dafc452]
cinder-backup29383: 5: (()+0x11280d) [0x7f358db7f80d]
cinder-backup29383: 6: (rbd_snap_get_namespace_type()+0x29) [0x7f358dacf549]
This new librbd behavior was encountered during incremental backup creation on OpenStack. Is this okay?
How to reproduce:
import rbd
import rados
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('backups')
image = rbd.Image(ioctx, 'volume-2d2c563d-c567-431a-9b25-ea06af83c598.backup.base')
snaps = image.list_snaps()
image.close()
list(snaps)
Aborted (core dumped)
Expected output: snapshots list
list(snaps)
[{'id': 4, 'size': 1073741824, 'name': 'backup.0dbaab38-595e-4e8a-8ecd-7ab60a879903.snap.1563912753.3520913', 'namespace': 0}, {'id': 5, 'size': 1073741824, 'name': 'backup.c9aaaf88-9ce1-4502-aa23-1dddfb1d45bb.snap.1564609930.8021717', 'namespace': 0}]
Actual output: core dumped
Upstream OpenStack bug: https://bugs.launchpad.net/cinder/+bug/1838691
Thanks
Updated by Jason Dillaman over 4 years ago
- Status changed from New to Need More Info
This sounds like a case of you purposely shooting yourself in the foot (closing the image and then attempting to reference it again). I really don't see the need to add protection against this type of case. Would we also need to protect against you running "image.close()" and then executing any other method of the "Image" class? If you want to keep it around until later, change the iterable object to a list by wrapping the result w/ "snaps = list(image.list_snaps())".
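The suggested workaround can be sketched as below. This is a minimal illustration, not code from Cinder or from the rbd bindings: the helper names `read_snapshots` and `read_snapshots_and_close` are hypothetical, and the only calls assumed on the image object are `list_snaps()` and `close()`, both of which appear in the reproducer above.

```python
def read_snapshots(open_image):
    """Return the image's snapshots as a plain list (hypothetical helper).

    list() drains the iterator immediately, so any per-snapshot lookup
    that needs the open image happens before the caller closes it.
    """
    return list(open_image.list_snaps())


def read_snapshots_and_close(image):
    """Read all snapshots first, then close the image (hypothetical helper)."""
    try:
        return read_snapshots(image)
    finally:
        # The image is closed only after the snapshot list is fully built.
        image.close()
```

With the reproducer above, this would be called as `snaps = read_snapshots_and_close(image)` in place of the separate `list_snaps()`, `close()` and `list(snaps)` steps; the returned list holds no reference to the image.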
Updated by Sofia Enriquez over 4 years ago
Jason Dillaman wrote:
This sounds like a case of you purposely shooting yourself in the foot (closing the image and then attempting to reference it again). I really don't see the need to add protection against this type of case. Would we also need to protect against you running "image.close()" and then executing any other method of the "Image" class? If you want to keep it around until later, change the iterable object to a list by wrapping the result w/ "snaps = list(image.list_snaps())".
Hi Jason Dillaman, thank you for your quick reply. I wonder why this worked before upgrading to mimic.
Updated by Mykola Golub over 4 years ago
I wonder why this worked before upgrading to mimic.
`image.list_snaps()` returns an iterator. Internally, however, it reads the image's snapshots only once, when the iterator is initialized, by calling the `rbd_snap_list` API function and keeping the list of snapshots in memory. Before mimic, each call to the iterator simply returned the next item from this in-memory list. In mimic, snapshot namespace support was added, so each call to the iterator now makes an additional call to `image` to get the namespace info for the snapshot before returning the result.
So it could be made to work as it did before, by preparing all the necessary data when the iterator is initialized, but I agree with Jason that this looks more like an OpenStack issue, and that it worked previously only by accident. Note that at some point in the future we might want to optimize the iterator to read the snaps in chunks rather than all at once.
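The difference between the two behaviors can be modeled with a toy example. This is a simplified sketch and not the actual binding code: `ToyImage`, `lazy_snaps` and `eager_snaps` are hypothetical names, and the toy raises `RuntimeError` where the real library hits the assertion shown in the report.

```python
class ToyImage:
    """Stand-in for an RBD image; hypothetical, not the real rbd.Image."""

    def __init__(self, snaps):
        self._snaps = snaps  # list of (id, name, namespace) tuples
        self.closed = False

    def _check_open(self):
        if self.closed:
            raise RuntimeError("image is closed")

    def snap_list(self):
        self._check_open()
        return [(snap_id, name) for snap_id, name, _ in self._snaps]

    def snap_namespace(self, snap_id):
        self._check_open()
        return next(ns for sid, _, ns in self._snaps if sid == snap_id)

    def close(self):
        self.closed = True


def lazy_snaps(image):
    """Snapshot list read up front; namespace looked up on each step."""
    entries = image.snap_list()  # runs now, at iterator creation
    # The element expression runs later, once per next() call, so it
    # still needs the image to be open at iteration time.
    return ({'id': sid, 'name': name, 'namespace': image.snap_namespace(sid)}
            for sid, name in entries)


def eager_snaps(image):
    """All data, including namespaces, prepared at iterator creation."""
    rows = [{'id': sid, 'name': name, 'namespace': image.snap_namespace(sid)}
            for sid, name in image.snap_list()]
    return iter(rows)  # iterating no longer touches the image
```

In this model, an iterator from `eager_snaps` can still be consumed after `close()`, while one from `lazy_snaps` fails on its first step. The eager form trades that robustness for doing every lookup up front, which would work against reading the snapshots in chunks.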
Updated by Sofia Enriquez over 4 years ago
Mykola Golub wrote:
I wonder why this worked before upgrading to mimic.
`image.list_snaps()` returns an iterator. Internally, however, it reads the image's snapshots only once, when the iterator is initialized, by calling the `rbd_snap_list` API function and keeping the list of snapshots in memory. Before mimic, each call to the iterator simply returned the next item from this in-memory list. In mimic, snapshot namespace support was added, so each call to the iterator now makes an additional call to `image` to get the namespace info for the snapshot before returning the result.
So it could be made to work as it did before, by preparing all the necessary data when the iterator is initialized, but I agree with Jason that this looks more like an OpenStack issue, and that it worked previously only by accident. Note that at some point in the future we might want to optimize the iterator to read the snaps in chunks rather than all at once.
Makes sense, thanks Jason and Mykola.