Bug #4739
Closed
Failed assert in librbd with rbd cache enabled
Description
With librbd, as used by qemu (via libvirt), while using rsync to sync data to a fresh image:
osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t, tid_t, int)' thread 7facbe188700 time 2013-04-16 14:39:15.886397
osdc/ObjectCacher.cc: 847: FAILED assert(ob->last_commit_tid < tid)
ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long, int)+0xd68) [0x7faccd56c688]
2: (ObjectCacher::C_WriteCommit::finish(int)+0x6b) [0x7faccd573bfb]
3: (Context::complete(int)+0xa) [0x7faccd52d53a]
4: (librbd::C_Request::finish(int)+0x85) [0x7faccd55b025]
5: (Context::complete(int)+0xa) [0x7faccd52d53a]
6: (librbd::rados_req_cb(void*, void*)+0x47) [0x7faccd541257]
7: (librados::C_AioSafe::finish(int)+0x1d) [0x7faccc8dbc5d]
8: (Finisher::finisher_thread_entry()+0x1c0) [0x7faccc948160]
9: (()+0x7e9a) [0x7facca1afe9a]
10: (clone()+0x6d) [0x7facc9edbcbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
This failed assertion causes the entire qemu process to terminate.
Disabling the rbd cache seems to be a workaround, though it makes things much too slow.
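For reference, the workaround above can be applied on the client side in ceph.conf. This is a minimal sketch, assuming the qemu host reads the default config file and that the option is placed under the [client] section; it is not copied from the configuration used in this report.

```ini
[client]
    ; assumption: turns off the librbd cache for every client that reads this file
    rbd cache = false
```

A running guest would likely need to be restarted before the change takes effect, since librbd reads its configuration when the image is opened.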
- ceph -v
ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
- kvm -version
QEMU emulator version 1.0 (qemu-kvm-1.0), Copyright (c) 2003-2008 Fabrice Bellard
- libvirtd --version
libvirtd (libvirt) 0.9.8
- uname -a
Linux vps1 3.2.0-39-generic #62-Ubuntu SMP Thu Feb 28 00:28:53 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
- lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.2 LTS
Release: 12.04
Codename: precise
This is using the 'ceph-testing' branch apt repo for ceph, and the standard Ubuntu 12.04 repos for libvirt & qemu.
Updated by Sage Weil about 11 years ago
How easy is this to reproduce? We have fixed several causes of this behavior, but I can't remember offhand if they were all included in v0.60.
Updated by Josh Durgin about 11 years ago
The latest cause of this was #4531, whose fix was just merged yesterday. If this is reproducible, could you try using librbd from the next branch in the development testing packages?
Updated by Mike Kelly about 11 years ago
Josh Durgin wrote:
The latest cause of this was #4531, whose fix was just merged yesterday. If this is reproducible, could you try using librbd from the next branch in the development testing packages?
That could be us, too... this is a clone.
I can try again with a fresh format 2 image (not cloned from anything). If it still fails, we can try out a development branch, but otherwise it sounds kinda like #4531.
Updated by Mike Kelly about 11 years ago
Yes, using a "fresh" image instead of a clone, the issue didn't happen again, whereas before, when trying to sync to the same cloned image, it kept happening. So, it sounds like this is related to #4531, and is hopefully now fixed. I guess this can be closed, and I'll reopen if I manage to reproduce it again.