Bug #4739

Failed assert in librbd with rbd cache enabled

Added by Mike Kelly almost 8 years ago. Updated almost 8 years ago.

Status: Duplicate
Priority: Urgent
Assignee: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

With librbd, as used by qemu (via libvirt), while using rsync to sync data to a fresh image:

osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t, tid_t, int)' thread 7facbe188700 time 2013-04-16 14:39:15.886397
osdc/ObjectCacher.cc: 847: FAILED assert(ob->last_commit_tid < tid)
ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long, int)+0xd68) [0x7faccd56c688]
2: (ObjectCacher::C_WriteCommit::finish(int)+0x6b) [0x7faccd573bfb]
3: (Context::complete(int)+0xa) [0x7faccd52d53a]
4: (librbd::C_Request::finish(int)+0x85) [0x7faccd55b025]
5: (Context::complete(int)+0xa) [0x7faccd52d53a]
6: (librbd::rados_req_cb(void*, void*)+0x47) [0x7faccd541257]
7: (librados::C_AioSafe::finish(int)+0x1d) [0x7faccc8dbc5d]
8: (Finisher::finisher_thread_entry()+0x1c0) [0x7faccc948160]
9: (()+0x7e9a) [0x7facca1afe9a]
10: (clone()+0x6d) [0x7facc9edbcbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'

This failed assertion causes the entire qemu process to terminate.

Disabling the rbd cache seems to work around the crash, but it makes I/O much too slow.

  # ceph -v
  ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
  # kvm -version
  QEMU emulator version 1.0 (qemu-kvm-1.0), Copyright (c) 2003-2008 Fabrice Bellard
  # libvirtd --version
  libvirtd (libvirt) 0.9.8
  # uname -a
  Linux vps1 3.2.0-39-generic #62-Ubuntu SMP Thu Feb 28 00:28:53 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
  # lsb_release -a
  No LSB modules are available.
  Distributor ID: Ubuntu
  Description: Ubuntu 12.04.2 LTS
  Release: 12.04
  Codename: precise

This is using the 'ceph-testing' branch apt repo for ceph, and the standard Ubuntu 12.04 repos for libvirt & qemu.


Related issues

Duplicates rbd - Bug #4531: ObjectCacher: read waiters for parent data during copyup get reordered, causing the write order assert to fail Resolved 03/22/2013

History

#1 Updated by Sage Weil almost 8 years ago

  • Priority changed from Normal to Urgent

#2 Updated by Sage Weil almost 8 years ago

How easy is this to reproduce? We have fixed several causes of this behavior, but I can't remember offhand if they were all included in v0.60.

#3 Updated by Josh Durgin almost 8 years ago

The latest cause of this was #4531, whose fix was just merged yesterday. If this is reproducible, could you try using librbd from the next branch in the development testing packages?

#4 Updated by Mike Kelly almost 8 years ago

Josh Durgin wrote:

The latest cause of this was #4531, whose fix was just merged yesterday. If this is reproducible, could you try using librbd from the next branch in the development testing packages?

That could be us, too... this is a clone.

I can try again with a fresh format 2 image (not cloned from anything). If it still fails, we can try out a development branch, but otherwise it sounds kinda like #4531.

#5 Updated by Mike Kelly almost 8 years ago

Yes, using a "fresh" image instead of a clone, this issue didn't happen again; before, when syncing to the same cloned image, it kept happening. So it sounds like this is related to #4531 and is hopefully now fixed. I guess this can be closed, and I'll reopen it if I manage to reproduce it again.

#6 Updated by Sage Weil almost 8 years ago

  • Status changed from New to Duplicate

see #4531

#7 Updated by Sage Weil almost 8 years ago

  • Project changed from Ceph to rbd