Bug #13559
FAILED assert(m_ictx->owner_lock.is_locked())
Status: Closed
Description
After upgrading from 0.94.1 to 0.94.4 I am having issues starting SOME virtual guests with KVM. The following error is shown in libvirt logs:
librbd/LibrbdWriteback.cc: In function 'virtual ceph_tid_t librbd::LibrbdWriteback::write(const object_t&, const object_locator_t&, uint64_t, uint64_t, const SnapContext&, const bufferlist&, utime_t, uint64_t, __u32, Context*)' thread 7fe016ffd700 time 2015-10-21 13:23:53.002271
librbd/LibrbdWriteback.cc: 160: FAILED assert(m_ictx->owner_lock.is_locked())
ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
1: (()+0x17258b) [0x7fe03e69058b]
2: (()+0xa9573) [0x7fe03e5c7573]
3: (()+0x3a90ca) [0x7fe03e8c70ca]
4: (()+0x3b583d) [0x7fe03e8d383d]
5: (()+0x7212c) [0x7fe03e59012c]
6: (()+0x9590f) [0x7fe03e5b390f]
7: (()+0x969a3) [0x7fe03e5b49a3]
8: (()+0x4782a) [0x7fe03e56582a]
9: (()+0x56599) [0x7fe03e574599]
10: (()+0x7284e) [0x7fe03e59084e]
11: (()+0x162b7e) [0x7fe03e680b7e]
12: (()+0x163c10) [0x7fe03e681c10]
13: (()+0x8182) [0x7fe03a3e2182]
14: (clone()+0x6d) [0x7fe03a10f47d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
I am running Apache CloudStack with KVM and Ceph. This issue affects all client host servers, but only virtual router guest VMs and nothing else. Whenever a virtual router is created or started, I see the above error.
I am running Ubuntu 12.04 (kernel 3.19.8-031908-generic) on the osd servers and Ubuntu 14.04 (kernel 3.18.22-031822-generic) on the client host servers.
Updated by Jason Dillaman over 8 years ago
- Status changed from New to In Progress
- Assignee set to Jason Dillaman
Updated by Jason Dillaman over 8 years ago
- Subject changed from KVM - unable to create some virtual machines to FAILED assert(m_ictx->owner_lock.is_locked())
Updated by Andrei Mikhailovsky over 8 years ago
Tried downgrading ceph on host servers to version 0.94.3 as per jdillaman's suggestion on #ceph and the problem is gone. I can successfully start vms.
Updated by Mihai Gheorghe over 8 years ago
Andrei Mikhailovsky wrote:
Tried downgrading ceph on host servers to version 0.94.3 as per jdillaman's suggestion on #ceph and the problem is gone. I can successfully start vms.
How did you manage to downgrade to 0.94.3? I too need to downgrade asap! Thanks
Updated by Edmund Rhudy over 8 years ago
The suggestion of disabling writeback caching with rbd_cache_max_dirty = 0 under the appropriate client sections in ceph.conf worked around the issue for me, though doing so has performance implications since it forces writethrough caching in all cases.
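The workaround described above corresponds to a ceph.conf fragment like the following (the plain [client] section is shown as an example; a more specific client section works too):

```ini
[client]
    # Workaround for tracker #13559: allow no dirty bytes in the rbd
    # cache, effectively forcing writethrough. Expect reduced write
    # performance while this is in place.
    rbd cache max dirty = 0
```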
Updated by Andrei Mikhailovsky over 8 years ago
Mihai, I downloaded the required ceph packages, libraries, and dependencies from the ceph repo and installed them manually. Please note that I've only downgraded the client host servers and not the osd/mons, as I have been advised against downgrading the osd and mon servers.
Andrei
Updated by Jason Dillaman over 8 years ago
The interim fix packages are available via Ceph development repo (see http://docs.ceph.com/docs/master/install/get-packages/#add-ceph-development) under the wip-13559-hammer branch.
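For APT-based hosts, pulling the branch build typically means adding a development-repo source line along the lines of the one below. This is a hypothetical example only: the exact host and path layout come from the get-packages document linked above, and the distro codename ("trusty") is an assumption for Ubuntu 14.04:

```
# Hypothetical APT source for the wip-13559-hammer branch build;
# verify the actual URL against the get-packages documentation.
deb http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/ref/wip-13559-hammer trusty main
```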
Updated by Loïc Dachary over 8 years ago
Updated by Mihai Gheorghe over 8 years ago
Jason Dillaman wrote:
The interim fix packages are available via Ceph development repo (see http://docs.ceph.com/docs/master/install/get-packages/#add-ceph-development) under the wip-13559-hammer branch.
Thank you Jason for the patch! It works as expected!
Thank you Andrei also for helping!
Updated by Jason Dillaman over 8 years ago
The updated interim fix for hammer will be available under the wip-13567-hammer branch.
Updated by Jason Dillaman over 8 years ago
ceph-qa-suite PR: https://github.com/ceph/ceph-qa-suite/pull/652
Updated by Ken Dreyer over 8 years ago
To confirm: this issue only affected v0.94.4, and v0.94.3 is unaffected?
What are the exact commit(s) that introduced this issue?
Updated by Jason Dillaman over 8 years ago
- Status changed from In Progress to Resolved