Bug #13559
FAILED assert(m_ictx->owner_lock.is_locked())
Description
After upgrading from 0.94.1 to 0.94.4 I am having issues starting SOME virtual guests with KVM. The following error is shown in libvirt logs:
librbd/LibrbdWriteback.cc: In function 'virtual ceph_tid_t librbd::LibrbdWriteback::write(const object_t&, const object_locator_t&, uint64_t, uint64_t, const SnapContext&, const bufferlist&, utime_t, uint64_t, __u32, Context*)' thread 7fe016ffd700 time 2015-10-21 13:23:53.002271
librbd/LibrbdWriteback.cc: 160: FAILED assert(m_ictx->owner_lock.is_locked())
ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
1: (()+0x17258b) [0x7fe03e69058b]
2: (()+0xa9573) [0x7fe03e5c7573]
3: (()+0x3a90ca) [0x7fe03e8c70ca]
4: (()+0x3b583d) [0x7fe03e8d383d]
5: (()+0x7212c) [0x7fe03e59012c]
6: (()+0x9590f) [0x7fe03e5b390f]
7: (()+0x969a3) [0x7fe03e5b49a3]
8: (()+0x4782a) [0x7fe03e56582a]
9: (()+0x56599) [0x7fe03e574599]
10: (()+0x7284e) [0x7fe03e59084e]
11: (()+0x162b7e) [0x7fe03e680b7e]
12: (()+0x163c10) [0x7fe03e681c10]
13: (()+0x8182) [0x7fe03a3e2182]
14: (clone()+0x6d) [0x7fe03a10f47d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
I am running Apache CloudStack with KVM and Ceph. This issue affects all of my client host servers, but only the virtual router guest VMs and nothing else. Whenever a virtual router is created or started I get the above error.
I am running Ubuntu 12.04 (kernel 3.19.8-031908-generic) on the OSD servers and Ubuntu 14.04 (kernel 3.18.22-031822-generic) on the client host servers.
Associated revisions
librbd: potential assertion failure during cache read
It's possible for a cache read from a clone to trigger a writeback if a
previous read op determined the object doesn't exist in the clone,
followed by a cached write to the non-existent clone object, followed
by another read request to the same object. This causes the cache to
flush the pending writeback ops while not holding the owner lock.
Fixes: #13559
Backport: hammer
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 4692c330bd992a06b97b5b8975ab71952b22477a)
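For illustration only, here is a minimal, self-contained C++ sketch of the locking invariant described in the commit message. The names (the ImageCtx stand-in, writeback(), read_from_cache()) are hypothetical simplifications, not the actual librbd code: the writeback path asserts that owner_lock is already held, so any cache read that can trigger a flush of pending writebacks must take the lock first, which is the invariant the fix restores.

    #include <cassert>
    #include <shared_mutex>

    // Hypothetical stand-in for librbd::ImageCtx; lock ownership is tracked
    // explicitly so the assert below can mimic owner_lock.is_locked().
    struct ImageCtx {
        std::shared_mutex owner_lock;
        bool owner_lock_held = false;
    };

    // Analogous to LibrbdWriteback::write(): it assumes the caller already
    // holds owner_lock and asserts on it -- the assertion that fired here.
    void writeback(ImageCtx &ictx) {
        assert(ictx.owner_lock_held);
        // ... issue the write to the OSDs ...
    }

    // Analogous to a cache read from a clone. If an earlier read marked the
    // clone object as non-existent and a later write was cached against it,
    // this read can force a flush of that pending writeback. Taking
    // owner_lock before entering the cache keeps the writeback assertion
    // satisfied.
    void read_from_cache(ImageCtx &ictx, bool pending_writeback) {
        std::shared_lock<std::shared_mutex> locker(ictx.owner_lock);
        ictx.owner_lock_held = true;
        if (pending_writeback) {
            writeback(ictx);  // safe: lock is held
        }
        // ... satisfy the read from the cache or the parent image ...
        ictx.owner_lock_held = false;
    }

    int main() {
        ImageCtx ictx;
        read_from_cache(ictx, /*pending_writeback=*/true);
        return 0;
    }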
History
#1 Updated by Jason Dillaman over 7 years ago
- Status changed from New to In Progress
- Assignee set to Jason Dillaman
#2 Updated by Jason Dillaman over 7 years ago
- Backport set to hammer
#3 Updated by Jason Dillaman over 7 years ago
- Subject changed from KVM - unable to create some virtual machines to FAILED assert(m_ictx->owner_lock.is_locked())
#4 Updated by Jason Dillaman over 7 years ago
See also #13545
#5 Updated by Andrei Mikhailovsky over 7 years ago
Tried downgrading Ceph on the host servers to version 0.94.3 as per jdillaman's suggestion on #ceph and the problem is gone. I can successfully start VMs.
#6 Updated by Mihai Gheorghe over 7 years ago
Andrei Mikhailovsky wrote:
Tried downgrading Ceph on the host servers to version 0.94.3 as per jdillaman's suggestion on #ceph and the problem is gone. I can successfully start VMs.
How did you manage to downgrade to 0.94.3? I too need to downgrade asap! Thanks
#7 Updated by Edmund Rhudy over 7 years ago
The suggestion of disabling writeback caching with rbd_cache_max_dirty = 0 under the appropriate client sections in ceph.conf worked around the issue for me, though obviously doing so will have performance implications by forcing writethrough caching in all cases.
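For reference, a minimal sketch of that workaround in ceph.conf, assuming the RBD clients read a global [client] section (use your per-client sections instead if you have them):

    [client]
        rbd_cache = true
        rbd_cache_max_dirty = 0

With rbd_cache_max_dirty = 0 the cache is never allowed to hold dirty data, so writes behave as writethrough; the setting can be reverted once a release containing the fix is installed.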
#8 Updated by Andrei Mikhailovsky over 7 years ago
Mihai, I've downloaded the required Ceph packages, libraries and dependencies from the Ceph repo and installed them manually. Please note that I've only downgraded the client host servers and not the OSDs/MONs, as I have been advised against downgrading the OSD and MON servers.
Andrei
#9 Updated by Jason Dillaman over 7 years ago
The interim fix packages are available via the Ceph development repo (see http://docs.ceph.com/docs/master/install/get-packages/#add-ceph-development) under the wip-13559-hammer branch.
#10 Updated by Loïc Dachary over 7 years ago
#11 Updated by Mihai Gheorghe over 7 years ago
Jason Dillaman wrote:
The interim fix packages are available via the Ceph development repo (see http://docs.ceph.com/docs/master/install/get-packages/#add-ceph-development) under the wip-13559-hammer branch.
Thank you Jason for the patch! It works as expected!
Thank you Andrei also for helping!
#12 Updated by Jason Dillaman over 7 years ago
An updated interim fix for hammer will be available under the wip-13567-hammer branch.
#13 Updated by Jason Dillaman over 7 years ago
ceph-qa-suite PR: https://github.com/ceph/ceph-qa-suite/pull/652
#14 Updated by Ken Dreyer over 7 years ago
To confirm: this issue only affected v0.94.4, and v0.94.3 is unaffected?
What are the exact commit(s) that introduced this issue?
#15 Updated by Jason Dillaman about 7 years ago
- Status changed from In Progress to Resolved