Bug #13559

FAILED assert(m_ictx->owner_lock.is_locked())

Added by Andrei Mikhailovsky about 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Urgent
Target version: -
Start date: 10/21/2015
Due date:
% Done: 0%
Source: other
Tags: kvm, cloudstack, rbd
Backport: hammer
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:

Description

After upgrading from 0.94.1 to 0.94.4 I am having issues starting SOME virtual guests with KVM. The following error is shown in libvirt logs:

librbd/LibrbdWriteback.cc: In function 'virtual ceph_tid_t librbd::LibrbdWriteback::write(const object_t&, const object_locator_t&, uint64_t, uint64_t, const SnapContext&, const bufferlist&, utime_t, uint64_t, __u32, Context*)' thread 7fe016ffd700 time 2015-10-21 13:23:53.002271
librbd/LibrbdWriteback.cc: 160: FAILED assert(m_ictx->owner_lock.is_locked())
ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
1: (()+0x17258b) [0x7fe03e69058b]
2: (()+0xa9573) [0x7fe03e5c7573]
3: (()+0x3a90ca) [0x7fe03e8c70ca]
4: (()+0x3b583d) [0x7fe03e8d383d]
5: (()+0x7212c) [0x7fe03e59012c]
6: (()+0x9590f) [0x7fe03e5b390f]
7: (()+0x969a3) [0x7fe03e5b49a3]
8: (()+0x4782a) [0x7fe03e56582a]
9: (()+0x56599) [0x7fe03e574599]
10: (()+0x7284e) [0x7fe03e59084e]
11: (()+0x162b7e) [0x7fe03e680b7e]
12: (()+0x163c10) [0x7fe03e681c10]
13: (()+0x8182) [0x7fe03a3e2182]
14: (clone()+0x6d) [0x7fe03a10f47d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'

I am running Apache CloudStack with KVM and Ceph. This issue affects all client host servers, but only the virtual router guest VMs and nothing else. Whenever a virtual router is created or started I get the above error.

I am running Ubuntu 12.04 (kernel 3.19.8-031908-generic) on the OSD servers and Ubuntu 14.04 (kernel 3.18.22-031822-generic) on the client host servers.

ceph-report.txt.zip (926 KB) Andrei Mikhailovsky, 10/21/2015 01:15 PM


Related issues

Copied to rbd - Backport #13567: FAILED assert(m_ictx->owner_lock.is_locked()) Resolved

Associated revisions

Revision d3abcbea (diff)
Added by Jason Dillaman about 3 years ago

librbd: potential assertion failure during cache read

It's possible for a cache read from a clone to trigger a writeback if a
previous read op determined the object doesn't exist in the clone,
followed by a cached write to the non-existent clone object, followed
by another read request to the same object. This causes the cache to
flush the pending writeback ops while not holding the owner lock.

Fixes: #13559
Backport: hammer
Signed-off-by: Jason Dillaman <>
(cherry picked from commit 4692c330bd992a06b97b5b8975ab71952b22477a)

Revision 58414c56 (diff)
Added by Jason Dillaman about 3 years ago

librbd: potential assertion failure during cache read

It's possible for a cache read from a clone to trigger a writeback if a
previous read op determined the object doesn't exist in the clone,
followed by a cached write to the non-existent clone object, followed
by another read request to the same object. This causes the cache to
flush the pending writeback ops while not holding the owner lock.

Fixes: #13559
Backport: hammer
Signed-off-by: Jason Dillaman <>
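
To make the failure mode described in the commit messages above concrete, here is a minimal, hypothetical C++ sketch (not librbd source; ImageCtx, lock_owner() and writeback_write() are stand-ins) of the invariant the assertion enforces: any path that can trigger a writeback must hold the image's owner_lock first.

// Hypothetical model of the owner_lock invariant behind the failing assert.
#include <cassert>

// Stand-in for librbd's ImageCtx; owner_lock is modeled as a plain flag.
struct ImageCtx {
  bool owner_lock_held = false;
  void lock_owner()   { owner_lock_held = true; }
  void unlock_owner() { owner_lock_held = false; }
};

// Plays the role of LibrbdWriteback::write(): it requires the caller to
// already hold owner_lock, otherwise the assert fires (as in the log above).
void writeback_write(ImageCtx& ictx) {
  assert(ictx.owner_lock_held);
  // ... send the write to the OSDs ...
}

// Pre-fix behaviour: a cache read of a cloned object can flush pending
// writeback ops without taking owner_lock first -> assertion failure.
void cache_read_buggy(ImageCtx& ictx) {
  writeback_write(ictx);
}

// Post-fix behaviour: acquire owner_lock before anything that may flush.
void cache_read_fixed(ImageCtx& ictx) {
  ictx.lock_owner();
  writeback_write(ictx);
  ictx.unlock_owner();
}

int main() {
  ImageCtx ictx;
  cache_read_fixed(ictx);   // ok: lock held across the flush
  cache_read_buggy(ictx);   // aborts with an assertion failure
  return 0;
}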

History

#1 Updated by Jason Dillaman about 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman

#2 Updated by Jason Dillaman about 3 years ago

  • Backport set to hammer

#3 Updated by Jason Dillaman about 3 years ago

  • Subject changed from KVM - unable to create some virtual machines to FAILED assert(m_ictx->owner_lock.is_locked())

#4 Updated by Jason Dillaman about 3 years ago

See also #13545

#5 Updated by Andrei Mikhailovsky about 3 years ago

Tried downgrading Ceph on the host servers to version 0.94.3 as per jdillaman's suggestion on #ceph, and the problem is gone. I can successfully start VMs.

#6 Updated by Mihai Gheorghe about 3 years ago

Andrei Mikhailovsky wrote:

Tried downgrading Ceph on the host servers to version 0.94.3 as per jdillaman's suggestion on #ceph, and the problem is gone. I can successfully start VMs.

How did you manage to downgrade to 0.94.3? I too need to downgrade asap! Thanks

#7 Updated by Edmund Rhudy about 3 years ago

The suggestion of disabling writeback caching with

rbd_cache_max_dirty = 0

under the appropriate client sections in ceph.conf worked around the issue for me, though obviously doing so will have performance implications by forcing write-through caching in all cases.
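
For reference, a minimal ceph.conf sketch of that workaround (the [client] section is the usual place for RBD client options; the explicit rbd_cache line is shown only for context and reflects the usual default):

[client]
rbd_cache = true
rbd_cache_max_dirty = 0

With rbd_cache_max_dirty = 0 the cache is never allowed to hold dirty data, so every write is committed to the cluster before it is acknowledged, which is where the performance hit comes from.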

#8 Updated by Andrei Mikhailovsky about 3 years ago

Mihai, I've downloaded the required Ceph packages, libraries and dependencies from the Ceph repo and installed them manually. Please note that I've only downgraded the client host servers and not the OSDs/MONs, as I have been advised against downgrading the OSD and MON servers.

Andrei

#9 Updated by Jason Dillaman about 3 years ago

The interim fix packages are available via the Ceph development repo (see http://docs.ceph.com/docs/master/install/get-packages/#add-ceph-development) under the wip-13559-hammer branch.

#11 Updated by Mihai Gheorghe about 3 years ago

Jason Dillaman wrote:

The interim fix packages are available via the Ceph development repo (see http://docs.ceph.com/docs/master/install/get-packages/#add-ceph-development) under the wip-13559-hammer branch.

Thank you Jason for the patch! It works as expected!

Thank you Andrei also for helping!

#12 Updated by Jason Dillaman about 3 years ago

An updated interim fix for hammer will be available under the wip-13567-hammer branch.

#14 Updated by Ken Dreyer almost 3 years ago

To confirm: this issue only affected v0.94.4, and v0.94.3 is unaffected?

What are the exact commit(s) that introduced this issue?

#15 Updated by Jason Dillaman almost 3 years ago

  • Status changed from In Progress to Resolved
