Project

General

Profile

Bug #10229

Filer: lock inversion with Objecter

Added by Greg Farnum over 4 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
Start date:
12/03/2014
Due date:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:

Description

Saw this on a next test (http://qa-proxy.ceph.com/teuthology/sage-2014-12-01_11:11:17-fs-next-distro-basic-multi/628949/teuthology.log)

------------------------------------
existing dependency OSDSession (25) -> OSDSession::completion_lock (26) at:
 ceph version 0.88-234-g310a3f1 (310a3f1c65d25ee1fc60398f0fe025e4b19f55fe)
 1: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x12a8) [0x8083f8]
 2: (Objecter::ms_dispatch(Message*)+0x20b) [0x81480b]
 3: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0xa135f7]
 4: (DispatchQueue::entry()+0x44a) [0xa23e9a]
 5: (DispatchQueue::DispatchThread::entry()+0xd) [0x924ecd]
 6: (()+0x7e9a) [0x7f8ab5d6ee9a]
 7: (clone()+0x6d) [0x7f8ab471dccd]

    -5> 2014-12-01 16:42:37.412717 7f8ab0c8e700  0 existing intermediate dependency Objecter::rwlock (23) -> OSDSession (25) at:
 ceph version 0.88-234-g310a3f1 (310a3f1c65d25ee1fc60398f0fe025e4b19f55fe)
 1: (Objecter::_scan_requests(Objecter::OSDSession*, bool, bool, std::map<unsigned long, Objecter::Op*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Objecter::Op*> > >&, std::list<Objecter::LingerOp*, std::allocator<Objecter::LingerOp*> >&, std::map<unsigned long, Objecter::CommandOp*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Objecter::CommandOp*> > >&)+0x99) [0x8057d9]
 2: (Objecter::handle_osd_map(MOSDMap*)+0x1f67) [0x80fa17]
 3: (Objecter::ms_dispatch(Message*)+0x257) [0x814857]
 4: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0xa135f7]
 5: (DispatchQueue::entry()+0x44a) [0xa23e9a]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x924ecd]
 7: (()+0x7e9a) [0x7f8ab5d6ee9a]
 8: (clone()+0x6d) [0x7f8ab471dccd]

    -4> 2014-12-01 16:42:37.412751 7f8ab0c8e700  0 new dependency OSDSession::completion_lock (26) -> Objecter::rwlock (23) creates a cycle at
 ceph version 0.88-234-g310a3f1 (310a3f1c65d25ee1fc60398f0fe025e4b19f55fe)
 1: (Context::complete(int)+0x9) [0x5b79a9]
 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x12dc) [0x80842c]
 3: (Objecter::ms_dispatch(Message*)+0x20b) [0x81480b]
 4: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0xa135f7]
 5: (DispatchQueue::entry()+0x44a) [0xa23e9a]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x924ecd]
 7: (()+0x7e9a) [0x7f8ab5d6ee9a]
 8: (clone()+0x6d) [0x7f8ab471dccd]

    -3> 2014-12-01 16:42:37.412764 7f8ab0c8e700  0 btw, i am holding these locks:
    -2> 2014-12-01 16:42:37.412765 7f8ab0c8e700  0   OSDSession::completion_lock (26)
    -1> 2014-12-01 16:42:37.412767 7f8ab0c8e700  0 

     0> 2014-12-01 16:42:37.414541 7f8ab0c8e700 -1 common/lockdep.cc: In function 'int lockdep_will_lock(const char*, int)' thread 7f8ab0c8e700 time 2014-12-01 16:42:37.412774
common/lockdep.cc: 225: FAILED assert(0)

 ceph version 0.88-234-g310a3f1 (310a3f1c65d25ee1fc60398f0fe025e4b19f55fe)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x93b5df]
 2: (lockdep_will_lock(char const*, int)+0xc50) [0x9be3e0]
 3: (Filer::_do_purge_range(PurgeRange*, int)+0x7a2) [0x7dbe12]
 4: (Context::complete(int)+0x9) [0x5b79a9]
 5: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x12dc) [0x80842c]
 6: (Objecter::ms_dispatch(Message*)+0x20b) [0x81480b]
 7: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0xa135f7]
 8: (DispatchQueue::entry()+0x44a) [0xa23e9a]
 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x924ecd]
 10: (()+0x7e9a) [0x7f8ab5d6ee9a]
 11: (clone()+0x6d) [0x7f8ab471dccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I'm not sure if this is something in the MDS or if the Objecter interface is a little broken, since it looks like we're somehow not allowed to grab osdmaps in Objecter completion contexts? Or maybe we're just not on a Finisher queue and should be.

Associated revisions

Revision d3ee89ac (diff)
Added by Yan, Zheng over 4 years ago

osdc/Filer: use finisher to execute C_Probe and C_PurgeRange

Currently contexts C_Probe/C_PurgeRange are executed while holding
OSDSession::completion_lock. C_Probe and C_PurgeRange may call
Objecter::stat() and Objecter::remove() respectively, which acquire
Objecter::rwlock. This can cause deadlock because there is intermediate
dependency between Objecter::rwlock and OSDSession::completion_lock:

Objecter::rwlock -> OSDSession::lock -> OSDSession::completion_lock

The fix is exexcute C_Probe/C_PurgeRange in finisher thread.

Fixes: #10229
Signed-off-by: Yan, Zheng <>

Revision 7bbf80ff (diff)
Added by Yan, Zheng over 4 years ago

osdc/Filer: use finisher to execute C_Probe and C_PurgeRange

Currently contexts C_Probe/C_PurgeRange are executed while holding
OSDSession::completion_lock. C_Probe and C_PurgeRange may call
Objecter::stat() and Objecter::remove() respectively, which acquire
Objecter::rwlock. This can cause deadlock because there is intermediate
dependency between Objecter::rwlock and OSDSession::completion_lock:

Objecter::rwlock -> OSDSession::lock -> OSDSession::completion_lock

The fix is exexcute C_Probe/C_PurgeRange in finisher thread.

Fixes: #10229
Signed-off-by: Yan, Zheng <>
(cherry picked from commit d3ee89ace660161df7796affbf9a70f3d0dedce1)

History

#1 Updated by Zheng Yan over 4 years ago

  • Status changed from New to Testing

#2 Updated by Zheng Yan over 4 years ago

  • Status changed from Testing to Resolved

#3 Updated by Greg Farnum about 3 years ago

  • Component(FS) MDS added

Also available in: Atom PDF