Bug #8878

closed

mds lock cycle (wip-objecter)

Added by Sage Weil almost 10 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

------------------------------------
existing dependency OSDSession (20) -> OSDSession::completion_lock (21) at:
 ceph version 0.82-412-gc342a55 (c342a55a87b48ada81b284823e34efc03a2dc4ca)
 1: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x128d) [0x7c6e3d]
 2: (MDS::handle_core_message(Message*)+0x5a0) [0x59d250]
 3: (MDS::_dispatch(Message*)+0x2f) [0x59d9df]
 4: (MDS::ms_dispatch(Message*)+0x1ec) [0x59f49c]
 5: (DispatchQueue::entry()+0x4e9) [0x9ce179]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x8df6cd]
 7: (()+0x7e9a) [0x7f415cd2be9a]
 8: (clone()+0x6d) [0x7f415b73d3fd]

2014-07-18 19:41:43.672622 7f4158c2a700  0 existing intermediate dependency Objecter::rwlock (18) -> OSDSession (20) at:
 ceph version 0.82-412-gc342a55 (c342a55a87b48ada81b284823e34efc03a2dc4ca)
 1: (Objecter::_scan_requests(Objecter::OSDSession*, bool, bool, std::map<unsigned long, Objecter::Op*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Objecter::Op*> > >&, std::list<Objecter::LingerOp*, std::allocator<Objecter::LingerOp*> >&, std::map<unsigned long, Objecter::CommandOp*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Objecter::CommandOp*> > >&)+0x7a) [0x7c403a]
 2: (Objecter::handle_osd_map(MOSDMap*)+0x2061) [0x7cc261]
 3: (MDS::handle_core_message(Message*)+0xb70) [0x59d820]
 4: (MDS::_dispatch(Message*)+0x2f) [0x59d9df]
 5: (MDS::ms_dispatch(Message*)+0x1ec) [0x59f49c]
 6: (DispatchQueue::entry()+0x4e9) [0x9ce179]
 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8df6cd]
 8: (()+0x7e9a) [0x7f415cd2be9a]
 9: (clone()+0x6d) [0x7f415b73d3fd]

2014-07-18 19:41:43.672676 7f4158c2a700  0 new dependency OSDSession::completion_lock (21) -> Objecter::rwlock (18) creates a cycle at
 ceph version 0.82-412-gc342a55 (c342a55a87b48ada81b284823e34efc03a2dc4ca)
 1: (Objecter::write_trunc(object_t const&, object_locator_t const&, unsigned long, unsigned long, SnapContext const&, ceph::buffer::list const&, utime_t, int, unsigned long, unsigned int, Context*, Context*, unsigned long*, ObjectOperation*)+0x168) [0x7b0018]
 2: (Objecter::sg_write_trunc(std::vector<ObjectExtent, std::allocator<ObjectExtent> >&, SnapContext const&, ceph::buffer::list const&, utime_t, int, unsigned long, unsigned int, Context*, Context*)+0xc97) [0x7b3337]
 3: (Journaler::_do_flush(unsigned int)+0x2fd) [0x7abfbd]
 4: (Journaler::_prezeroed(int, unsigned long, unsigned long)+0xe3) [0x7aed73]
 5: (Context::complete(int)+0x9) [0x59f529]
 6: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x12bc) [0x7c6e6c]
 7: (MDS::handle_core_message(Message*)+0x5a0) [0x59d250]
 8: (MDS::_dispatch(Message*)+0x2f) [0x59d9df]
 9: (MDS::ms_dispatch(Message*)+0x1ec) [0x59f49c]
 10: (DispatchQueue::entry()+0x4e9) [0x9ce179]
 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x8df6cd]
 12: (()+0x7e9a) [0x7f415cd2be9a]
 13: (clone()+0x6d) [0x7f415b73d3fd]

2014-07-18 19:41:43.672711 7f4158c2a700  0 btw, i am holding these locks:
2014-07-18 19:41:43.672713 7f4158c2a700  0   MDS::mds_lock (13)
2014-07-18 19:41:43.672714 7f4158c2a700  0   OSDSession::completion_lock (21)
2014-07-18 19:41:43.672716 7f4158c2a700  0 

ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-07-18_09:53:15-rados-wip-objecter-testing-basic-plana/369363$ zless remote/*/log/*mds.a*
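
For readers following the three traces: the log reports an edge A -> B whenever lock B is taken while A is held, and flags a new edge that would make the graph cyclic. Below is a minimal sketch of that kind of check, assuming a plain adjacency map keyed by the lock ids from the log (18, 20, 21). `LockOrderGraph` and its methods are hypothetical names used for illustration only; this is not Ceph's lockdep code.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified lock-order tracker (illustration only).
class LockOrderGraph {
public:
  // Record that lock `to` was acquired while lock `from` was held.
  // Returns false, and records nothing, if the edge would close a cycle.
  bool add_dependency(int from, int to) {
    if (from == to || reaches(to, from)) {
      return false;
    }
    edges_[from].insert(to);
    return true;
  }

  // True if `dst` can be reached from `src` by following recorded edges.
  bool reaches(int src, int dst) const {
    std::set<int> seen;
    std::vector<int> stack{src};
    while (!stack.empty()) {
      int cur = stack.back();
      stack.pop_back();
      if (cur == dst) {
        return true;
      }
      if (!seen.insert(cur).second) {
        continue;  // already visited
      }
      auto it = edges_.find(cur);
      if (it == edges_.end()) {
        continue;  // no outgoing edges
      }
      for (int next : it->second) {
        stack.push_back(next);
      }
    }
    return false;
  }

private:
  std::map<int, std::set<int>> edges_;  // lock id -> locks taken under it
};
```

Fed the edges from the log, 18 -> 20 (Objecter::rwlock -> OSDSession) and 20 -> 21 (OSDSession -> OSDSession::completion_lock) are accepted, while 21 -> 18 (write_trunc taking Objecter::rwlock from a completion that runs under completion_lock) is rejected because 18 already reaches 21.
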
Actions #1

Updated by Sage Weil over 9 years ago

  • Assignee changed from Yehuda Sadeh to Sage Weil
Actions #2

Updated by Sage Weil over 9 years ago

This is going to be a bit of a project:

- fix every completion to take mds_lock
- ... and shunt each one off to a Finisher
- figure out how to make this behave with Journaler (uh oh)

Also, I think this will affect the use of ObjectCacher in Client, too.
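
The first two items in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Ceph `Finisher` class or the eventual fix: `SimpleFinisher` is a hypothetical name, completions are modeled as `std::function<void()>` rather than `Context*`, and `big_lock` stands in for `mds_lock`. The point is that the thread delivering the reply only touches a small queue lock, and the completion later runs on a separate thread that takes the big lock itself, so no Objecter lock is held while the completion calls back into the Objecter.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical finisher: runs queued completions on its own thread,
// holding `big_lock` (a stand-in for mds_lock) around each one.
class SimpleFinisher {
public:
  explicit SimpleFinisher(std::mutex& big_lock)
    : big_lock_(big_lock), worker_([this] { run(); }) {}

  // Drains any remaining completions, then stops the worker thread.
  ~SimpleFinisher() {
    {
      std::lock_guard<std::mutex> l(queue_lock_);
      stopping_ = true;
    }
    cond_.notify_one();
    worker_.join();
  }

  // Safe to call while holding other locks: only queue_lock_ is taken here.
  void queue(std::function<void()> fn) {
    {
      std::lock_guard<std::mutex> l(queue_lock_);
      pending_.push_back(std::move(fn));
    }
    cond_.notify_one();
  }

private:
  void run() {
    std::unique_lock<std::mutex> l(queue_lock_);
    for (;;) {
      cond_.wait(l, [this] { return stopping_ || !pending_.empty(); });
      if (pending_.empty()) {
        return;  // stopping and nothing left to run
      }
      auto fn = std::move(pending_.front());
      pending_.pop_front();
      l.unlock();  // never hold the queue lock while running a completion
      {
        std::lock_guard<std::mutex> big(big_lock_);
        fn();
      }
      l.lock();
    }
  }

  std::mutex& big_lock_;
  std::mutex queue_lock_;
  std::condition_variable cond_;
  std::deque<std::function<void()>> pending_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so the members above exist first
};
```

In this sketch a reply handler would call `queue()` instead of invoking the completion directly. How that interacts with Journaler, which issues new writes from its own completions, is the open question noted in the list.
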

Actions #3

Updated by John Spray over 9 years ago

  • Status changed from New to In Progress
  • Assignee changed from Sage Weil to John Spray

I think all of these are OK now in wip-mds-contexts; the remaining failures on that branch are all outside the MDS.

Actions #4

Updated by Sage Weil over 9 years ago

  • Status changed from In Progress to Resolved