Project

General

Profile

Actions

Bug #9562

closed

Lockdep assertion in Filer purge

Added by John Spray over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Actions #1

Updated by John Spray over 9 years ago

  • Subject changed from lockdep to Lockdep assertion in Filer purge
me 2014-09-20 09:56:34.623470
2014-09-20T09:56:34.627 INFO:tasks.ceph.mds.a.plana58.stderr:common/lockdep.cc: 225: FAILED assert(0)
2014-09-20T09:56:34.628 INFO:tasks.ceph.mds.a.plana58.stderr: ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661)
2014-09-20T09:56:34.628 INFO:tasks.ceph.mds.a.plana58.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x914cef]
2014-09-20T09:56:34.628 INFO:tasks.ceph.mds.a.plana58.stderr: 2: (lockdep_will_lock(char const*, int)+0xc50) [0x98fe30]
2014-09-20T09:56:34.629 INFO:tasks.ceph.mds.a.plana58.stderr: 3: (Mutex::Lock(bool)+0x104) [0x8cf664]
2014-09-20T09:56:34.629 INFO:tasks.ceph.mds.a.plana58.stderr: 4: (Filer::_do_purge_range(PurgeRange*, int)+0x38) [0x7b2f38]
2014-09-20T09:56:34.629 INFO:tasks.ceph.mds.a.plana58.stderr: 5: (Context::complete(int)+0x9) [0x5a0ce9]
2014-09-20T09:56:34.629 INFO:tasks.ceph.mds.a.plana58.stderr: 6: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1211) [0x7de021]
2014-09-20T09:56:34.630 INFO:tasks.ceph.mds.a.plana58.stderr: 7: (Objecter::ms_dispatch(Message*)+0x20b) [0x7e843b]
2014-09-20T09:56:34.630 INFO:tasks.ceph.mds.a.plana58.stderr: 8: (DispatchQueue::fast_dispatch(Message*)+0x59) [0x9ea219]
2014-09-20T09:56:34.630 INFO:tasks.ceph.mds.a.plana58.stderr: 9: (Pipe::reader()+0x268c) [0xa1f0bc]
2014-09-20T09:56:34.630 INFO:tasks.ceph.mds.a.plana58.stderr: 10: (Pipe::Reader::entry()+0xd) [0xa20d7d]
2014-09-20T09:56:34.631 INFO:tasks.ceph.mds.a.plana58.stderr: 11: (()+0x7e9a) [0x7f71c2618e9a]
2014-09-20T09:56:34.631 INFO:tasks.ceph.mds.a.plana58.stderr: 12: (clone()+0x6d) [0x7f71c11ce31d]
2014-09-20T09:56:34.631 INFO:tasks.ceph.mds.a.plana58.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-09-20T09:56:34.631 INFO:tasks.ceph.mds.a.plana58.stderr:2014-09-20 09:56:34.626634 7f71b751e700 -1 common/lockdep.cc: In function 'int lockdep_will_lock(const char*, int)' thread 7f71b751e700 time 2014-09-20 09:56:34.623470
2014-09-20T09:56:34.632 INFO:tasks.ceph.mds.a.plana58.stderr:common/lockdep.cc: 225: FAILED assert(0)

http://qa-proxy.ceph.com/teuthology/teuthology-2014-09-19_23:04:01-fs-giant-testing-basic-multi/500082/teuthology.log

Actions #2

Updated by John Spray over 9 years ago

  • Assignee set to John Spray
Actions #3

Updated by John Spray over 9 years ago

So I think this bug already existed with the Probe lock, but it was triggered by the new PurgeRange lock, because the purge happens during normal operation whereas the probe only happens on recovery. We only get a recovery event during the lockdep tests once the purge bug hits on the active mds and the standby tries to recover (probing in the process).

Anyway, the specific cycle it's complaining about is:

existing dependency OSDSession (25) -> OSDSession::completion_lock (26) at:
 ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661)
 1: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11dd) [0x7ddfed]
 2: (Objecter::ms_dispatch(Message*)+0x20b) [0x7e843b]
 3: (DispatchQueue::fast_dispatch(Message*)+0x59) [0x9ea219]
 4: (Pipe::reader()+0x268c) [0xa1f0bc]
 5: (Pipe::Reader::entry()+0xd) [0xa20d7d]
 6: (()+0x7e9a) [0x7f97577ade9a]
 7: (clone()+0x6d) [0x7f97563623fd]

    -6> 2014-09-22 11:08:19.514967 7f974c8b4700  0 existing intermediate dependency Objecter::rwlock (23) -> OSDSession (25) at:
 ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661)
 1: (Objecter::_scan_requests(Objecter::OSDSession*, bool, bool, std::map<unsigned long, Objecter::Op*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Objecter::Op*> > >&, std::list<Objecter::LingerOp*, std::allocator<Objecter::LingerOp*> >&, std::map<unsigned long, Objecter::CommandOp*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Objecter::CommandOp*> > >&)+0x99) [0x7db499]
 2: (Objecter::handle_osd_map(MOSDMap*)+0x1ecb) [0x7e35ab]
 3: (Objecter::ms_dispatch(Message*)+0x257) [0x7e8487]
 4: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0x9ed827]
 5: (DispatchQueue::entry()+0x44a) [0x9ea9ca]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x8fafad]
 7: (()+0x7e9a) [0x7f97577ade9a]
 8: (clone()+0x6d) [0x7f97563623fd]

    -5> 2014-09-22 11:08:19.515001 7f974c8b4700  0 existing intermediate dependency Filer::PurgeRange (43) -> Objecter::rwlock (23) at:
 ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661)
 1: (Filer::purge_range(inodeno_t, ceph_file_layout*, SnapContext const&, unsigned long, unsigned long, utime_t, int, Context*)+0x1d3) [0x7b3ab3]
 2: (MDCache::purge_stray(CDentry*)+0x1606) [0x659696]
 3: (MDCache::eval_stray(CDentry*, bool)+0xa20) [0x65a950]
 4: (MDCache::trim(int, int)+0xef) [0x65bfbf]
 5: (MDS::tick()+0x3b3) [0x58a2e3]
 6: (MDSInternalContextBase::complete(int)+0x1db) [0x79ff7b]
 7: (SafeTimer::timer_thread()+0x3cd) [0x904b5d]
 8: (SafeTimerThread::entry()+0xd) [0x90775d]
 9: (()+0x7e9a) [0x7f97577ade9a]
 10: (clone()+0x6d) [0x7f97563623fd]

    -4> 2014-09-22 11:08:19.515018 7f974c8b4700  0 new dependency OSDSession::completion_lock (26) -> Filer::PurgeRange (43) creates a cycle at
 ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661)
 1: (Filer::_do_purge_range(PurgeRange*, int)+0x38) [0x7b2f38]
 2: (Context::complete(int)+0x9) [0x5a0ce9]
 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1211) [0x7de021]
 4: (Objecter::ms_dispatch(Message*)+0x20b) [0x7e843b]
 5: (DispatchQueue::fast_dispatch(Message*)+0x59) [0x9ea219]
 6: (Pipe::reader()+0x268c) [0xa1f0bc]
 7: (Pipe::Reader::entry()+0xd) [0xa20d7d]
 8: (()+0x7e9a) [0x7f97577ade9a]
 9: (clone()+0x6d) [0x7f97563623fd]
existing dependency OSDSession (25) -> OSDSession::completion_lock (26) at:
 ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661)
 1: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11dd) [0x7ddfed]
 2: (Objecter::ms_dispatch(Message*)+0x20b) [0x7e843b]
 3: (DispatchQueue::fast_dispatch(Message*)+0x59) [0x9ea219]
 4: (Pipe::reader()+0x268c) [0xa1f0bc]
 5: (Pipe::Reader::entry()+0xd) [0xa20d7d]
 6: (()+0x7e9a) [0x7f2d02ab1e9a]
 7: (clone()+0x6d) [0x7f2d016663fd]

    -6> 2014-09-22 11:08:38.060127 7f2cf74b8700  0 existing intermediate dependency Objecter::rwlock (23) -> OSDSession (25) at:
 ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661)
 1: (Objecter::_scan_requests(Objecter::OSDSession*, bool, bool, std::map<unsigned long, Objecter::Op*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Objecter::Op*> > >&, std::list<Objecter::LingerOp*, std::allocator<Objecter::LingerOp*> >&, std::map<unsigned long, Objecter::CommandOp*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Objecter::CommandOp*> > >&)+0x99) [0x7db499]
 2: (Objecter::handle_osd_map(MOSDMap*)+0x1ecb) [0x7e35ab]
 3: (Objecter::ms_dispatch(Message*)+0x257) [0x7e8487]
 4: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0x9ed827]
 5: (DispatchQueue::entry()+0x44a) [0x9ea9ca]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x8fafad]
 7: (()+0x7e9a) [0x7f2d02ab1e9a]
 8: (clone()+0x6d) [0x7f2d016663fd]

    -5> 2014-09-22 11:08:38.060158 7f2cf74b8700  0 existing intermediate dependency Filer::Probe (43) -> Objecter::rwlock (23) at:
 ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661)
 1: (Filer::_probe(Filer::Probe*)+0x560) [0x7b4330]
 2: (Filer::probe(inodeno_t, ceph_file_layout*, snapid_t, unsigned long, unsigned long*, utime_t*, bool, int, Context*)+0x22c) [0x7b4bcc]
 3: (Journaler::_probe(Context*, unsigned long*)+0xf8) [0x7b90a8]
 4: (Journaler::_finish_read_head(int, ceph::buffer::list&)+0x443) [0x7bac53]
 5: (Context::complete(int)+0x9) [0x5a0ce9]
 6: (Finisher::finisher_thread_entry()+0x160) [0x852810]
 7: (()+0x7e9a) [0x7f2d02ab1e9a]
 8: (clone()+0x6d) [0x7f2d016663fd]

    -4> 2014-09-22 11:08:38.060171 7f2cf74b8700  0 new dependency OSDSession::completion_lock (26) -> Filer::Probe (43) creates a cycle at
 ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661)
 1: (Filer::C_Probe::finish(int)+0x99) [0x7b76c9]
 2: (Context::complete(int)+0x9) [0x5a0ce9]
 3: (Objecter::C_Stat::finish(int)+0xc0) [0x7b6370]
 4: (Context::complete(int)+0x9) [0x5a0ce9]
 5: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11ff) [0x7de00f]
 6: (Objecter::ms_dispatch(Message*)+0x20b) [0x7e843b]
 7: (DispatchQueue::fast_dispatch(Message*)+0x59) [0x9ea219]
 8: (Pipe::reader()+0x268c) [0xa1f0bc]
 9: (Pipe::Reader::entry()+0xd) [0xa20d7d]
 10: (()+0x7e9a) [0x7f2d02ab1e9a]
 11: (clone()+0x6d) [0x7f2d016663fd]

    -3> 2014-09-22 11:08:38.060191 7f2cf74b8700  0 btw, i am holding these locks:
    -2> 2014-09-22 11:08:38.060192 7f2cf74b8700  0   OSDSession::completion_lock (26)
    -1> 2014-09-22 11:08:38.060193 7f2cf74b8700  0

We probably need a finisher here, but it's also worth checking that the use of rwlock here is really unsafe: if both are just holding the read lock then it might be a false alarm.

Actions #4

Updated by John Spray over 9 years ago

  • Status changed from New to In Progress
Actions #5

Updated by John Spray over 9 years ago

  • Project changed from Ceph to CephFS
Actions #6

Updated by Zheng Yan over 9 years ago

can we just unlock the PurgeRange/Probe locks before using the objecter?

Actions #7

Updated by John Spray over 9 years ago

  • Status changed from In Progress to Fix Under Review
Actions #8

Updated by Zheng Yan over 9 years ago

  • Status changed from Fix Under Review to Resolved
Actions #9

Updated by Greg Farnum over 9 years ago

  • Status changed from Resolved to Pending Backport
  • Priority changed from Normal to Urgent

This is popping up in Giant as well, which I believe has the new code that was the proximate cause. :)

Actions #10

Updated by John Spray over 9 years ago

  • Status changed from Pending Backport to Resolved

Backported to giant:

commit a8ac4b62a5e878c5658fda730a2f429297ba7c05
Author: John Spray <john.spray@redhat.com>
Date:   Wed Sep 24 14:19:32 2014 +0100

    osdc/Filer: drop probe/purge locks before calling objecter

    Fixes: #9562

    Signed-off-by: John Spray <john.spray@redhat.com>
    (cherry picked from commit 8dc94a2d8ce3364c0d8d52f634e5fc967e43d819)
Actions

Also available in: Atom PDF