Bug #9562
Lockdep assertion in Filer purge
0%
Associated revisions
osdc/Filer: drop probe/purge locks before calling objecter
Fixes: #9562
Signed-off-by: John Spray <john.spray@redhat.com>
osdc/Filer: drop probe/purge locks before calling objecter
Fixes: #9562
Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 8dc94a2d8ce3364c0d8d52f634e5fc967e43d819)
History
#1 Updated by John Spray almost 9 years ago
- Subject changed from lockdep to Lockdep assertion in Filer purge
me 2014-09-20 09:56:34.623470 2014-09-20T09:56:34.627 INFO:tasks.ceph.mds.a.plana58.stderr:common/lockdep.cc: 225: FAILED assert(0) 2014-09-20T09:56:34.628 INFO:tasks.ceph.mds.a.plana58.stderr: ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661) 2014-09-20T09:56:34.628 INFO:tasks.ceph.mds.a.plana58.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x914cef] 2014-09-20T09:56:34.628 INFO:tasks.ceph.mds.a.plana58.stderr: 2: (lockdep_will_lock(char const*, int)+0xc50) [0x98fe30] 2014-09-20T09:56:34.629 INFO:tasks.ceph.mds.a.plana58.stderr: 3: (Mutex::Lock(bool)+0x104) [0x8cf664] 2014-09-20T09:56:34.629 INFO:tasks.ceph.mds.a.plana58.stderr: 4: (Filer::_do_purge_range(PurgeRange*, int)+0x38) [0x7b2f38] 2014-09-20T09:56:34.629 INFO:tasks.ceph.mds.a.plana58.stderr: 5: (Context::complete(int)+0x9) [0x5a0ce9] 2014-09-20T09:56:34.629 INFO:tasks.ceph.mds.a.plana58.stderr: 6: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1211) [0x7de021] 2014-09-20T09:56:34.630 INFO:tasks.ceph.mds.a.plana58.stderr: 7: (Objecter::ms_dispatch(Message*)+0x20b) [0x7e843b] 2014-09-20T09:56:34.630 INFO:tasks.ceph.mds.a.plana58.stderr: 8: (DispatchQueue::fast_dispatch(Message*)+0x59) [0x9ea219] 2014-09-20T09:56:34.630 INFO:tasks.ceph.mds.a.plana58.stderr: 9: (Pipe::reader()+0x268c) [0xa1f0bc] 2014-09-20T09:56:34.630 INFO:tasks.ceph.mds.a.plana58.stderr: 10: (Pipe::Reader::entry()+0xd) [0xa20d7d] 2014-09-20T09:56:34.631 INFO:tasks.ceph.mds.a.plana58.stderr: 11: (()+0x7e9a) [0x7f71c2618e9a] 2014-09-20T09:56:34.631 INFO:tasks.ceph.mds.a.plana58.stderr: 12: (clone()+0x6d) [0x7f71c11ce31d] 2014-09-20T09:56:34.631 INFO:tasks.ceph.mds.a.plana58.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 2014-09-20T09:56:34.631 INFO:tasks.ceph.mds.a.plana58.stderr:2014-09-20 09:56:34.626634 7f71b751e700 -1 common/lockdep.cc: In function 'int lockdep_will_lock(const char*, int)' thread 7f71b751e700 time 2014-09-20 09:56:34.623470 2014-09-20T09:56:34.632 INFO:tasks.ceph.mds.a.plana58.stderr:common/lockdep.cc: 225: FAILED assert(0)
#2 Updated by John Spray almost 9 years ago
- Assignee set to John Spray
#3 Updated by John Spray almost 9 years ago
So I think this bug already existed with the Probe lock, but it was triggered by the new PurgeRange lock, because the purge happens during normal operation whereas the probe only happens on recovery. We only get a recovery event during the lockdep tests once the purge bug hits on the active mds and the standby tries to recover (probing in the process).
Anyway, the specific cycle it's complaining about is:
existing dependency OSDSession (25) -> OSDSession::completion_lock (26) at: ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661) 1: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11dd) [0x7ddfed] 2: (Objecter::ms_dispatch(Message*)+0x20b) [0x7e843b] 3: (DispatchQueue::fast_dispatch(Message*)+0x59) [0x9ea219] 4: (Pipe::reader()+0x268c) [0xa1f0bc] 5: (Pipe::Reader::entry()+0xd) [0xa20d7d] 6: (()+0x7e9a) [0x7f97577ade9a] 7: (clone()+0x6d) [0x7f97563623fd] -6> 2014-09-22 11:08:19.514967 7f974c8b4700 0 existing intermediate dependency Objecter::rwlock (23) -> OSDSession (25) at: ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661) 1: (Objecter::_scan_requests(Objecter::OSDSession*, bool, bool, std::map<unsigned long, Objecter::Op*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Objecter::Op*> > >&, std::list<Objecter::LingerOp*, std::allocator<Objecter::LingerOp*> >&, std::map<unsigned long, Objecter::CommandOp*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Objecter::CommandOp*> > >&)+0x99) [0x7db499] 2: (Objecter::handle_osd_map(MOSDMap*)+0x1ecb) [0x7e35ab] 3: (Objecter::ms_dispatch(Message*)+0x257) [0x7e8487] 4: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0x9ed827] 5: (DispatchQueue::entry()+0x44a) [0x9ea9ca] 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x8fafad] 7: (()+0x7e9a) [0x7f97577ade9a] 8: (clone()+0x6d) [0x7f97563623fd] -5> 2014-09-22 11:08:19.515001 7f974c8b4700 0 existing intermediate dependency Filer::PurgeRange (43) -> Objecter::rwlock (23) at: ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661) 1: (Filer::purge_range(inodeno_t, ceph_file_layout*, SnapContext const&, unsigned long, unsigned long, utime_t, int, Context*)+0x1d3) [0x7b3ab3] 2: (MDCache::purge_stray(CDentry*)+0x1606) [0x659696] 3: (MDCache::eval_stray(CDentry*, bool)+0xa20) [0x65a950] 4: (MDCache::trim(int, int)+0xef) [0x65bfbf] 5: (MDS::tick()+0x3b3) [0x58a2e3] 6: (MDSInternalContextBase::complete(int)+0x1db) [0x79ff7b] 7: (SafeTimer::timer_thread()+0x3cd) [0x904b5d] 8: (SafeTimerThread::entry()+0xd) [0x90775d] 9: (()+0x7e9a) [0x7f97577ade9a] 10: (clone()+0x6d) [0x7f97563623fd] -4> 2014-09-22 11:08:19.515018 7f974c8b4700 0 new dependency OSDSession::completion_lock (26) -> Filer::PurgeRange (43) creates a cycle at ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661) 1: (Filer::_do_purge_range(PurgeRange*, int)+0x38) [0x7b2f38] 2: (Context::complete(int)+0x9) [0x5a0ce9] 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1211) [0x7de021] 4: (Objecter::ms_dispatch(Message*)+0x20b) [0x7e843b] 5: (DispatchQueue::fast_dispatch(Message*)+0x59) [0x9ea219] 6: (Pipe::reader()+0x268c) [0xa1f0bc] 7: (Pipe::Reader::entry()+0xd) [0xa20d7d] 8: (()+0x7e9a) [0x7f97577ade9a] 9: (clone()+0x6d) [0x7f97563623fd]
existing dependency OSDSession (25) -> OSDSession::completion_lock (26) at: ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661) 1: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11dd) [0x7ddfed] 2: (Objecter::ms_dispatch(Message*)+0x20b) [0x7e843b] 3: (DispatchQueue::fast_dispatch(Message*)+0x59) [0x9ea219] 4: (Pipe::reader()+0x268c) [0xa1f0bc] 5: (Pipe::Reader::entry()+0xd) [0xa20d7d] 6: (()+0x7e9a) [0x7f2d02ab1e9a] 7: (clone()+0x6d) [0x7f2d016663fd] -6> 2014-09-22 11:08:38.060127 7f2cf74b8700 0 existing intermediate dependency Objecter::rwlock (23) -> OSDSession (25) at: ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661) 1: (Objecter::_scan_requests(Objecter::OSDSession*, bool, bool, std::map<unsigned long, Objecter::Op*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Objecter::Op*> > >&, std::list<Objecter::LingerOp*, std::allocator<Objecter::LingerOp*> >&, std::map<unsigned long, Objecter::CommandOp*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Objecter::CommandOp*> > >&)+0x99) [0x7db499] 2: (Objecter::handle_osd_map(MOSDMap*)+0x1ecb) [0x7e35ab] 3: (Objecter::ms_dispatch(Message*)+0x257) [0x7e8487] 4: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0x9ed827] 5: (DispatchQueue::entry()+0x44a) [0x9ea9ca] 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x8fafad] 7: (()+0x7e9a) [0x7f2d02ab1e9a] 8: (clone()+0x6d) [0x7f2d016663fd] -5> 2014-09-22 11:08:38.060158 7f2cf74b8700 0 existing intermediate dependency Filer::Probe (43) -> Objecter::rwlock (23) at: ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661) 1: (Filer::_probe(Filer::Probe*)+0x560) [0x7b4330] 2: (Filer::probe(inodeno_t, ceph_file_layout*, snapid_t, unsigned long, unsigned long*, utime_t*, bool, int, Context*)+0x22c) [0x7b4bcc] 3: (Journaler::_probe(Context*, unsigned long*)+0xf8) [0x7b90a8] 4: (Journaler::_finish_read_head(int, ceph::buffer::list&)+0x443) [0x7bac53] 5: (Context::complete(int)+0x9) [0x5a0ce9] 6: (Finisher::finisher_thread_entry()+0x160) [0x852810] 7: (()+0x7e9a) [0x7f2d02ab1e9a] 8: (clone()+0x6d) [0x7f2d016663fd] -4> 2014-09-22 11:08:38.060171 7f2cf74b8700 0 new dependency OSDSession::completion_lock (26) -> Filer::Probe (43) creates a cycle at ceph version 0.85-882-g2a2711d (2a2711daf86534ece11cad4527d69d43ec91d661) 1: (Filer::C_Probe::finish(int)+0x99) [0x7b76c9] 2: (Context::complete(int)+0x9) [0x5a0ce9] 3: (Objecter::C_Stat::finish(int)+0xc0) [0x7b6370] 4: (Context::complete(int)+0x9) [0x5a0ce9] 5: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11ff) [0x7de00f] 6: (Objecter::ms_dispatch(Message*)+0x20b) [0x7e843b] 7: (DispatchQueue::fast_dispatch(Message*)+0x59) [0x9ea219] 8: (Pipe::reader()+0x268c) [0xa1f0bc] 9: (Pipe::Reader::entry()+0xd) [0xa20d7d] 10: (()+0x7e9a) [0x7f2d02ab1e9a] 11: (clone()+0x6d) [0x7f2d016663fd] -3> 2014-09-22 11:08:38.060191 7f2cf74b8700 0 btw, i am holding these locks: -2> 2014-09-22 11:08:38.060192 7f2cf74b8700 0 OSDSession::completion_lock (26) -1> 2014-09-22 11:08:38.060193 7f2cf74b8700 0
We probably need a finisher here, but it's also worth checking that the use of rwlock here is really unsafe: if both are just holding the read lock then it might be a false alarm.
#4 Updated by John Spray almost 9 years ago
- Status changed from New to In Progress
#5 Updated by John Spray almost 9 years ago
- Project changed from Ceph to CephFS
#6 Updated by Zheng Yan almost 9 years ago
can we just unlock the PurgeRange/Probe locks before using the objecter?
#7 Updated by John Spray almost 9 years ago
- Status changed from In Progress to Fix Under Review
#8 Updated by Zheng Yan almost 9 years ago
- Status changed from Fix Under Review to Resolved
#9 Updated by Greg Farnum almost 9 years ago
- Status changed from Resolved to Pending Backport
- Priority changed from Normal to Urgent
This is popping up in Giant as well, which I believe has the new code that was the proximate cause. :)
#10 Updated by John Spray almost 9 years ago
- Status changed from Pending Backport to Resolved
Backported to giant:
commit a8ac4b62a5e878c5658fda730a2f429297ba7c05 Author: John Spray <john.spray@redhat.com> Date: Wed Sep 24 14:19:32 2014 +0100 osdc/Filer: drop probe/purge locks before calling objecter Fixes: #9562 Signed-off-by: John Spray <john.spray@redhat.com> (cherry picked from commit 8dc94a2d8ce3364c0d8d52f634e5fc967e43d819)