Bug #45261

closed

mds: FAILED assert(locking == lock) in MutationImpl::finish_locking

Added by Dan van der Ster about 4 years ago. Updated almost 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
octopus,nautilus,luminous
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

We got two identical crashes a few minutes apart on two different active MDSs:

2020-04-24 12:57:38.253616 7fb5a2485700 -1 /builddir/build/BUILD/ceph-12.2.12/src/mds/Mutation.cc: In function 'void MutationImpl::finish_locking(SimpleLock*)' thread 7fb5a2485700 time 2020-04-24 12:57:38.246037
/builddir/build/BUILD/ceph-12.2.12/src/mds/Mutation.cc: 67: FAILED assert(locking == lock)

 ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55d0955c75f0]
 2: (()+0x38963f) [0x55d09533263f]
 3: (Locker::xlock_start(SimpleLock*, boost::intrusive_ptr<MDRequestImpl>&)+0x403) [0x55d09540e2b3]
 4: (Locker::acquire_locks(boost::intrusive_ptr<MDRequestImpl>&, std::set<SimpleLock*, std::less<SimpleLock*>, std::allocator<SimpleLock*> >&, std::set<SimpleLock*, std::less<SimpleLock*>, std::allocator<SimpleLock*> >&, std::set<SimpleLock*, std::less<SimpleLock*>, std::allocator<SimpleLock*> >&, std::map<SimpleLock*, int, std::less<SimpleLock*>, std::allocator<std::pair<SimpleLock* const, int> > >*, CInode*, bool)+0x1faa) [0x55d09541ce7a]
 5: (Server::handle_client_setattr(boost::intrusive_ptr<MDRequestImpl>&)+0x23c) [0x55d0952d548c]
 6: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0xceb) [0x55d09530d7db]
 7: (MDSInternalContextBase::complete(int)+0x1eb) [0x55d09550e5ab]
 8: (void finish_contexts<MDSInternalContextBase>(CephContext*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >&, int)+0xac) [0x55d09527824c]
 9: (Locker::eval(CInode*, int, bool)+0x127) [0x55d095415f37]
 10: (Locker::handle_client_caps(MClientCaps*)+0x144f) [0x55d09542c1ff]
 11: (Locker::dispatch(Message*)+0xa5) [0x55d09542db95]
 12: (MDSRank::handle_deferrable_message(Message*)+0xbb4) [0x55d09527e484]
 13: (MDSRank::_dispatch(Message*, bool)+0x1e3) [0x55d095295de3]
 14: (MDSRankDispatcher::ms_dispatch(Message*)+0xa8) [0x55d095296db8]
 15: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x55d095274ef3]
 16: (DispatchQueue::entry()+0x792) [0x55d0958cba42]
 17: (DispatchQueue::DispatchThread::entry()+0xd) [0x55d09564f3ed]
 18: (()+0x7e65) [0x7fb5a74d6e65]
 19: (clone()+0x6d) [0x7fb5a65b188d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The crashes were identical, but they came from two different clients, on different inodes in different directories.
We have coredumps for further debugging.

Here we see locking is 0x0:

(gdb) up
#8  0x000055d09540e2b3 in Locker::xlock_start (this=this@entry=0x55d0a078a1b0, lock=0x55d282174590, mut=...)
    at /usr/src/debug/ceph-12.2.12/src/mds/Locker.cc:1661
1661        mut->finish_locking(lock);
(gdb) p lock
$2 = (SimpleLock *) 0x55d282174590

(gdb) p ((MutationImpl *)mut).locking
$10 = (SimpleLock *) 0x0
(gdb) 
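For readers without the source at hand, here is a minimal sketch of the invariant that the failed assert checks. It is not the Ceph code: `MutationSketch`, `LockSketch` and `can_finish_locking` are hypothetical stand-ins, and the only things taken from the report are the `locking == lock` condition and the observed values (`locking` is 0x0, `lock` is non-null).

```cpp
#include <cassert>

// Hypothetical stand-in for SimpleLock; the real class is far richer.
struct LockSketch { int id; };

// Hypothetical stand-in for MutationImpl, reduced to the one field
// inspected in the gdb session above.
struct MutationSketch {
  LockSketch* locking = nullptr;  // lock currently being acquired, if any

  // Assumed pairing: remember the lock before the acquisition begins.
  void start_locking(LockSketch* lock) {
    assert(locking == nullptr);
    locking = lock;
  }

  // The condition from the assert message: locking == lock.
  bool can_finish_locking(const LockSketch* lock) const {
    return locking == lock;
  }

  // Would abort, as in the report, if locking does not match lock.
  void finish_locking(LockSketch* lock) {
    assert(can_finish_locking(lock));
    locking = nullptr;
  }
};
```

In the coredump state, `locking` is null while `lock` is a valid pointer, so the condition is false and the assert fires. Why `locking` was already cleared (or never set) when `xlock_start` reached this call is not shown by the sketch.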

Related issues 3 (0 open, 3 closed)

Copied to CephFS - Backport #45685: octopus: mds: FAILED assert(locking == lock) in MutationImpl::finish_locking (Resolved, Nathan Cutler)
Copied to CephFS - Backport #45686: nautilus: mds: FAILED assert(locking == lock) in MutationImpl::finish_locking (Resolved, Nathan Cutler)
Copied to CephFS - Backport #45687: luminous: mds: FAILED assert(locking == lock) in MutationImpl::finish_locking (Resolved, Sidharth Anupkrishnan)