Bug #48148

mds: Server.cc:6764 FAILED assert(in->filelock.can_read(mdr->get_client()))

Added by wei qiaomiao 10 months ago. Updated 8 months ago.

Status: Triaged
Priority: Normal
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport: pacific,octopus,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

My cluster runs a single MDS on Ceph 12.2.13. The MDS hit this assert while a large number of delete operations were being performed. I can no longer reproduce it, so I was not able to capture more logs.
backtrace:

 0> 2020-11-03 15:32:35.316352 7f47dd5aa700 -1 /share/ceph/rpmbuild/BUILD/ceph-12.2.13/src/mds/Server.cc: In function 'bool Server::_dir_is_nonempty(MDRequestRef&, CInode*)' thread 7f47dd5aa700 time 2020-11-03 15:32:35.311722
/share/ceph/rpmbuild/BUILD/ceph-12.2.13/src/mds/Server.cc: 6783: FAILED assert(in->filelock.can_read(mdr->get_client()))

 ceph version 12.2.13-1-560-g87ea0b6 (87ea0b6e94eaa3544572dd676db0e8932f56d7a8) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55c2f7b60640]
 2: (Server::_dir_is_nonempty(boost::intrusive_ptr<MDRequestImpl>&, CInode*)+0x1a8) [0x55c2f7813ab8]
 3: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x13df) [0x55c2f7843bef]
 4: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0xdb9) [0x55c2f7869479]
 5: (MDSInternalContextBase::complete(int)+0x1fb) [0x55c2f7a9c34b]
 6: (void finish_contexts<MDSInternalContextBase>(CephContext*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >&, int)+0x16c) [0x55c2f77c10bc]
 7: (MDSCacheObject::finish_waiting(unsigned long, int)+0x46) [0x55c2f7ab66e6]
 8: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >*)+0x124f) [0x55c2f798958f]
 9: (Locker::wrlock_finish(SimpleLock*, MutationImpl*, bool*)+0x341) [0x55c2f798b261]
 10: (Locker::_drop_non_rdlocks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x22c) [0x55c2f798f14c]
 11: (Locker::drop_locks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x76) [0x55c2f798f586]
 12: (Locker::scatter_writebehind_finish(ScatterLock*, boost::intrusive_ptr<MutationImpl>&)+0xd0) [0x55c2f798f6d0]
 13: (MDSIOContextBase::complete(int)+0xa5) [0x55c2f7a9c4e5]
 14: (MDSLogContextBase::complete(int)+0x3c) [0x55c2f7a9caec]
 15: (Finisher::finisher_thread_entry()+0x198) [0x55c2f7b5f2d8]
 16: (()+0x7e65) [0x7f47e9f64e65]
 17: (clone()+0x6d) [0x7f47e92588ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I have a suspicion, though I'm not sure whether it is related to this problem.
In Server::handle_client_unlink, the MDS first takes an rdlock on the filelock of the inode to be deleted, and then Server::_dir_is_nonempty confirms that the filelock is readable through "assert(in->filelock.can_read(mdr->get_client()))". But there are two filelock states that allow can_rdlock and do not allow can_read:

  [LOCK_EXCL]      = { 0,         true,  LOCK_LOCK, 0,    0,   XCL, XCL, 0,   0,   0,  
  [LOCK_EXCL_XSYN] = { LOCK_XSYN, false, LOCK_LOCK, 0,    0,   XCL, 0,   0,   0,   0,

If the filelock is in one of these two states, could the assert above be triggered?
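
To make the question concrete, here is a minimal, simplified sketch of the suspected mismatch. It is not the Ceph source. It assumes that the fourth and sixth columns of the rows quoted above are can_read and can_rdlock, and that XCL means "allowed for the exclusive (loner) or xlocking client". The names Perm, StateRow, Lock and the two predicate functions are hypothetical stand-ins for the real lock classes.

```cpp
#include <cassert>

// Hypothetical permission levels, loosely modelled on the values seen in the
// quoted state rows (0 and XCL). Not copied from the Ceph headers.
enum Perm { NONE = 0, ANY = 1, AUTH = 2, XCL = 3 };

// Only the two columns relevant to the question are modelled.
struct StateRow {
  Perm can_read;
  Perm can_rdlock;
};

// Assumption: in both quoted rows, can_read is 0 and can_rdlock is XCL.
constexpr StateRow EXCL_ROW      = { NONE, XCL };
constexpr StateRow EXCL_XSYN_ROW = { NONE, XCL };

// Hypothetical lock: current state row plus who holds exclusive access.
struct Lock {
  StateRow row;
  bool is_auth;      // this MDS is authoritative for the inode
  int excl_client;   // loner client id, -1 if none
  int xlock_client;  // client holding an xlock, -1 if none
};

// Assumed rdlock rule: XCL admits the loner client or the xlocking client.
inline bool can_rdlock(const Lock& l, int client) {
  return l.row.can_rdlock == ANY ||
         (l.row.can_rdlock == AUTH && l.is_auth) ||
         (l.row.can_rdlock == XCL && client >= 0 &&
          (l.xlock_client == client || l.excl_client == client));
}

// Assumed read rule: with can_read == NONE no client is admitted at all.
inline bool can_read(const Lock& l, int client) {
  return l.row.can_read == ANY ||
         (l.row.can_read == AUTH && l.is_auth) ||
         (l.row.can_read == XCL && client >= 0 && l.xlock_client == client);
}
```

Under these assumptions, a Lock built from EXCL_ROW with excl_client set to the requesting client passes can_rdlock and fails can_read, which is the combination that would let handle_client_unlink take the rdlock and then trip the assert in _dir_is_nonempty. Whether the real MDS can reach that combination on this code path is exactly the open question.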

History

#1 Updated by Patrick Donnelly 10 months ago

  • Status changed from New to Triaged
  • Assignee set to Sidharth Anupkrishnan
  • Target version set to v16.0.0
  • Component(FS) MDS added

#2 Updated by Patrick Donnelly 8 months ago

  • Target version changed from v16.0.0 to v17.0.0
  • Backport set to pacific,octopus,nautilus

#3 Updated by Sidharth Anupkrishnan 8 months ago

Thanks for bringing this up!

Was it a delete-only workload? Or was the directory in question subjected to any other operation before the client issued an rmdir?

#4 Updated by wei qiaomiao 8 months ago

The client randomly performs reads, writes, setattr, and rmdir on all the directories, but I am not sure which operations were performed on the directory in question before the rmdir.
