Bug #48148

open

mds: Server.cc:6764 FAILED assert(in->filelock.can_read(mdr->get_client()))

Added by wei qiaomiao over 3 years ago. Updated almost 2 years ago.

Status: Triaged
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific,octopus,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

My cluster has a single MDS and runs Ceph 12.2.13. The assert is hit when a large number of deletion operations are performed. I can't reproduce it now, so I didn't capture more logs.
Backtrace:

 0> 2020-11-03 15:32:35.316352 7f47dd5aa700 -1 /share/ceph/rpmbuild/BUILD/ceph-12.2.13/src/mds/Server.cc: In function 'bool Server::_dir_is_nonempty(MDRequestRef&, CInode*)' thread 7f47dd5aa700 time 2020-11-03 15:32:35.311722
/share/ceph/rpmbuild/BUILD/ceph-12.2.13/src/mds/Server.cc: 6783: FAILED assert(in->filelock.can_read(mdr->get_client()))

 ceph version 12.2.13-1-560-g87ea0b6 (87ea0b6e94eaa3544572dd676db0e8932f56d7a8) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55c2f7b60640]
 2: (Server::_dir_is_nonempty(boost::intrusive_ptr<MDRequestImpl>&, CInode*)+0x1a8) [0x55c2f7813ab8]
 3: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x13df) [0x55c2f7843bef]
 4: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0xdb9) [0x55c2f7869479]
 5: (MDSInternalContextBase::complete(int)+0x1fb) [0x55c2f7a9c34b]
 6: (void finish_contexts<MDSInternalContextBase>(CephContext*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >&, int)+0x16c) [0x55c2f77c10bc]
 7: (MDSCacheObject::finish_waiting(unsigned long, int)+0x46) [0x55c2f7ab66e6]
 8: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >*)+0x124f) [0x55c2f798958f]
 9: (Locker::wrlock_finish(SimpleLock*, MutationImpl*, bool*)+0x341) [0x55c2f798b261]
 10: (Locker::_drop_non_rdlocks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x22c) [0x55c2f798f14c]
 11: (Locker::drop_locks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x76) [0x55c2f798f586]
 12: (Locker::scatter_writebehind_finish(ScatterLock*, boost::intrusive_ptr<MutationImpl>&)+0xd0) [0x55c2f798f6d0]
 13: (MDSIOContextBase::complete(int)+0xa5) [0x55c2f7a9c4e5]
 14: (MDSLogContextBase::complete(int)+0x3c) [0x55c2f7a9caec]
 15: (Finisher::finisher_thread_entry()+0x198) [0x55c2f7b5f2d8]
 16: (()+0x7e65) [0x7f47e9f64e65]
 17: (clone()+0x6d) [0x7f47e92588ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I have a suspicion, although I'm not sure whether this problem is related to it.
In Server::handle_client_unlink, the filelock of the inode being deleted is rdlocked first, and then Server::_dir_is_nonempty checks that the filelock is readable via "assert(in->filelock.can_read(mdr->get_client()))". However, there are filelock states in which can_rdlock is allowed but can_read is not, for example these two:

  [LOCK_EXCL]      = { 0,         true,  LOCK_LOCK, 0,    0,   XCL, XCL, 0,   0,   0,  
  [LOCK_EXCL_XSYN] = { LOCK_XSYN, false, LOCK_LOCK, 0,    0,   XCL, 0,   0,   0,   0,

If the filelock is in either of these two states, could the assert above be triggered?
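The reporter's reasoning about the two table rows can be checked mechanically. Below is a minimal C++ sketch, assuming the column order of Ceph's `sm_state_t` (next, loner, replica state, r, rp, rd, wr, ...), so that in both quoted rows the `r` (can_read) column is `0` and the `rd` (can_rdlock) column is `XCL`. It keeps only those two columns and collapses Ceph's per-client XCL check into a single `exclusive_client` flag. `Perm`, `StateRow`, `allowed` and `rdlock_without_read` are hypothetical names, not Ceph APIs. The sketch only shows that the table permits "rdlock granted, read not allowed"; whether Locker actually grants the rdlock in these states on the unlink path is the open question of this report.

```cpp
// Hypothetical, heavily simplified model of two columns of Ceph's lock
// state table (struct sm_state_t in src/mds/locks.h): "r" (can_read) and
// "rd" (can_rdlock). The enum and the single "exclusive client" flag are
// illustrative stand-ins, not Ceph's real constants or client checks.
enum Perm { NONE, ANY, XCL };

struct StateRow {
  const char* name;
  Perm can_read;    // "r" column
  Perm can_rdlock;  // "rd" column
};

// The two filelock rows quoted in the report have r = 0 and rd = XCL.
constexpr StateRow kExcl     = {"LOCK_EXCL",      NONE, XCL};
constexpr StateRow kExclXsyn = {"LOCK_EXCL_XSYN", NONE, XCL};
// Contrast case: a state where both columns are ANY (as in LOCK_SYNC).
constexpr StateRow kSync     = {"LOCK_SYNC",      ANY,  ANY};

// ANY admits every client, XCL only the client holding exclusive access,
// NONE admits nobody.
constexpr bool allowed(Perm p, bool exclusive_client) {
  return p == ANY || (p == XCL && exclusive_client);
}

constexpr bool can_rdlock(const StateRow& s, bool excl) {
  return allowed(s.can_rdlock, excl);
}
constexpr bool can_read(const StateRow& s, bool excl) {
  return allowed(s.can_read, excl);
}

// True when a request could be granted the rdlock but a subsequent
// assert(can_read(...)), like the one in Server::_dir_is_nonempty, would fail.
constexpr bool rdlock_without_read(const StateRow& s, bool excl) {
  return can_rdlock(s, excl) && !can_read(s, excl);
}

// Compile-time check of the reporter's observation under this model.
static_assert(rdlock_without_read(kExcl, true), "EXCL: rdlock allowed, read not");
static_assert(rdlock_without_read(kExclXsyn, true), "EXCL_XSYN: rdlock allowed, read not");
static_assert(!rdlock_without_read(kSync, true), "SYNC: both allowed");
```

Compiling the file is the whole check: if the quoted rows had matching `r` and `rd` entries, the first two `static_assert`s would fail to compile.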

#1

Updated by Patrick Donnelly over 3 years ago

  • Status changed from New to Triaged
  • Assignee set to Sidharth Anupkrishnan
  • Target version set to v16.0.0
  • Component(FS) MDS added

#2

Updated by Patrick Donnelly over 3 years ago

  • Target version changed from v16.0.0 to v17.0.0
  • Backport set to pacific,octopus,nautilus

#3

Updated by Sidharth Anupkrishnan about 3 years ago

Thanks for bringing this up!

Was it a delete-only workload? Or was the directory in question subjected to any other operation before the client issued the rmdir?

#4

Updated by wei qiaomiao about 3 years ago

The client randomly performs reads, writes, setattr, and rmdir on all the directories, but I'm not sure which operations were performed on the directory in question before the rmdir.
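The workload described above can be approximated with a small stress loop for anyone attempting a reproduction. This is a hypothetical single-threaded sketch using plain POSIX calls; the `root/dN/f` layout, the five-way operation mix, and the names `Stats` and `run_workload` are assumptions, not part of the report. A real attempt would most likely need several clients running such loops concurrently against the same directories on a CephFS mount, since the backtrace shows the failing request being re-dispatched from a lock waiter (Locker::eval_gather calling finish_waiting) rather than on its first pass.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <random>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical single-threaded approximation of the reported workload:
// random reads, writes, setattr (chmod) and rmdir across many directories.
// `root` is assumed to be an existing directory, e.g. on a CephFS mount;
// the layout (root/dN/f) and the operation mix are illustrative only.
struct Stats {
  int ops = 0;
  int rmdir_ok = 0;        // directory was empty and got removed
  int rmdir_notempty = 0;  // rmdir rejected because the directory had entries
};

Stats run_workload(const std::string& root, int ndirs, int nops, unsigned seed) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> pick_dir(0, ndirs - 1);
  std::uniform_int_distribution<int> pick_op(0, 4);
  Stats st;

  for (int i = 0; i < ndirs; ++i)
    mkdir((root + "/d" + std::to_string(i)).c_str(), 0755);

  for (int n = 0; n < nops; ++n) {
    const std::string dir = root + "/d" + std::to_string(pick_dir(rng));
    const std::string file = dir + "/f";
    switch (pick_op(rng)) {
      case 0: {  // write: create or append to a file inside the directory
        int fd = open(file.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd >= 0) { ssize_t w = write(fd, "x", 1); (void)w; close(fd); }
        break;
      }
      case 1: {  // read the file if it exists
        int fd = open(file.c_str(), O_RDONLY);
        if (fd >= 0) { char buf[64]; ssize_t r = read(fd, buf, sizeof buf); (void)r; close(fd); }
        break;
      }
      case 2:  // setattr on the directory inode
        chmod(dir.c_str(), (n % 2) ? 0700 : 0755);
        break;
      case 3:  // unlink the file so the directory can become empty again
        unlink(file.c_str());
        break;
      case 4:  // rmdir: on CephFS this can reach Server::handle_client_unlink
        if (rmdir(dir.c_str()) == 0) {
          ++st.rmdir_ok;
          mkdir(dir.c_str(), 0755);  // recreate so later operations have a target
        } else if (errno == ENOTEMPTY || errno == EEXIST) {
          ++st.rmdir_notempty;
        }
        break;
    }
    ++st.ops;
  }
  return st;
}
```

For example, `run_workload("/mnt/cephfs/stress", 64, 100000, 1)` would run 100,000 operations over 64 directories; the mount path is a placeholder.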

#5

Updated by Patrick Donnelly almost 2 years ago

  • Target version deleted (v17.0.0)