Bug #48148
mds: Server.cc:6764 FAILED assert(in->filelock.can_read(mdr->get_client()))
Description
In my cluster with a single MDS (ceph version 12.2.13), this assert is hit when a large number of deletion operations are performed. I cannot reproduce it now, so I did not capture more logs.
backtrace:
0> 2020-11-03 15:32:35.316352 7f47dd5aa700 -1 /share/ceph/rpmbuild/BUILD/ceph-12.2.13/src/mds/Server.cc: In function 'bool Server::_dir_is_nonempty(MDRequestRef&, CInode*)' thread 7f47dd5aa700 time 2020-11-03 15:32:35.311722
/share/ceph/rpmbuild/BUILD/ceph-12.2.13/src/mds/Server.cc: 6783: FAILED assert(in->filelock.can_read(mdr->get_client()))
ceph version 12.2.13-1-560-g87ea0b6 (87ea0b6e94eaa3544572dd676db0e8932f56d7a8) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55c2f7b60640]
 2: (Server::_dir_is_nonempty(boost::intrusive_ptr<MDRequestImpl>&, CInode*)+0x1a8) [0x55c2f7813ab8]
 3: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x13df) [0x55c2f7843bef]
 4: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0xdb9) [0x55c2f7869479]
 5: (MDSInternalContextBase::complete(int)+0x1fb) [0x55c2f7a9c34b]
 6: (void finish_contexts<MDSInternalContextBase>(CephContext*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >&, int)+0x16c) [0x55c2f77c10bc]
 7: (MDSCacheObject::finish_waiting(unsigned long, int)+0x46) [0x55c2f7ab66e6]
 8: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >*)+0x124f) [0x55c2f798958f]
 9: (Locker::wrlock_finish(SimpleLock*, MutationImpl*, bool*)+0x341) [0x55c2f798b261]
 10: (Locker::_drop_non_rdlocks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x22c) [0x55c2f798f14c]
 11: (Locker::drop_locks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x76) [0x55c2f798f586]
 12: (Locker::scatter_writebehind_finish(ScatterLock*, boost::intrusive_ptr<MutationImpl>&)+0xd0) [0x55c2f798f6d0]
 13: (MDSIOContextBase::complete(int)+0xa5) [0x55c2f7a9c4e5]
 14: (MDSLogContextBase::complete(int)+0x3c) [0x55c2f7a9caec]
 15: (Finisher::finisher_thread_entry()+0x198) [0x55c2f7b5f2d8]
 16: (()+0x7e65) [0x7f47e9f64e65]
 17: (clone()+0x6d) [0x7f47e92588ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
I have a suspicion, though I am not sure whether this problem is related to it.
In Server::handle_client_unlink, the filelock of the inode to be deleted is first rdlocked; then Server::_dir_is_nonempty confirms the lock is readable via "assert(in->filelock.can_read(mdr->get_client()))". But there are filelock states that allow can_rdlock while not allowing can_read, such as the following two:
[LOCK_EXCL]      = { 0,         true,  LOCK_LOCK, 0, 0, XCL, XCL, 0, 0, 0,
[LOCK_EXCL_XSYN] = { LOCK_XSYN, false, LOCK_LOCK, 0, 0, XCL, 0,   0, 0, 0,
If the filelock is in one of these two states, could the above assert fire?
Updated by Patrick Donnelly over 3 years ago
- Status changed from New to Triaged
- Assignee set to Sidharth Anupkrishnan
- Target version set to v16.0.0
- Component(FS) MDS added
Updated by Patrick Donnelly over 3 years ago
- Target version changed from v16.0.0 to v17.0.0
- Backport set to pacific,octopus,nautilus
Updated by Sidharth Anupkrishnan about 3 years ago
Thanks for bringing this up!
Was it a delete only workload? Or was the directory in question subjected to any other operation before an rmdir was issued by the client?
Updated by wei qiaomiao about 3 years ago
The client randomly performs reads, writes, setattrs, and rmdirs across all the directories, so it is not certain which operations were applied to the directory in question before the rmdir.