Bug #54421

open

mds: assert fail in Server::_dir_is_nonempty() because xlocker of filelock is -1

Added by Ivan Guan about 2 years ago. Updated about 2 years ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Tags:
Backport:
ceph-10.2.2
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ENV: Jewel ceph-10.2.2

Description:
Server::_dir_is_nonempty() assumes that the inode's filelock has an xlocker, but this assumption does not always hold,
even when the state of in->filelock is LOCK_XLOCK_DONE.

2022-02-21 15:11:40.639247 7f1d3ef95700 1 mds/Server.cc: In function 'bool Server::_dir_is_nonempty(MDRequestRef&, CInode*)' thread 7f1d3ef95700 time 2022-02-21 15:11:40.043876
mds/Server.cc: 6245: FAILED assert(in->filelock.can_read(mdr->get_client()))

ceph version 10.2.2-Summit3.0-beta2-127-g051c2f2 (051c2f2bd027a4c29bdf3f21116dbfb1c718b3db)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f1d453e9215]
2: (Server::_dir_is_nonempty(std::shared_ptr<MDRequestImpl>&, CInode*)+0xcb) [0x7f1d4501906b]
3: (Server::handle_client_unlink(std::shared_ptr<MDRequestImpl>&)+0xe84) [0x7f1d45044b34]
4: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xe9b) [0x7f1d45064d2b]
5: (MDCache::dispatch_request(std::shared_ptr<MDRequestImpl>&)+0x4c) [0x7f1d450e81dc]
6: (MDSInternalContextBase::complete(int)+0x1eb) [0x7f1d4523947b]
7: (void finish_contexts<MDSInternalContextBase>(CephContext*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >&, int)+0xac) [0x7f1d44ffe4cc]
8: (Locker::eval(CInode*, int, bool)+0x128) [0x7f1d45158f38]
9: (Locker::handle_client_caps(MClientCaps*)+0xd95) [0x7f1d4516e6e5]
10: (MDSRank::handle_deferrable_message(Message*)+0xc34) [0x7f1d44fe0624]
11: (MDSRank::_dispatch(Message*, bool)+0x205) [0x7f1d44fea9f5]
12: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x7f1d44feb8d5]
13: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x7f1d44fd1313]
14: (DispatchQueue::entry()+0x7ba) [0x7f1d454eee1a]
15: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f1d453ce02d]
16: (()+0x7dc5) [0x7f1d441abdc5]
17: (clone()+0x6d) [0x7f1d42c7828d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

For example, the following sequence triggers the assert:

t1: The client sends a setattr op for dir1 while the filelock is in LOCK_EXCL.
t2: The MDS calls xlock_start and holds the xlock.
t3: The MDS moves the filelock from LOCK_XLOCK to LOCK_XLOCK_DONE and sends the early_reply.
t4: The client issues rmdir dir1.
t5: The MDS calls rdlock_start on the filelock and gets the rdlock, but cannot acquire the xlock on the linklock, so the request waits.
t6: The journal write for the setattr completes and triggers the safe_reply of setattr. Note: the filelock is still in LOCK_XLOCK_DONE, but its xlocker is now -1.
t7: The MDS retries rmdir dir1 and crashes (core dump) because the filelock no longer has an xlocker.
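
The sequence t1 to t7 can be replayed with a small toy model. This is only a sketch and not the Ceph code: ToyFileLock, NO_XLOCKER and rmdir_check_passes are hypothetical names, and the can_read rule (in LOCK_XLOCK_DONE only the recorded xlocker may read) is an assumption taken from the behaviour described in this report.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: names echo the report, not the real Ceph classes.
using client_t = std::int64_t;
constexpr client_t NO_XLOCKER = -1;

enum class LockState { XLOCK, XLOCK_DONE };

struct ToyFileLock {
    LockState state = LockState::XLOCK;
    client_t xlocker = NO_XLOCKER;

    // t2: the setattr request takes the xlock on behalf of `client`.
    void xlock_start(client_t client) {
        assert(client >= 0);
        state = LockState::XLOCK;
        xlocker = client;
    }

    // t3: early reply; the state moves from XLOCK to XLOCK_DONE.
    void early_reply() { state = LockState::XLOCK_DONE; }

    // t6: safe reply; per the report the state stays XLOCK_DONE
    // while the xlocker is reset to -1.
    void safe_reply() { xlocker = NO_XLOCKER; }

    // Assumed rule: in XLOCK_DONE only the recorded xlocker may read.
    bool can_read(client_t client) const {
        return state == LockState::XLOCK_DONE && client >= 0 &&
               xlocker == client;
    }
};

// Hypothetical helper: replays the timeline and returns the value the
// can_read() check would see when rmdir is retried at t7.
bool rmdir_check_passes(client_t client, bool safe_reply_before_retry) {
    ToyFileLock filelock;
    filelock.xlock_start(client);  // t1-t2: setattr takes the xlock
    filelock.early_reply();        // t3: XLOCK -> XLOCK_DONE
    // t4-t5: rmdir arrives and waits on the linklock xlock
    if (safe_reply_before_retry) {
        filelock.safe_reply();     // t6: xlocker becomes -1
    }
    return filelock.can_read(client);  // t7: check at retry time
}
```

In this model, rmdir_check_passes(4242, false) returns true, while rmdir_check_passes(4242, true) returns false. The second case corresponds to the failed assert at t7.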


Files

rmdir.png (87.8 KB) Ivan Guan, 02/28/2022 09:28 AM