Project

General

Profile

Actions

Bug #54421

open

mds: assert fail in Server::_dir_is_nonempty() because xlocker of filelock is -1

Added by Ivan Guan about 2 years ago. Updated about 2 years ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Tags:
Backport:
ceph-10.2.2
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ENV: Jewel ceph-10.2.2

Description:
Server::_dir_is_nonempty() always expects inode has the xlocker, but sometimes, this assumption is not always true
even though the in::filelock::sate is LOCK_XLOCK_DONE.

2022-02-21 15:11:40.639247 7f1d3ef95700 1 mds/Server.cc: In function 'bool Server::_dir_is_nonempty(MDRequestRef&, CInode*)' thread 7f1d3ef95700 time 2022-02-21 15:11:40.043876
mds/Server.cc: 6245: FAILED assert(in
>filelock.can_read(mdr->get_client()))

ceph version 10.2.2-Summit3.0-beta2-127-g051c2f2 (051c2f2bd027a4c29bdf3f21116dbfb1c718b3db)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f1d453e9215]
2: (Server::_dir_is_nonempty(std::shared_ptr<MDRequestImpl>&, CInode*)+0xcb) [0x7f1d4501906b]
3: (Server::handle_client_unlink(std::shared_ptr<MDRequestImpl>&)+0xe84) [0x7f1d45044b34]
4: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xe9b) [0x7f1d45064d2b]
5: (MDCache::dispatch_request(std::shared_ptr<MDRequestImpl>&)+0x4c) [0x7f1d450e81dc]
6: (MDSInternalContextBase::complete(int)+0x1eb) [0x7f1d4523947b]
7: (void finish_contexts<MDSInternalContextBase>(CephContext*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >&, int)+0xac) [0x7f1d44ffe4cc]
8: (Locker::eval(CInode*, int, bool)+0x128) [0x7f1d45158f38]
9: (Locker::handle_client_caps(MClientCaps*)+0xd95) [0x7f1d4516e6e5]
10: (MDSRank::handle_deferrable_message(Message*)+0xc34) [0x7f1d44fe0624]
11: (MDSRank::_dispatch(Message*, bool)+0x205) [0x7f1d44fea9f5]
12: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x7f1d44feb8d5]
13: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x7f1d44fd1313]
14: (DispatchQueue::entry()+0x7ba) [0x7f1d454eee1a]
15: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f1d453ce02d]
16: (()+0x7dc5) [0x7f1d441abdc5]
17: (clone()+0x6d) [0x7f1d42c7828d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

e.g. the following case:

t1: client send setattr op for dir1 and filelock is LOCK_EXCL
t2: xlock_start and hold the xlock
t3: mds transfer filelock state to LOCK_XLOCK_DONE from LOCK_XLOCK and do early_reply
t4: client rmdir dir1
t5: mds rdlock_start filelock and got the rdlock but can’t acquire the xlock of linklock, so wait here
t6. write op of setattr journal come back and triggered safe_reply of setattr. Notice: the filelock is XLOCK_DONE still but it’s xlocker is -1.
t7: mds retry rmdir dir1, the core of mds happened because filelock didn’t have xlocker.


Files

rmdir.png (87.8 KB) rmdir.png Ivan Guan, 02/28/2022 09:28 AM
Actions #2

Updated by Xiubo Li about 2 years ago

  • Project changed from Ceph to CephFS
  • Status changed from New to Fix Under Review
  • Assignee set to Ivan Guan
  • Pull request ID set to 45195
  • Component(FS) MDS added
  • Labels (FS) crash added
Actions #3

Updated by Ivan Guan about 2 years ago

the logs after t6 moment:
h3. send safe reply after writing journal to disk

2022-02-21 15:11:40.015369 7f1d39588700 10 mds.0.server reply_client_request 0 ((0) Success) client_request(client.120800413:79490 setattr mtime=2022-02-21 15:11:41.000000 atime=2022-02-21 15:11:44.209539 #1000243120c 2022-02-21 15:11:44.211394) v3
start drop xlock
2022-02-21 15:11:40.015403 7f1d39588700 10 mds.0.locker xlock_finish on (ifile xlockdone r=1 x=1) [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=4+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink sync->lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1 x=1) (iversion lock w=2 last_client=120800413) caps={120800413=pAsXsFs/pAsLsXsFs/Fx@17},l=120800413 | request=1 lock=4 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x7f1e14504300 mode: 16893]

the filelock can’t be transferred to LOCK_LOCK because rdlock has been held already.
2022-02-21 15:11:40.015425 7f1d39588700 10 mds.0.locker eval_gather (ifile xlockdone r=1) on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=4+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink sync->lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=2 last_client=120800413) caps={120800413=pAsXsFs/pAsLsXsFs/Fx@17},l=120800413 | request=1 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.015441 7f1d39588700 10 mds.0.locker next state is lock issued/allows loner s/cb xlocker /cb other /cb

2022-02-21 15:11:40.040410 7f1d3ef95700 1 -- 10.231.3.1:6867/32689 <== client.120800413 10.231.2.1:0/3354810552 105854 ==== client_caps(update ino 1000243120c 5892302 seq 17 caps=pAsXsFs dirty=- wanted=Fx follows 0 size 0/0 ts 1/18446744073709551615 mtime 2022-02-21 15:11:41.000000 tws 2) v8 ==== 216+0+0 (3293606142 0 0) 0x7f1e15f58000 con 0x7f1d88f08e20

2022-02-21 15:11:40.042596 7f1d3ef95700 7 mds.0.locker handle_client_caps on 1000243120c tid 0 follows 0 op update
2022-02-21 15:11:40.042640 7f1d3ef95700 10 mds.0.locker cap inode [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink sync->lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsLsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.042666 7f1d3ef95700 10 mds.0.locker follows 0 retains pAsXsFs dirty - on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink sync->lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.042682 7f1d3ef95700 10 mds.0.locker _do_cap_update dirty - issued pAsXsFs wanted Fx on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink sync->lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.042696 7f1d3ef95700 10 mds.0.locker eval 2496 [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink sync->lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.042710 7f1d3ef95700 10 mds.0.locker eval set loner to client.120800413
2022-02-21 15:11:40.042712 7f1d3ef95700 10 mds.0.locker eval_gather (ifile xlockdone r=1) on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink sync->lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.042725 7f1d3ef95700 10 mds.0.locker next state is lock issued/allows loner s/cb xlocker /cb other /cb

retry rmdir
2022-02-21 15:11:40.042904 7f1d3ef95700 10 MDSInternalContextBase::complete: 18C_MDS_RetryRequest
2022-02-21 15:11:40.042907 7f1d3ef95700 7 mds.0.server dispatch_client_request client_request(client.120800413:79624 rmdir #1000242c424/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0 2022-02-21 15:11:45.089274) v3
2022-02-21 15:11:40.042912 7f1d3ef95700 7 mds.0.cache traverse: opening base ino 1000242c424 snap head
2022-02-21 15:11:40.042914 7f1d3ef95700 12 mds.0.cache traverse: path seg depth 0 'pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0' snapid head
2022-02-21 15:11:40.042916 7f1d3ef95700 20 mds.0.cache.dir(1000242c424) lookup (head, 'pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0')
2022-02-21 15:11:40.042921 7f1d3ef95700 20 mds.0.cache.dir(1000242c424) hit -> (pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0,head)
2022-02-21 15:11:40.042925 7f1d3ef95700 10 mds.0.cache path_traverse finish on snapid head
2022-02-21 15:11:40.042929 7f1d3ef95700 7 mds.0.server handle_client_rmdir on [dentry #1/fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0 [2,head] auth (dn xlock x=1 by 0x7f1d56498f00) (dversion lock w=1 last_client=120800413) v=33670 ap=2+2 inode=0x7f1e14504300 | request=1 lock=2 inodepin=1 dirty=1 authpin=1 0x7f1e1519f040]
2022-02-21 15:11:40.042943 7f1d3ef95700 7 mds.0.server dn links to [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.042964 7f1d3ef95700 10 mds.0.server dir_is_nonempty_unlocked [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.042988 7f1d3ef95700 10 mds.0.server straydn is [dentry #100/stray4/1000243120c [2,head] auth NULL (dn xlock x=1 by 0x7f1d56498f00) (dversion lock w=1 last_client=120800413) pv=0 v=2812421797 ap=2+0 inode=0 state=new | request=1 lock=2 authpin=1 0x7f1de6e45d40]
2022-02-21 15:11:40.043004 7f1d3ef95700 10 mds.0.locker acquire_locks request(client.120800413:79624 cr=0x7f1d553ba3c0)

2022-02-21 15:11:40.043021 7f1d3ef95700 20 mds.0.locker must xlock (ilink sync) [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]

2022-02-21 15:11:40.043268 7f1d3ef95700 20 mds.0.locker * must rdlock (ifile xlockdone r=1)* [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]

2022-02-21 15:11:40.043726 7f1d3ef95700 10 mds.0.locker already rdlocked (ifile xlockdone r=1) [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]

2022-02-21 15:11:40.043739 7f1d3ef95700 7 mds.0.locker xlock_start on (ilink sync) on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.043752 7f1d3ef95700 7 mds.0.locker simple_lock on (ilink sync) on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.043766 7f1d3ef95700 7 mds.0.locker simple_xlock on (ilink lock) on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=1 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.043779 7f1d3ef95700 10 mds.0.cache.ino(1000243120c) auth_pin by 0x7f1e14504778 on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=1 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893] now 3+0

2022-02-21 15:11:40.043832 7f1d3ef95700 20 Session check_access path /fengtaosmu/Software/anaconda3/pkgs
2022-02-21 15:11:40.043836 7f1d3ef95700 10 MDSAuthCap is_capable inode(path /fengtaosmu/Software/anaconda3/pkgs owner 1033:1036 mode 040775) by caller 1033:1036 mask 2 new 0:0 cap: MDSAuthCaps[allow ]
2022-02-21 15:11:40.043853 7f1d3ef95700 10 mds.0.server *dir_is_nonempty [*inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink xlock x=1 by 0x7f1d56498f00) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=1 lock=4 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.639247 7f1d3ef95700 1 mds/Server.cc: In function 'bool Server::_dir_is_nonempty(MDRequestRef&, CInode
)' thread 7f1d3ef95700 time 2022-02-21 15:11:40.043876
mds/Server.cc: 6245: FAILED assert(in>filelock.can_read(mdr->get_client()))
ceph version 10.2.2-Summit3.0-beta2-127-g051c2f2 (051c2f2bd027a4c29bdf3f21116dbfb1c718b3db)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f1d453e9215]
2: (Server::_dir_is_nonempty(std::shared_ptr<MDRequestImpl>&, CInode*)+0xcb) [0x7f1d4501906b]
3: (Server::handle_client_unlink(std::shared_ptr<MDRequestImpl>&)+0xe84) [0x7f1d45044b34]
4: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xe9b) [0x7f1d45064d2b]
5: (MDCache::dispatch_request(std::shared_ptr<MDRequestImpl>&)+0x4c) [0x7f1d450e81dc]
6: (MDSInternalContextBase::complete(int)+0x1eb) [0x7f1d4523947b]
7: (void finish_contexts<MDSInternalContextBase>(CephContext*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >&, int)+0xac) [0x7f1d44ffe4cc]
8: (Locker::eval(CInode*, int, bool)+0x128) [0x7f1d45158f38]
9: (Locker::handle_client_caps(MClientCaps*)+0xd95) [0x7f1d4516e6e5]
10: (MDSRank::handle_deferrable_message(Message*)+0xc34) [0x7f1d44fe0624]
11: (MDSRank::_dispatch(Message*, bool)+0x205) [0x7f1d44fea9f5]
12: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x7f1d44feb8d5]
13: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x7f1d44fd1313]
14: (DispatchQueue::entry()+0x7ba) [0x7f1d454eee1a]
15: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f1d453ce02d]
16: (()+0x7dc5) [0x7f1d441abdc5]
17: (clone()+0x6d) [0x7f1d42c7828d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions #4

Updated by Ivan Guan about 2 years ago

Ivan Guan wrote:

the logs after t6 moment:
send safe reply after writing journal to disk

2022-02-21 15:11:40.015369 7f1d39588700 10 mds.0.server reply_client_request 0 ((0) Success) client_request(client.120800413:79490 setattr mtime=2022-02-21 15:11:41.000000 atime=2022-02-21 15:11:44.209539 #1000243120c 2022-02-21 15:11:44.211394) v3
start drop xlock
2022-02-21 15:11:40.015403 7f1d39588700 10 mds.0.locker xlock_finish on (ifile xlockdone r=1 x=1) [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=4+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink sync->lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1 x=1) (iversion lock w=2 last_client=120800413) caps={120800413=pAsXsFs/pAsLsXsFs/Fx@17},l=120800413 | request=1 lock=4 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x7f1e14504300 mode: 16893]

the filelock can’t be transferred to LOCK_LOCK because rdlock has been held already.
2022-02-21 15:11:40.015425 7f1d39588700 10 mds.0.locker eval_gather (ifile xlockdone r=1) on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=4+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink sync->lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=2 last_client=120800413) caps={120800413=pAsXsFs/pAsLsXsFs/Fx@17},l=120800413 | request=1 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.015441 7f1d39588700 10 mds.0.locker next state is lock issued/allows loner s/cb xlocker /cb other /cb

2022-02-21 15:11:40.040410 7f1d3ef95700 1 -- 10.231.3.1:6867/32689 <== client.120800413 10.231.2.1:0/3354810552 105854 ==== client_caps(update ino 1000243120c 5892302 seq 17 caps=pAsXsFs dirty=- wanted=Fx follows 0 size 0/0 ts 1/18446744073709551615 mtime 2022-02-21 15:11:41.000000 tws 2) v8 ==== 216+0+0 (3293606142 0 0) 0x7f1e15f58000 con 0x7f1d88f08e20

2022-02-21 15:11:40.042596 7f1d3ef95700 7 mds.0.locker handle_client_caps on 1000243120c tid 0 follows 0 op update
2022-02-21 15:11:40.042640 7f1d3ef95700 10 mds.0.locker cap inode [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink sync->lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsLsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.042666 7f1d3ef95700 10 mds.0.locker follows 0 retains pAsXsFs dirty - on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink sync->lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.042682 7f1d3ef95700 10 mds.0.locker _do_cap_update dirty - issued pAsXsFs wanted Fx on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink sync->lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.042696 7f1d3ef95700 10 mds.0.locker eval 2496 [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink sync->lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.042710 7f1d3ef95700 10 mds.0.locker eval set loner to client.120800413
2022-02-21 15:11:40.042712 7f1d3ef95700 10 mds.0.locker eval_gather (ifile xlockdone r=1) on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink sync->lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.042725 7f1d3ef95700 10 mds.0.locker next state is lock issued/allows loner s/cb xlocker /cb other /cb

retry rmdir
2022-02-21 15:11:40.042904 7f1d3ef95700 10 MDSInternalContextBase::complete: 18C_MDS_RetryRequest
2022-02-21 15:11:40.042907 7f1d3ef95700 7 mds.0.server dispatch_client_request client_request(client.120800413:79624 rmdir #1000242c424/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0 2022-02-21 15:11:45.089274) v3
2022-02-21 15:11:40.042912 7f1d3ef95700 7 mds.0.cache traverse: opening base ino 1000242c424 snap head
2022-02-21 15:11:40.042914 7f1d3ef95700 12 mds.0.cache traverse: path seg depth 0 'pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0' snapid head
2022-02-21 15:11:40.042916 7f1d3ef95700 20 mds.0.cache.dir(1000242c424) lookup (head, 'pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0')
2022-02-21 15:11:40.042921 7f1d3ef95700 20 mds.0.cache.dir(1000242c424) hit -> (pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0,head)
2022-02-21 15:11:40.042925 7f1d3ef95700 10 mds.0.cache path_traverse finish on snapid head
2022-02-21 15:11:40.042929 7f1d3ef95700 7 mds.0.server handle_client_rmdir on [dentry #1/fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0 [2,head] auth (dn xlock x=1 by 0x7f1d56498f00) (dversion lock w=1 last_client=120800413) v=33670 ap=2+2 inode=0x7f1e14504300 | request=1 lock=2 inodepin=1 dirty=1 authpin=1 0x7f1e1519f040]
2022-02-21 15:11:40.042943 7f1d3ef95700 7 mds.0.server dn links to [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.042964 7f1d3ef95700 10 mds.0.server dir_is_nonempty_unlocked [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.042988 7f1d3ef95700 10 mds.0.server straydn is [dentry #100/stray4/1000243120c [2,head] auth NULL (dn xlock x=1 by 0x7f1d56498f00) (dversion lock w=1 last_client=120800413) pv=0 v=2812421797 ap=2+0 inode=0 state=new | request=1 lock=2 authpin=1 0x7f1de6e45d40]
2022-02-21 15:11:40.043004 7f1d3ef95700 10 mds.0.locker acquire_locks request(client.120800413:79624 cr=0x7f1d553ba3c0)

2022-02-21 15:11:40.043021 7f1d3ef95700 20 mds.0.locker must xlock (ilink sync) [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]

2022-02-21 15:11:40.043268 7f1d3ef95700 20 mds.0.locker * must rdlock (ifile xlockdone r=1)* [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]

2022-02-21 15:11:40.043726 7f1d3ef95700 10 mds.0.locker already rdlocked (ifile xlockdone r=1) [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]

2022-02-21 15:11:40.043739 7f1d3ef95700 7 mds.0.locker xlock_start on (ilink sync) on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.043752 7f1d3ef95700 7 mds.0.locker simple_lock on (ilink sync) on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=0 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.043766 7f1d3ef95700 7 mds.0.locker simple_xlock on (ilink lock) on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=1 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.043779 7f1d3ef95700 10 mds.0.cache.ino(1000243120c) auth_pin by 0x7f1e14504778 on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=1 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893] now 3+0

2022-02-21 15:11:40.043832 7f1d3ef95700 20 Session check_access path /fengtaosmu/Software/anaconda3/pkgs
2022-02-21 15:11:40.043836 7f1d3ef95700 10 MDSAuthCap is_capable inode(path /fengtaosmu/Software/anaconda3/pkgs owner 1033:1036 mode 040775) by caller 1033:1036 mask 2 new 0:0 cap: MDSAuthCaps[allow ]
2022-02-21 15:11:40.043853 7f1d3ef95700 10 mds.0.server *dir_is_nonempty [*inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink xlock x=1 by 0x7f1d56498f00) (isnap sync r=1) (inest lock) (ifile xlockdone r=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsXs/pAsXsFs/Fx@18},l=120800413 | request=1 lock=4 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:40.639247 7f1d3ef95700 1 mds/Server.cc: In function 'bool Server::_dir_is_nonempty(MDRequestRef&, CInode
)' thread 7f1d3ef95700 time 2022-02-21 15:11:40.043876
mds/Server.cc: 6245: FAILED assert(in>filelock.can_read(mdr->get_client()))
ceph version 10.2.2-Summit3.0-beta2-127-g051c2f2 (051c2f2bd027a4c29bdf3f21116dbfb1c718b3db)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f1d453e9215]
2: (Server::_dir_is_nonempty(std::shared_ptr<MDRequestImpl>&, CInode*)+0xcb) [0x7f1d4501906b]
3: (Server::handle_client_unlink(std::shared_ptr<MDRequestImpl>&)+0xe84) [0x7f1d45044b34]
4: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xe9b) [0x7f1d45064d2b]
5: (MDCache::dispatch_request(std::shared_ptr<MDRequestImpl>&)+0x4c) [0x7f1d450e81dc]
6: (MDSInternalContextBase::complete(int)+0x1eb) [0x7f1d4523947b]
7: (void finish_contexts<MDSInternalContextBase>(CephContext*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >&, int)+0xac) [0x7f1d44ffe4cc]
8: (Locker::eval(CInode*, int, bool)+0x128) [0x7f1d45158f38]
9: (Locker::handle_client_caps(MClientCaps*)+0xd95) [0x7f1d4516e6e5]
10: (MDSRank::handle_deferrable_message(Message*)+0xc34) [0x7f1d44fe0624]
11: (MDSRank::_dispatch(Message*, bool)+0x205) [0x7f1d44fea9f5]
12: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x7f1d44feb8d5]
13: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x7f1d44fd1313]
14: (DispatchQueue::entry()+0x7ba) [0x7f1d454eee1a]
15: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f1d453ce02d]
16: (()+0x7dc5) [0x7f1d441abdc5]
17: (clone()+0x6d) [0x7f1d42c7828d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions #5

Updated by Xiubo Li about 2 years ago

t5: mds rdlock_start filelock and got the rdlock but can’t acquire the xlock of linklock, so wait here

Here you mean the rmddir will wait because it failed to get xlock of linklock. But if so it shouldn't get rdlock of filelock because the linklock is before the filelock:

 7734   // lock
 7735   if (!(mdr->locking_state & MutationImpl::ALL_LOCKED)) {
 7736     MutationImpl::LockOpVec lov;
 7737  
 7738     lov.add_xlock(&in->linklock);
 7739     lov.add_xlock(&in->snaplock);                                                                                                                
 7740     if (in->is_dir())
 7741       lov.add_rdlock(&in->filelock);   // to verify it's empty
 7742  
 7743     if (straydn) {
 7744       lov.add_wrlock(&straydn->get_dir()->inode->filelock);
 7745       lov.add_wrlock(&straydn->get_dir()->inode->nestlock);
 7746       lov.add_xlock(&straydn->lock);      
 7747     }
 7748  
 7749     if (!mds->locker->acquire_locks(mdr, lov))
 7750       return;

Do you have the full logs ?

Actions #6

Updated by Ivan Guan about 2 years ago

Xiubo Li wrote:

t5: mds rdlock_start filelock and got the rdlock but can’t acquire the xlock of linklock, so wait here

Here you mean the rmddir will wait because it failed to get xlock of linklock. But if so it shouldn't get rdlock of filelock because the linklock is before the filelock:

[...]

Do you have the full logs ?

In 14.2.5 filelock is inserted to lov earlier than linklock.
MutationImpl::LockOpVec lov;

for (int i=0; i<(int)trace.size()-1; i++)
lov.add_rdlock(&trace[i]->lock);
lov.add_xlock(&dn->lock);
lov.add_wrlock(&diri->filelock);
lov.add_wrlock(&diri->nestlock);
lov.add_xlock(&in->linklock);
if (straydn) {
lov.add_wrlock(&straydn->get_dir()->inode->filelock);
lov.add_wrlock(&straydn->get_dir()->inode->nestlock);
lov.add_xlock(&straydn->lock);
}

As we can see, except linklock, there still has straydn’s lock need to acquire. If mds can’t acquire them smoothly, the xlock of filelock may have a chance to be dropped by other logic. If that’s the case, the same case lockstate is XLOCKDONE,xlock_by_client = -1,rdlock=1 will happen.
what do you think?

the important logs:
2022-02-21 15:11:39.961277 7f1d3ef95700 10 mds.0.locker acquire_locks request(client.120800413:79624 cr=0x7f1d553ba3c0)

2022-02-21 15:11:39.961692 7f1d3ef95700 20 mds.0.locker must rdlock (ifile xlockdone x=1) [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33610 pv33670 ap=2+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (inest lock) (ifile xlockdone x=1) (iversion lock w=1 last_client=120800413) caps={120800413=pAsLsXsFs/Fx@16},l=120800413 | request=1 lock=2 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]

2022-02-21 15:11:39.962664 7f1d3ef95700 7 mds.0.locker rdlock_start on (ifile xlockdone x=1) on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33610 pv33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone x=1) (iversion lock w=2 last_client=120800413) caps={120800413=pAsLsXsFs/Fx@16},l=120800413 | request=1 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:39.962674 7f1d3ef95700 10 mds.0.locker got rdlock on (ifile xlockdone r=1 x=1) [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33610 pv33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1 x=1) (iversion lock w=2 last_client=120800413) caps={120800413=pAsLsXsFs/Fx@16},l=120800413 | request=1 lock=4 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
//把锁给拿到了,但是拿ilink锁的时候拿不到了,进入了等待
2022-02-21 15:11:39.962684 7f1d3ef95700 7 mds.0.locker xlock_start on (ilink sync) on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33610 pv33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1 x=1) (iversion lock w=2 last_client=120800413) caps={120800413=pAsLsXsFs/Fx@16},l=120800413 | request=1 lock=4 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:39.962698 7f1d3ef95700 7 mds.0.locker simple_lock on (ilink sync) on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33610 pv33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (isnap sync r=1) (inest lock) (ifile xlockdone r=1 x=1) (iversion lock w=2 last_client=120800413) caps={120800413=pAsLsXsFs/Fx@16},l=120800413 | request=1 lock=4 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:39.962715 7f1d3ef95700 7 mds.0.locker issue_caps loner client.120800413 allowed=pAsXsFcb, xlocker allowed=pAsXsFscb, others allowed=pAsXsFcb on [inode 1000243120c [...2,head] /fengtaosmu/Software/anaconda3/pkgs/pyflakes-2.3.1-pyhd3eb1b0_0o5q2n5a0/ auth v33610 pv33670 ap=3+0 dirtyparent f(v0 m2022-02-21 15:11:43.149382) n(v0 rc2022-02-21 15:11:43.149382 1=0+1) (ilink sync->lock) (isnap sync r=1) (inest lock) (ifile xlockdone r=1 x=1) (iversion lock w=2 last_client=120800413) caps={120800413=pAsLsXsFs/Fx@16},l=120800413 | request=1 lock=4 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x7f1e14504300 mode: 16893]
2022-02-21 15:11:39.962737 7f1d3ef95700 20 mds.0.locker client.120800413 pending pAsLsXsFs allowed pAsXsFscb wanted Fx
2022-02-21 15:11:39.962741 7f1d3ef95700 7 mds.0.locker sending MClientCaps to client.120800413 seq 17 new pending pAsXsFs was pAsLsXsFs
2022-02-21 15:11:39.962760 7f1d3ef95700 20 mds.0.cache.ino(1000243120c) encode_cap_message pfile 1 pauth 0 plink 0 pxattr 0 ctime 2022-02-21 15:11:44.211394
2022-02-21 15:11:39.962768 7f1d3ef95700 10 mds.0.96548 send_message_client_counted client.120800413 seq 39942 client_caps(revoke ino 1000243120c 5892302 seq 17 caps=pAsXsFs dirty=- wanted=Fx follows 0 size 0/0 ts 1/18446744073709551615 mtime 2022-02-21 15:11:41.000000 tws 2) v8
2022-02-21 15:11:39.962779 7f1d3ef95700 1 -- 10.231.3.1:6867/32689 --> 10.231.2.1:0/3354810552 -- client_caps(revoke ino 1000243120c 5892302 seq 17 caps=pAsXsFs dirty=- wanted=Fx follows 0 size 0/0 ts 1/18446744073709551615 mtime 2022-02-21 15:11:41.000000 tws 2) v8 -- ?+0 0x7f1e15f58000 con 0x7f1d88f08e20

2022-02-21 15:11:39.962830 7f1d3ef95700 10 mds.0.cache.ino(1000243120c) add_waiter tag 600000 0x7f1de0d9c600 !ambig 1 !frozen 1 !freezing 1
2022-02-21 15:11:39.962833 7f1d3ef95700 15 mds.0.cache.ino(1000243120c) taking waiter here

Actions #7

Updated by Xiubo Li about 2 years ago

Ivan Guan wrote:

Xiubo Li wrote:

t5: mds rdlock_start filelock and got the rdlock but can’t acquire the xlock of linklock, so wait here

Here you mean the rmddir will wait because it failed to get xlock of linklock. But if so it shouldn't get rdlock of filelock because the linklock is before the filelock:

[...]

Do you have the full logs ?

In 14.2.5 filelock is inserted to lov earlier than linklock.
MutationImpl::LockOpVec lov;

for (int i=0; i<(int)trace.size()-1; i++)
lov.add_rdlock(&trace[i]->lock);
lov.add_xlock(&dn->lock);
lov.add_wrlock(&diri->filelock);
lov.add_wrlock(&diri->nestlock);
lov.add_xlock(&in->linklock);
if (straydn) {
lov.add_wrlock(&straydn->get_dir()->inode->filelock);
lov.add_wrlock(&straydn->get_dir()->inode->nestlock);
lov.add_xlock(&straydn->lock);
}

Okay.

As we can see, except linklock, there still has straydn’s lock need to acquire. If mds can’t acquire them smoothly, the xlock of filelock may have a chance to be dropped by other logic. If that’s the case, the same case lockstate is XLOCKDONE,xlock_by_client = -1,rdlock=1 will happen.
what do you think?

I have commented in the PR, I am afraid you are changing the `XCL` semantic. Another approach is that maybe you can just drop the acquired locks in some cases instead ?

And from my understanding the `XCL` here should also allow the AUTH to read.

Actions

Also available in: Atom PDF