Bug #58340
mds: fsstress.sh hangs with multimds (deadlock between unlink and reintegrate straydn(rename))
Status:
Resolved
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
% Done:
0%
Source:
Q/A
Tags:
backport_processed
Backport:
reef,pacific,quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
/a/vshankar-2022-12-21_14:01:01-fs-wip-vshankar-testing-20221215.112736-testing-default-smithi/7123518
The MDS is kind of stuck in some lock transition and not making progress servicing client requests.
2022-12-21T15:08:32.659+0000 1950e700 7 mds.1.server reply_client_request 0 ((0) Success) client_request(client.4536:15443 unlink #0x40000000206/c24 2022-12-21T15:08:32.520905+0000 caller_uid=1000, caller_gid=1265{6,36,1000,1265,}) v6 2022-12-21T15:08:32.659+0000 1950e700 10 mds.1.server apply_allocated_inos 0x0 / [] / 0x0 2022-12-21T15:08:32.659+0000 1950e700 7 mds.1.locker local_wrlock_finish on (iversion lock w=1 last_client=4536) on [inode 0x101 [...2,head] ~mds1/ auth v21 ap=1 snaprealm=0xf46a930 DIRTYPARENT f(v0 10=0+10) n(v5 rc2022-12-21T15:08:32.520905+0000 b4194304 14=3+11)/n(v0 rc2022-12-21T14:44:28.849089+0000 11=0+11) (inest lock w=1) (iversion lock w=1 last_client=4536) | dirtyscattered=0 lock=2 dirfrag=1 dirtyparent=1 dirty=1 authpin=1 0xf46a0e0] 2022-12-21T15:08:32.659+0000 1950e700 7 mds.1.locker wrlock_finish on (inest lock w=1) on [inode 0x101 [...2,head] ~mds1/ auth v21 ap=1 snaprealm=0xf46a930 DIRTYPARENT f(v0 10=0+10) n(v5 rc2022-12-21T15:08:32.520905+0000 b4194304 14=3+11)/n(v0 rc2022-12-21T14:44:28.849089+0000 11=0+11) (inest lock w=1) (iversion lock) | dirtyscattered=0 lock=1 dirfrag=1 dirtyparent=1 dirty=1 authpin=1 0xf46a0e0] 2022-12-21T15:08:32.659+0000 1950e700 10 mds.1.locker scatter_eval (inest lock) on [inode 0x101 [...2,head] ~mds1/ auth v21 ap=1 snaprealm=0xf46a930 DIRTYPARENT f(v0 10=0+10) n(v5 rc2022-12-21T15:08:32.520905+0000 b4194304 14=3+11)/n(v0 rc2022-12-21T14:44:28.849089+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 dirtyparent=1 dirty=1 authpin=1 0xf46a0e0] 2022-12-21T15:08:32.660+0000 1950e700 7 mds.1.locker local_wrlock_finish on (iversion lock w=1 last_client=4536) on [inode 0x60b [...2,head] ~mds1/stray1/ auth v56 ap=1 DIRTYPARENT f(v0 m2022-12-21T15:08:32.520905+0000 2=2+0) n(v0 rc2022-12-21T15:08:32.520905+0000 b4194304 3=2+1) (inest lock w=1) (ifile lock w=1) (iversion lock w=1 last_client=4536) | lock=3 dirfrag=1 stickydirs=1 stray=1 dirtyparent=1 dirty=1 authpin=1 0xf475be0] 2022-12-21T15:08:32.660+0000 1950e700 7 mds.1.locker wrlock_finish on (ifile lock w=1) on [inode 0x60b [...2,head] ~mds1/stray1/ auth v56 ap=1 DIRTYPARENT f(v0 m2022-12-21T15:08:32.520905+0000 2=2+0) n(v0 rc2022-12-21T15:08:32.520905+0000 b4194304 3=2+1) (inest lock w=1) (ifile lock w=1) (iversion lock) | lock=2 dirfrag=1 stickydirs=1 stray=1 dirtyparent=1 dirty=1 authpin=1 0xf475be0] 2022-12-21T15:08:32.660+0000 1950e700 7 mds.1.locker file_eval wanted= loner_wanted= other_wanted= filelock=(ifile lock) on [inode 0x60b [...2,head] ~mds1/stray1/ auth v56 ap=1 DIRTYPARENT f(v0 m2022-12-21T15:08:32.520905+0000 2=2+0) n(v0 rc2022-12-21T15:08:32.520905+0000 b4194304 3=2+1) (inest lock w=1) (ifile lock) (iversion lock) | lock=1 dirfrag=1 stickydirs=1 stray=1 dirtyparent=1 dirty=1 authpin=1 0xf475be0] 2022-12-21T15:08:32.660+0000 1950e700 7 mds.1.locker file_eval stable, bump to sync (ifile lock) on [inode 0x60b [...2,head] ~mds1/stray1/ auth v56 ap=1 DIRTYPARENT f(v0 m2022-12-21T15:08:32.520905+0000 2=2+0) n(v0 rc2022-12-21T15:08:32.520905+0000 b4194304 3=2+1) (inest lock w=1) (ifile lock) (iversion lock) | lock=1 dirfrag=1 stickydirs=1 stray=1 dirtyparent=1 dirty=1 authpin=1 0xf475be0] 2022-12-21T15:08:32.660+0000 1950e700 7 mds.1.locker simple_sync on (ifile lock) on [inode 0x60b [...2,head] ~mds1/stray1/ auth v56 ap=1 DIRTYPARENT f(v0 m2022-12-21T15:08:32.520905+0000 2=2+0) n(v0 rc2022-12-21T15:08:32.520905+0000 b4194304 3=2+1) (inest lock w=1) (ifile lock) (iversion lock) | lock=1 dirfrag=1 stickydirs=1 stray=1 dirtyparent=1 dirty=1 authpin=1 0xf475be0] 2022-12-21T15:08:32.661+0000 1950e700 7 mds.1.locker wrlock_finish on (inest lock w=1) on [inode 0x60b [...2,head] ~mds1/stray1/ auth v56 ap=1 DIRTYPARENT f(v0 m2022-12-21T15:08:32.520905+0000 2=2+0) n(v0 rc2022-12-21T15:08:32.520905+0000 b4194304 3=2+1) (inest lock w=1) (iversion lock) | lock=1 dirfrag=1 stickydirs=1 stray=1 dirtyparent=1 dirty=1 authpin=1 0xf475be0] 2022-12-21T15:08:32.661+0000 1950e700 10 mds.1.locker scatter_eval (inest lock) on [inode 0x60b [...2,head] ~mds1/stray1/ auth v56 ap=1 DIRTYPARENT f(v0 m2022-12-21T15:08:32.520905+0000 2=2+0) n(v0 rc2022-12-21T15:08:32.520905+0000 b4194304 3=2+1) (inest lock) (iversion lock) | lock=0 dirfrag=1 stickydirs=1 stray=1 dirtyparent=1 dirty=1 authpin=1 0xf475be0] 2022-12-21T15:08:32.661+0000 1950e700 10 mds.1.locker xlock_finish on (dn xlockdone l x=1) [dentry #0x1/client.0/tmp/tmp.UQxwie74AM/p5/d0/d2/d14/d37/c24 [2,head] auth NULL (dn xlockdone l x=1) (dversion lock w=1 last_client=4536) v=350 ap=2 ino=(nil) state=1610612736 | request=1 lock=2 inodepin=0 replicated=0 dirty=1 authpin=1 clientlease=1 0x1acdffd0] 2022-12-21T15:08:32.661+0000 1950e700 10 mds.1.locker eval_gather (dn xlockdone l) on [dentry #0x1/client.0/tmp/tmp.UQxwie74AM/p5/d0/d2/d14/d37/c24 [2,head] auth NULL (dn xlockdone l) (dversion lock w=1 last_client=4536) v=350 ap=2 ino=(nil) state=1610612736 | request=1 lock=1 inodepin=0 replicated=0 dirty=1 authpin=1 clientlease=1 0x1acdffd0] 2022-12-21T15:08:32.661+0000 1950e700 7 mds.1.locker eval_gather finished gather on (dn xlockdone l) on [dentry #0x1/client.0/tmp/tmp.UQxwie74AM/p5/d0/d2/d14/d37/c24 [2,head] auth NULL (dn xlockdone l) (dversion lock w=1 last_client=4536) v=350 ap=2 ino=(nil) state=1610612736 | request=1 lock=1 inodepin=0 replicated=0 dirty=1 authpin=1 clientlease=1 0x1acdffd0] 2022-12-21T15:08:32.662+0000 1950e700 10 mds.1.cache.den(0x40000000206 c24) auth_unpin by 0x1ace00b0 on [dentry #0x1/client.0/tmp/tmp.UQxwie74AM/p5/d0/d2/d14/d37/c24 [2,head] auth NULL (dn sync l) (dversion lock w=1 last_client=4536) v=350 ap=1 ino=(nil) state=1610612736 | request=1 lock=1 inodepin=0 replicated=0 dirty=1 authpin=1 clientlease=1 0x1acdffd0] now 1 2022-12-21T15:08:32.662+0000 1950e700 15 mds.1.cache.dir(0x40000000206) adjust_nested_auth_pins -1 on [dir 0x40000000206 /client.0/tmp/tmp.UQxwie74AM/p5/d0/d2/d14/d37/ [2,head] auth{0=1,3=1} v=351 cv=0/0 dir_auth=1 ap=1+1 state=1610874881|complete f(v0 m2022-12-21T15:08:32.520905+0000 11=8+3)/f(v0 m2022-12-21T15:08:10.789215+0000 12=9+3) n(v12 rc2022-12-21T15:08:32.520905+0000 b31204703 35=26+9)/n(v12 rc2022-12-21T15:08:23.108040+0000 b30920189 35=27+8) hs=11+4,ss=0+0 dirty=15 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x1cd27040] by 0x1ace00b0 count now 1/1
Related issues
History
#2 Updated by Venky Shankar 9 months ago
- Status changed from New to Triaged
#4 Updated by Venky Shankar 6 months ago
- Related to Bug #59134: mds: deadlock during unlink with multimds (postgres) added
#5 Updated by Xiubo Li 5 months ago
2023-02-10T05:33:35.813+0000 7f3673425223 0 log_channel(cluster) log [WRN] : 2 slow requests, 2 included below; oldest blocked for > 723.3245213 secs 2023-02-10T05:33:35.813+0000 7f3673425223 0 log_channel(cluster) log [WRN] : slow request 723.32452 seconds old, received at 2023-04-19T18:24:02.644336+0000: client_request(mds.0:343992 rename #0x20000008f81/posd #0x629/3000034765e caller_uid=0, caller_gid=0{}) currently acquired locks 2023-02-10T05:33:35.813+0000 7f367345223 0 log_channel(cluster) log [WRN] : slow request 723.34452 seconds old, received at 2023-04-19T18:24:02.645006+0000: client_request(client.252712820:58191 unlink #0x20000008f81/posd 2023-04-19T18:24:02.644807+0000 caller_uid=0, caller_gid=0{}) currently requesting remote authpins
This is because when one file has hard links and when the first hard link was unlinked, it will remove the corresponding dentry and then try to move the corresponding CInode to stray dir. But since the nlink != 0 so it will try to reintergrated the stray dentry with another hardlink in the same MDS or remote MDS, which it will send a rename request to that MDS.
So after that if a client tries to unlink another hard link it will cause deadlock.
#9 Updated by Kotresh Hiremath Ravishankar 5 months ago
#12 Updated by Kotresh Hiremath Ravishankar 4 months ago
#13 Updated by Patrick Donnelly 4 months ago
- Status changed from Fix Under Review to Pending Backport
- Target version changed from v18.0.0 to v19.0.0
- Backport changed from pacific,quincy to reef,pacific,quincy
#14 Updated by Backport Bot 4 months ago
- Copied to Backport #61346: pacific: mds: fsstress.sh hangs with multimds (deadlock between unlink and reintegrate straydn(rename)) added
#15 Updated by Backport Bot 4 months ago
- Copied to Backport #61347: reef: mds: fsstress.sh hangs with multimds (deadlock between unlink and reintegrate straydn(rename)) added
#16 Updated by Backport Bot 4 months ago
- Copied to Backport #61348: quincy: mds: fsstress.sh hangs with multimds (deadlock between unlink and reintegrate straydn(rename)) added
#17 Updated by Backport Bot 4 months ago
- Tags set to backport_processed
#18 Updated by Kotresh Hiremath Ravishankar 4 months ago
#19 Updated by Kotresh Hiremath Ravishankar 4 months ago
#20 Updated by Patrick Donnelly about 1 month ago
- Related to Bug #62096: mds: infinite rename recursion on itself added
#21 Updated by Xiubo Li about 1 month ago
- Status changed from Pending Backport to Resolved
#22 Updated by Patrick Donnelly 18 days ago
- Related to Bug #61818: mds: deadlock between unlink and linkmerge added
#23 Updated by Patrick Donnelly 18 days ago
- Related to Bug #56695: [RHEL stock] pjd test failures(a bug that need to wait the unlink to finish) added