Actions
Bug #58340
closedmds: fsstress.sh hangs with multimds (deadlock between unlink and reintegrate straydn(rename))
Status:
Resolved
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
% Done:
0%
Source:
Q/A
Tags:
backport_processed
Backport:
reef,pacific,quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
/a/vshankar-2022-12-21_14:01:01-fs-wip-vshankar-testing-20221215.112736-testing-default-smithi/7123518
The MDS is kind of stuck in some lock transition and not making progress servicing client requests.
2022-12-21T15:08:32.659+0000 1950e700 7 mds.1.server reply_client_request 0 ((0) Success) client_request(client.4536:15443 unlink #0x40000000206/c24 2022-12-21T15:08:32.520905+0000 caller_uid=1000, caller_gid=1265{6,36,1000,1265,}) v6 2022-12-21T15:08:32.659+0000 1950e700 10 mds.1.server apply_allocated_inos 0x0 / [] / 0x0 2022-12-21T15:08:32.659+0000 1950e700 7 mds.1.locker local_wrlock_finish on (iversion lock w=1 last_client=4536) on [inode 0x101 [...2,head] ~mds1/ auth v21 ap=1 snaprealm=0xf46a930 DIRTYPARENT f(v0 10=0+10) n(v5 rc2022-12-21T15:08:32.520905+0000 b4194304 14=3+11)/n(v0 rc2022-12-21T14:44:28.849089+0000 11=0+11) (inest lock w=1) (iversion lock w=1 last_client=4536) | dirtyscattered=0 lock=2 dirfrag=1 dirtyparent=1 dirty=1 authpin=1 0xf46a0e0] 2022-12-21T15:08:32.659+0000 1950e700 7 mds.1.locker wrlock_finish on (inest lock w=1) on [inode 0x101 [...2,head] ~mds1/ auth v21 ap=1 snaprealm=0xf46a930 DIRTYPARENT f(v0 10=0+10) n(v5 rc2022-12-21T15:08:32.520905+0000 b4194304 14=3+11)/n(v0 rc2022-12-21T14:44:28.849089+0000 11=0+11) (inest lock w=1) (iversion lock) | dirtyscattered=0 lock=1 dirfrag=1 dirtyparent=1 dirty=1 authpin=1 0xf46a0e0] 2022-12-21T15:08:32.659+0000 1950e700 10 mds.1.locker scatter_eval (inest lock) on [inode 0x101 [...2,head] ~mds1/ auth v21 ap=1 snaprealm=0xf46a930 DIRTYPARENT f(v0 10=0+10) n(v5 rc2022-12-21T15:08:32.520905+0000 b4194304 14=3+11)/n(v0 rc2022-12-21T14:44:28.849089+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 dirtyparent=1 dirty=1 authpin=1 0xf46a0e0] 2022-12-21T15:08:32.660+0000 1950e700 7 mds.1.locker local_wrlock_finish on (iversion lock w=1 last_client=4536) on [inode 0x60b [...2,head] ~mds1/stray1/ auth v56 ap=1 DIRTYPARENT f(v0 m2022-12-21T15:08:32.520905+0000 2=2+0) n(v0 rc2022-12-21T15:08:32.520905+0000 b4194304 3=2+1) (inest lock w=1) (ifile lock w=1) (iversion lock w=1 last_client=4536) | lock=3 dirfrag=1 stickydirs=1 stray=1 dirtyparent=1 dirty=1 authpin=1 0xf475be0] 2022-12-21T15:08:32.660+0000 1950e700 7 mds.1.locker wrlock_finish on (ifile lock w=1) on [inode 0x60b [...2,head] ~mds1/stray1/ auth v56 ap=1 DIRTYPARENT f(v0 m2022-12-21T15:08:32.520905+0000 2=2+0) n(v0 rc2022-12-21T15:08:32.520905+0000 b4194304 3=2+1) (inest lock w=1) (ifile lock w=1) (iversion lock) | lock=2 dirfrag=1 stickydirs=1 stray=1 dirtyparent=1 dirty=1 authpin=1 0xf475be0] 2022-12-21T15:08:32.660+0000 1950e700 7 mds.1.locker file_eval wanted= loner_wanted= other_wanted= filelock=(ifile lock) on [inode 0x60b [...2,head] ~mds1/stray1/ auth v56 ap=1 DIRTYPARENT f(v0 m2022-12-21T15:08:32.520905+0000 2=2+0) n(v0 rc2022-12-21T15:08:32.520905+0000 b4194304 3=2+1) (inest lock w=1) (ifile lock) (iversion lock) | lock=1 dirfrag=1 stickydirs=1 stray=1 dirtyparent=1 dirty=1 authpin=1 0xf475be0] 2022-12-21T15:08:32.660+0000 1950e700 7 mds.1.locker file_eval stable, bump to sync (ifile lock) on [inode 0x60b [...2,head] ~mds1/stray1/ auth v56 ap=1 DIRTYPARENT f(v0 m2022-12-21T15:08:32.520905+0000 2=2+0) n(v0 rc2022-12-21T15:08:32.520905+0000 b4194304 3=2+1) (inest lock w=1) (ifile lock) (iversion lock) | lock=1 dirfrag=1 stickydirs=1 stray=1 dirtyparent=1 dirty=1 authpin=1 0xf475be0] 2022-12-21T15:08:32.660+0000 1950e700 7 mds.1.locker simple_sync on (ifile lock) on [inode 0x60b [...2,head] ~mds1/stray1/ auth v56 ap=1 DIRTYPARENT f(v0 m2022-12-21T15:08:32.520905+0000 2=2+0) n(v0 rc2022-12-21T15:08:32.520905+0000 b4194304 3=2+1) (inest lock w=1) (ifile lock) (iversion lock) | lock=1 dirfrag=1 stickydirs=1 stray=1 dirtyparent=1 dirty=1 authpin=1 0xf475be0] 2022-12-21T15:08:32.661+0000 1950e700 7 mds.1.locker wrlock_finish on (inest lock w=1) on [inode 0x60b [...2,head] ~mds1/stray1/ auth v56 ap=1 DIRTYPARENT f(v0 m2022-12-21T15:08:32.520905+0000 2=2+0) n(v0 rc2022-12-21T15:08:32.520905+0000 b4194304 3=2+1) (inest lock w=1) (iversion lock) | lock=1 dirfrag=1 stickydirs=1 stray=1 dirtyparent=1 dirty=1 authpin=1 0xf475be0] 2022-12-21T15:08:32.661+0000 1950e700 10 mds.1.locker scatter_eval (inest lock) on [inode 0x60b [...2,head] ~mds1/stray1/ auth v56 ap=1 DIRTYPARENT f(v0 m2022-12-21T15:08:32.520905+0000 2=2+0) n(v0 rc2022-12-21T15:08:32.520905+0000 b4194304 3=2+1) (inest lock) (iversion lock) | lock=0 dirfrag=1 stickydirs=1 stray=1 dirtyparent=1 dirty=1 authpin=1 0xf475be0] 2022-12-21T15:08:32.661+0000 1950e700 10 mds.1.locker xlock_finish on (dn xlockdone l x=1) [dentry #0x1/client.0/tmp/tmp.UQxwie74AM/p5/d0/d2/d14/d37/c24 [2,head] auth NULL (dn xlockdone l x=1) (dversion lock w=1 last_client=4536) v=350 ap=2 ino=(nil) state=1610612736 | request=1 lock=2 inodepin=0 replicated=0 dirty=1 authpin=1 clientlease=1 0x1acdffd0] 2022-12-21T15:08:32.661+0000 1950e700 10 mds.1.locker eval_gather (dn xlockdone l) on [dentry #0x1/client.0/tmp/tmp.UQxwie74AM/p5/d0/d2/d14/d37/c24 [2,head] auth NULL (dn xlockdone l) (dversion lock w=1 last_client=4536) v=350 ap=2 ino=(nil) state=1610612736 | request=1 lock=1 inodepin=0 replicated=0 dirty=1 authpin=1 clientlease=1 0x1acdffd0] 2022-12-21T15:08:32.661+0000 1950e700 7 mds.1.locker eval_gather finished gather on (dn xlockdone l) on [dentry #0x1/client.0/tmp/tmp.UQxwie74AM/p5/d0/d2/d14/d37/c24 [2,head] auth NULL (dn xlockdone l) (dversion lock w=1 last_client=4536) v=350 ap=2 ino=(nil) state=1610612736 | request=1 lock=1 inodepin=0 replicated=0 dirty=1 authpin=1 clientlease=1 0x1acdffd0] 2022-12-21T15:08:32.662+0000 1950e700 10 mds.1.cache.den(0x40000000206 c24) auth_unpin by 0x1ace00b0 on [dentry #0x1/client.0/tmp/tmp.UQxwie74AM/p5/d0/d2/d14/d37/c24 [2,head] auth NULL (dn sync l) (dversion lock w=1 last_client=4536) v=350 ap=1 ino=(nil) state=1610612736 | request=1 lock=1 inodepin=0 replicated=0 dirty=1 authpin=1 clientlease=1 0x1acdffd0] now 1 2022-12-21T15:08:32.662+0000 1950e700 15 mds.1.cache.dir(0x40000000206) adjust_nested_auth_pins -1 on [dir 0x40000000206 /client.0/tmp/tmp.UQxwie74AM/p5/d0/d2/d14/d37/ [2,head] auth{0=1,3=1} v=351 cv=0/0 dir_auth=1 ap=1+1 state=1610874881|complete f(v0 m2022-12-21T15:08:32.520905+0000 11=8+3)/f(v0 m2022-12-21T15:08:10.789215+0000 12=9+3) n(v12 rc2022-12-21T15:08:32.520905+0000 b31204703 35=26+9)/n(v12 rc2022-12-21T15:08:23.108040+0000 b30920189 35=27+8) hs=11+4,ss=0+0 dirty=15 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x1cd27040] by 0x1ace00b0 count now 1/1
Updated by Xiubo Li about 1 year ago
- Status changed from Triaged to Fix Under Review
- Assignee set to Xiubo Li
Updated by Venky Shankar about 1 year ago
- Related to Bug #59134: mds: deadlock during unlink with multimds (postgres) added
Updated by Xiubo Li 12 months ago
2023-02-10T05:33:35.813+0000 7f3673425223 0 log_channel(cluster) log [WRN] : 2 slow requests, 2 included below; oldest blocked for > 723.3245213 secs 2023-02-10T05:33:35.813+0000 7f3673425223 0 log_channel(cluster) log [WRN] : slow request 723.32452 seconds old, received at 2023-04-19T18:24:02.644336+0000: client_request(mds.0:343992 rename #0x20000008f81/posd #0x629/3000034765e caller_uid=0, caller_gid=0{}) currently acquired locks 2023-02-10T05:33:35.813+0000 7f367345223 0 log_channel(cluster) log [WRN] : slow request 723.34452 seconds old, received at 2023-04-19T18:24:02.645006+0000: client_request(client.252712820:58191 unlink #0x20000008f81/posd 2023-04-19T18:24:02.644807+0000 caller_uid=0, caller_gid=0{}) currently requesting remote authpins
This is because when one file has hard links and when the first hard link was unlinked, it will remove the corresponding dentry and then try to move the corresponding CInode to stray dir. But since the nlink != 0 so it will try to reintergrated the stray dentry with another hardlink in the same MDS or remote MDS, which it will send a rename request to that MDS.
So after that if a client tries to unlink another hard link it will cause deadlock.
Updated by Kotresh Hiremath Ravishankar 11 months ago
Updated by Kotresh Hiremath Ravishankar 11 months ago
Updated by Patrick Donnelly 11 months ago
- Status changed from Fix Under Review to Pending Backport
- Target version changed from v18.0.0 to v19.0.0
- Backport changed from pacific,quincy to reef,pacific,quincy
Updated by Backport Bot 11 months ago
- Copied to Backport #61346: pacific: mds: fsstress.sh hangs with multimds (deadlock between unlink and reintegrate straydn(rename)) added
Updated by Backport Bot 11 months ago
- Copied to Backport #61347: reef: mds: fsstress.sh hangs with multimds (deadlock between unlink and reintegrate straydn(rename)) added
Updated by Backport Bot 11 months ago
- Copied to Backport #61348: quincy: mds: fsstress.sh hangs with multimds (deadlock between unlink and reintegrate straydn(rename)) added
Updated by Kotresh Hiremath Ravishankar 11 months ago
Updated by Kotresh Hiremath Ravishankar 11 months ago
Updated by Patrick Donnelly 8 months ago
- Related to Bug #62096: mds: infinite rename recursion on itself added
Updated by Patrick Donnelly 8 months ago
- Related to Bug #61818: mds: deadlock between unlink and linkmerge added
Updated by Patrick Donnelly 8 months ago
- Related to Bug #56695: [RHEL stock] pjd test failures(a bug that need to wait the unlink to finish) added
Actions