Bug #62096

closed

mds: infinite rename recursion on itself

Added by Patrick Donnelly 10 months ago. Updated 7 months ago.

Status: Duplicate
Priority: High
Category: Correctness/Safety
% Done: 0%
Source: Q/A
Backport: reef,quincy,pacific
Regression: No
Severity: 3 - minor
Component(FS): MDS
Labels (FS): qa-failure

Description

https://pulpito.ceph.com/rishabh-2023-07-14_10:26:42-fs-wip-rishabh-2023Jul13-testing-default-smithi/7337403

I don't have an explanation for why PQputline failed specifically, but apparently we hit a new (possible) deadlock:

2023-07-14T11:17:03.043 INFO:journalctl@ceph.mon.c.smithi130.stdout:Jul 14 11:17:02 smithi130 ceph-mon[129405]: 973 slow requests, 5 included below; oldest blocked for > 183.521234 secs
2023-07-14T11:17:03.043 INFO:journalctl@ceph.mon.c.smithi130.stdout:Jul 14 11:17:02 smithi130 ceph-mon[129405]: slow request 183.221582 seconds old, received at 2023-07-14T11:13:58.402232+0000: client_request(mds.1:948 rename #0x10000000002/0000000100000000000000E1 #0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting                                
2023-07-14T11:17:03.043 INFO:journalctl@ceph.mon.c.smithi130.stdout:Jul 14 11:17:02 smithi130 ceph-mon[129405]: slow request 183.190394 seconds old, received at 2023-07-14T11:13:58.433419+0000: client_request(mds.1:980 rename #0x10000000002/0000000100000000000000E1 #0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting                                
2023-07-14T11:17:03.043 INFO:journalctl@ceph.mon.c.smithi130.stdout:Jul 14 11:17:02 smithi130 ceph-mon[129405]: slow request 183.147883 seconds old, received at 2023-07-14T11:13:58.475930+0000: client_request(mds.1:1012 rename #0x10000000002/0000000100000000000000E1 #0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting                                
2023-07-14T11:17:03.044 INFO:journalctl@ceph.mon.c.smithi130.stdout:Jul 14 11:17:02 smithi130 ceph-mon[129405]: slow request 183.114251 seconds old, received at 2023-07-14T11:13:58.509562+0000: client_request(mds.1:1044 rename #0x10000000002/0000000100000000000000E1 #0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting                                
2023-07-14T11:17:03.044 INFO:journalctl@ceph.mon.c.smithi130.stdout:Jul 14 11:17:02 smithi130 ceph-mon[129405]: slow request 183.081030 seconds old, received at 2023-07-14T11:13:58.542783+0000: client_request(mds.1:1076 rename #0x10000000002/0000000100000000000000E1 #0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting   
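All five slow requests above share the same requester (`mds.1`), the same op (`rename`) and the same source and destination paths; only the tid changes. A minimal Python sketch for pulling that pattern out of a journal dump follows. The regex is an assumption fitted only to the lines quoted in this report, not to every format the Ceph monitor may emit, and `parse_slow_requests` / `group_by_request` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Assumed pattern, fitted only to the "slow request" lines in this report.
SLOW_REQ = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old, .*?"
    r"client_request\((?P<client>[\w.]+):(?P<tid>\d+) "
    r"(?P<op>\w+) (?P<args>.*?) caller_uid.*?"
    r"currently (?P<state>.+?)\s*$"
)

# Excerpt of the journal lines above, with the journalctl prefix trimmed.
LOG = [
    "slow request 183.221582 seconds old, received at 2023-07-14T11:13:58.402232+0000: "
    "client_request(mds.1:948 rename #0x10000000002/0000000100000000000000E1 "
    "#0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting",
    "slow request 183.190394 seconds old, received at 2023-07-14T11:13:58.433419+0000: "
    "client_request(mds.1:980 rename #0x10000000002/0000000100000000000000E1 "
    "#0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting",
    "slow request 183.147883 seconds old, received at 2023-07-14T11:13:58.475930+0000: "
    "client_request(mds.1:1012 rename #0x10000000002/0000000100000000000000E1 "
    "#0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting",
    "slow request 183.114251 seconds old, received at 2023-07-14T11:13:58.509562+0000: "
    "client_request(mds.1:1044 rename #0x10000000002/0000000100000000000000E1 "
    "#0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting",
    "slow request 183.081030 seconds old, received at 2023-07-14T11:13:58.542783+0000: "
    "client_request(mds.1:1076 rename #0x10000000002/0000000100000000000000E1 "
    "#0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting",
]


def parse_slow_requests(lines):
    """Return one dict per line that matches SLOW_REQ; other lines are skipped."""
    out = []
    for line in lines:
        m = SLOW_REQ.search(line)
        if m:
            d = m.groupdict()
            d["age"] = float(d["age"])
            d["tid"] = int(d["tid"])
            out.append(d)
    return out


def group_by_request(reqs):
    """Map (requester, op, args) to the list of tids seen for it."""
    groups = defaultdict(list)
    for r in reqs:
        groups[(r["client"], r["op"], r["args"])].append(r["tid"])
    return dict(groups)


def main():
    reqs = parse_slow_requests(LOG)
    for (client, op, args), tids in group_by_request(reqs).items():
        print(f"{client} {op} {args}: {len(tids)} copies, tids={tids}")


if __name__ == "__main__":
    main()
```

Run against the excerpt, all five lines collapse into a single group whose tids are 948, 980, 1012, 1044 and 1076, i.e. one rename being reissued rather than five independent requests.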

There's no evidence of metadata corruption (tracker 54546).


Related issues: 3 (2 open, 1 closed)

Related to CephFS - Bug #58340: mds: fsstress.sh hangs with multimds (deadlock between unlink and reintegrate straydn(rename)) | Resolved | Xiubo Li

Related to CephFS - Bug #61818: mds: deadlock between unlink and linkmerge | Pending Backport | Xiubo Li

Is duplicate of CephFS - Bug #62702: MDS slow requests for the internal 'rename' requests | Pending Backport | Xiubo Li

