Bug #62096
mds: infinite rename recursion on itself
Description
I don't have an explanation for why this happened specifically, but apparently we hit some new (possible) deadlock:
2023-07-14T11:17:03.043 INFO:journalctl@ceph.mon.c.smithi130.stdout:Jul 14 11:17:02 smithi130 ceph-mon[129405]: 973 slow requests, 5 included below; oldest blocked for > 183.521234 secs
2023-07-14T11:17:03.043 INFO:journalctl@ceph.mon.c.smithi130.stdout:Jul 14 11:17:02 smithi130 ceph-mon[129405]: slow request 183.221582 seconds old, received at 2023-07-14T11:13:58.402232+0000: client_request(mds.1:948 rename #0x10000000002/0000000100000000000000E1 #0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
2023-07-14T11:17:03.043 INFO:journalctl@ceph.mon.c.smithi130.stdout:Jul 14 11:17:02 smithi130 ceph-mon[129405]: slow request 183.190394 seconds old, received at 2023-07-14T11:13:58.433419+0000: client_request(mds.1:980 rename #0x10000000002/0000000100000000000000E1 #0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
2023-07-14T11:17:03.043 INFO:journalctl@ceph.mon.c.smithi130.stdout:Jul 14 11:17:02 smithi130 ceph-mon[129405]: slow request 183.147883 seconds old, received at 2023-07-14T11:13:58.475930+0000: client_request(mds.1:1012 rename #0x10000000002/0000000100000000000000E1 #0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
2023-07-14T11:17:03.044 INFO:journalctl@ceph.mon.c.smithi130.stdout:Jul 14 11:17:02 smithi130 ceph-mon[129405]: slow request 183.114251 seconds old, received at 2023-07-14T11:13:58.509562+0000: client_request(mds.1:1044 rename #0x10000000002/0000000100000000000000E1 #0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
2023-07-14T11:17:03.044 INFO:journalctl@ceph.mon.c.smithi130.stdout:Jul 14 11:17:02 smithi130 ceph-mon[129405]: slow request 183.081030 seconds old, received at 2023-07-14T11:13:58.542783+0000: client_request(mds.1:1076 rename #0x10000000002/0000000100000000000000E1 #0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
There's no evidence of metadata corruption (tracker 54546).
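For anyone triaging a similar log, a quick way to check whether the slow requests are all the same rename is to group them by their two paths. The following is only a rough sketch: the regex is derived solely from the slow request lines quoted in this description, and the names SLOW_RENAME_RE, group_slow_renames, path1 and path2 are hypothetical, not part of any Ceph tool.

```python
import re
from collections import defaultdict

# Assumption: the line layout matches the "slow request" entries quoted in this
# ticket. path1/path2 are neutral names for the two paths printed after "rename".
SLOW_RENAME_RE = re.compile(
    r"slow request (?P<age>\d+\.\d+) seconds old, received at \S+ "
    r"client_request\((?P<sender>[^:\s]+):(?P<tid>\d+) rename "
    r"(?P<path1>#\S+) (?P<path2>#\S+)"
)

# Two entries copied from the description, with the journalctl prefix dropped.
SAMPLE = (
    "slow request 183.221582 seconds old, received at 2023-07-14T11:13:58.402232+0000: "
    "client_request(mds.1:948 rename #0x10000000002/0000000100000000000000E1 "
    "#0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting\n"
    "slow request 183.190394 seconds old, received at 2023-07-14T11:13:58.433419+0000: "
    "client_request(mds.1:980 rename #0x10000000002/0000000100000000000000E1 "
    "#0x60b/100000005f2 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting\n"
)


def group_slow_renames(log_text):
    """Group slow rename requests by (path1, path2).

    Returns a dict mapping each path pair to a list of (sender, tid, age_seconds).
    """
    groups = defaultdict(list)
    for m in SLOW_RENAME_RE.finditer(log_text):
        key = (m.group("path1"), m.group("path2"))
        groups[key].append(
            (m.group("sender"), int(m.group("tid")), float(m.group("age")))
        )
    return dict(groups)


if __name__ == "__main__":
    for paths, reqs in group_slow_renames(SAMPLE).items():
        print(paths, "->", len(reqs), "slow rename request(s)")
```

If every slow request lands in a single group, as it would for the five entries above, that points at many requests for one rename rather than many different renames.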
Updated by Xiubo Li 9 months ago
Patrick,
This should be the same issue as:
https://tracker.ceph.com/issues/58340
https://tracker.ceph.com/issues/61818
Updated by Patrick Donnelly 9 months ago
- Related to Bug #58340: mds: fsstress.sh hangs with multimds (deadlock between unlink and reintegrate straydn(rename)) added
Updated by Patrick Donnelly 9 months ago
- Related to Bug #61818: mds: deadlock between unlink and linkmerge added
Updated by Patrick Donnelly 9 months ago
Xiubo Li wrote:
Patrick,
This should be the same issue as:
https://tracker.ceph.com/issues/58340
https://tracker.ceph.com/issues/61818
Hi Xiubo, AFAICT there was no actual deadlock. I think it's a slowdown caused by thousands of rename ops for a single stray migration. The cost of acquiring the locks is quite high, which means it takes a long time for those ops to unwind (they fail with ENOENT because the first one succeeds).
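To make the arithmetic behind that explanation concrete, here is a toy model, not MDS code. It assumes the duplicate rename ops are handled one after another and that each pays a fixed lock acquisition cost before it can find out that the stray dentry is already gone. The function name replay_duplicate_renames and the 0.2 s cost are made up for illustration; 973 is simply the slow request count from the log in the description, and not all of those are necessarily the same rename.

```python
import errno


def replay_duplicate_renames(num_ops, lock_cost_s):
    """Serially process num_ops identical rename requests for one stray dentry.

    Toy model: every op pays lock_cost_s up front; the first op moves the
    dentry and succeeds, every later op finds nothing to rename.
    Returns (total_elapsed_seconds, list_of_results).
    """
    stray_present = True
    elapsed = 0.0
    results = []
    for _ in range(num_ops):
        elapsed += lock_cost_s  # the lock cost is paid before the outcome is known
        if stray_present:
            stray_present = False  # first rename migrates the stray dentry
            results.append(0)
        else:
            results.append(-errno.ENOENT)  # dentry already moved by the first op
    return elapsed, results


if __name__ == "__main__":
    # Hypothetical figures: 973 queued ops, 0.2 s of locking per op.
    total, results = replay_duplicate_renames(973, 0.2)
    failed = sum(1 for r in results if r == -errno.ENOENT)
    print(f"{failed} ops ended in ENOENT after about {total:.1f} s of locking")
```

Under these assumptions the queue drains in time proportional to the number of duplicate ops, which looks like a hang from the outside even though every op eventually completes.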
Updated by Xiubo Li 9 months ago
Patrick Donnelly wrote:
Xiubo Li wrote:
Patrick,
This should be the same issue as:
https://tracker.ceph.com/issues/58340
https://tracker.ceph.com/issues/61818
Hi Xiubo, AFAICT there was no actual deadlock. I think it's a slowdown caused by thousands of rename ops for a single stray migration. The cost of acquiring the locks is quite high, which means it takes a long time for those ops to unwind (they fail with ENOENT because the first one succeeds).
Okay. As I remember, when unlink and linkmerge deadlock, I can see a lot of recursive rename requests.
Updated by Patrick Donnelly 8 months ago
- Related to Bug #62702: MDS slow requests for the internal 'rename' requests added
Updated by Patrick Donnelly 7 months ago
- Status changed from In Progress to Duplicate
Updated by Patrick Donnelly 7 months ago
- Related to deleted (Bug #62702: MDS slow requests for the internal 'rename' requests)
Updated by Patrick Donnelly 7 months ago
- Is duplicate of Bug #62702: MDS slow requests for the internal 'rename' requests added