Project

General

Profile

Actions

Bug #62702

open

MDS slow requests for the internal 'rename' requests

Added by Xiubo Li 8 months ago. Updated 7 months ago.

Status:
Pending Backport
Priority:
Normal
Assignee:
Category:
Administration/Usability
Target version:
% Done:

0%

Source:
Tags:
backport_processed
Backport:
reef,quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

https://pulpito.ceph.com/rishabh-2023-08-25_01:50:32-fs-wip-rishabh-2023aug3-b5-testing-default-smithi/7378922

2023-08-25T02:31:51.580 INFO:journalctl@ceph.mon.a.smithi089.stdout:Aug 25 02:31:51 smithi089 ceph-mon[98732]: 1204 slow requests, 5 included below; oldest blocked for > 234.279466 secs
2023-08-25T02:31:51.580 INFO:journalctl@ceph.mon.a.smithi089.stdout:Aug 25 02:31:51 smithi089 ceph-mon[98732]: slow request 232.453933 seconds old, received at 2023-08-25T02:27:57.446390+0000: client_request(mds.1:2987 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
2023-08-25T02:31:51.581 INFO:journalctl@ceph.mon.a.smithi089.stdout:Aug 25 02:31:51 smithi089 ceph-mon[98732]: slow request 232.423584 seconds old, received at 2023-08-25T02:27:57.476740+0000: client_request(mds.1:3019 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
2023-08-25T02:31:51.581 INFO:journalctl@ceph.mon.a.smithi089.stdout:Aug 25 02:31:51 smithi089 ceph-mon[98732]: slow request 232.393722 seconds old, received at 2023-08-25T02:27:57.506602+0000: client_request(mds.1:3051 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
2023-08-25T02:31:51.581 INFO:journalctl@ceph.mon.a.smithi089.stdout:Aug 25 02:31:51 smithi089 ceph-mon[98732]: slow request 232.364995 seconds old, received at 2023-08-25T02:27:57.535329+0000: client_request(mds.1:3083 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
2023-08-25T02:31:51.581 INFO:journalctl@ceph.mon.a.smithi089.stdout:Aug 25 02:31:51 smithi089 ceph-mon[98732]: slow request 232.335887 seconds old, received at 2023-08-25T02:27:57.564437+0000: client_request(mds.1:3115 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting

This will happen in case:

1, CInodeA has only one link, which the nlink == 1
2, a link request comes and does the early reply and then submit a journal log entry, but not being flushed until after step 3 below
3, a unlink request comes and does the early reply and then submit a journal log entry.
4, both the journal log entries are flushed
5, a stray dentry reintegration will be fired and then it will send a rename request to itself.
6, because the new dentry from the link request is a remote dentry, so the rename request when traversing the path or getting the dstdn inode it will call dn->link_remote().
7, the dn->link_remote() will fire a new stray dentry reintegration to itself again.
8, usually the rename request will be retried for several time and each time it will fire a new stray dentry reintegration, which make no sense and the redundant stray dentry reintegration will fail later anyway or maybe stuck for a long time, which will cause the slow request warning though they will disappear later.

For exmaple:

The link request came:

2023-08-25T02:26:50.010+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== client.15222 192.168.0.1:0/2337416110 16 ==== client_request(client.15222:1976 link #0x10000000002/000000010000000000000027 #0x10000000002/xlogtemp.148674 2023-08-25T02:26:50.002793+0000 FWD=1 caller_uid=26, caller_gid=26{26,}) v4 ==== 227+0+0 (crc 0 0 0) 0x55923df8cc00 con 0x55923cde3000

And then later it early_reply and then submit the journal log event:

2023-08-25T02:26:50.016+0000 7fd0353a5700 10 mds.1.server early_reply 0 ((0) Success) client_request(client.15222:1976 link #0x10000000002/000000010000000000000027 #0x10000000002/xlogtemp.148674 2023-08-25T02:26:50.002793+0000 FWD=1 caller_uid=26, caller_gid=26{26,}) v4
...
2023-08-25T02:26:50.016+0000 7fd0353a5700 20 mds.1.log _submit_entry EUpdate link_local [metablob 0x10000000002, 1 dirs]
2023-08-25T02:26:50.016+0000 7fd0353a5700  7 mds.1.locker rdlock_finish on (dn sync) on [dentry #0x1/postgres/data/pg_wal/xlogtemp.148674 [2,head] auth{0=1} (dversion lock) pv=1040 v=1037 ap=1 ino=0x1000000053c state=1610612736 | lock=0 inodepin=1 replicated=1 dirty=1 authpin=1 0x55923dfc1900]
2023-08-25T02:26:50.016+0000 7fd0353a5700 10 mds.1.locker simple_eval (dn sync) on [dentry #0x1/postgres/data/pg_wal/xlogtemp.148674 [2,head] auth{0=1} (dversion lock) pv=1040 v=1037 ap=1 ino=0x1000000053c state=1610612736 | lock=0 inodepin=1 replicated=1 dirty=1 authpin=1 0x55923dfc1900]
2023-08-25T02:26:50.016+0000 7fd02eb98700  5 mds.1.log _submit_thread 4269567~1316 : EUpdate link_local [metablob 0x10000000002, 1 dirs]

Then the unlink request came:

2023-08-25T02:26:50.016+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== client.15222 192.168.0.1:0/2337416110 18 ==== client_request(client.15222:1978 unlink #0x10000000002/xlogtemp.148674 2023-08-25T02:26:50.017793+0000 caller_uid=26, caller_gid=26{26,}) v4 ==== 203+0+0 (crc 0 0 0) 0x55923dcfe300 con 0x55923cde3000
2023-08-25T02:26:50.016+0000 7fd0353a5700  4 mds.1.server handle_client_request client_request(client.15222:1978 unlink #0x10000000002/xlogtemp.148674 2023-08-25T02:26:50.017793+0000 caller_uid=26, caller_gid=26{26,}) v4
2023-08-25T02:26:50.016+0000 7fd0353a5700 20 mds.1.18 get_session have 0x55923dd63400 client.15222 192.168.0.1:0/2337416110 state open

And then later it early_reply and then submit the journal log event:

2023-08-25T02:26:50.018+0000 7fd0353a5700 10 mds.1.server early_reply 0 ((0) Success) client_request(client.15222:1978 unlink #0x10000000002/xlogtemp.148674 2023-08-25T02:26:50.017793+0000 caller_uid=26, caller_gid=26{26,}) v4
2023-08-25T02:26:50.018+0000 7fd0353a5700 20 mds.1.server set_trace_dist snapid head
...
2023-08-25T02:26:50.018+0000 7fd0353a5700 20 mds.1.log _submit_entry EUpdate unlink_local [metablob 0x10000000002, 3 dirs]
2023-08-25T02:26:50.018+0000 7fd02eb98700  5 mds.1.log _submit_thread 4270903~2514 : EUpdate unlink_local [metablob 0x10000000002, 3 dirs]

Then the journal logs are flushed and the reintegration was fired:

2023-08-25T02:26:50.021+0000 7fd02f399700 10 MDSIOContextBase::complete: 13C_MDL_Flushed
2023-08-25T02:26:50.021+0000 7fd02f399700 10 MDSContext::complete: 13C_MDL_Flushed
2023-08-25T02:26:50.021+0000 7fd02f399700 10 MDSIOContextBase::complete: 23C_MDS_link_local_finish
2023-08-25T02:26:50.021+0000 7fd02f399700 10 MDSContext::complete: 23C_MDS_link_local_finish
2023-08-25T02:26:50.021+0000 7fd02f399700 10 mds.1.server _link_local_finish [dentry #0x1/postgres/data/pg_wal/000000010000000000000027 [2,head] auth{0=1} NULL (dn xlockdone l x=1) (dversion lock w=1 last_client=15222) pv=1039 v=1026 ap=2 ino=(nil) state=1073741824 | request=1 lock=2 replicated=1 waiter=0 authpin=1 clientlease=1 0x55923decb680] to [inode 0x1000000053c [2,head] ~mds1/stray2/1000000053c auth{0=1} v1037 pv1041 ap=4 DIRTYPARENT s=16777216 n(v0 rc2023-08-25T02:26:49.157812+0000 b16777216 1=1+0)->n(v0 rc2023-08-25T02:26:50.017793+0000 b16777216 1=1+0) (iauth excl) (ilink xlockdone x=2) (ifile excl) (ixattr excl) (iversion lock w=2 last_client=15222) cr={15222=0-37748736@1} caps={15222=pAsxXsxFsxcrwb/pAsxXsxFsxcrwb@4},l=15222 | ptrwaiter=0 request=2 lock=5 importingcaps=0 caps=1 dirtyparent=1 replicated=1 dirty=1 waiter=0 authpin=1 0x55923dfc5b80]
2023-08-25T02:26:50.021+0000 7fd02f399700 12 mds.1.cache.dir(0x10000000002) link_remote_inode [dentry #0x1/postgres/data/pg_wal/000000010000000000000027 [2,head] auth{0=1} NULL (dn xlockdone l x=1) (dversion lock w=1 last_client=15222) pv=1039 v=1026 ap=2 ino=(nil) state=1073741824 | request=1 lock=2 replicated=1 waiter=0 authpin=1 clientlease=1 0x55923decb680] remote 0x1000000053c
2023-08-25T02:26:50.021+0000 7fd02f399700 10 mds.1.cache.strays eval_remote [dentry #0x1/postgres/data/pg_wal/000000010000000000000027 [2,head] auth{0=1} REMOTE(reg) (dn xlockdone l x=1) (dversion lock w=1 last_client=15222) pv=1039 v=1026 ap=2 ino=0x1000000053c state=1073741824 | request=1 lock=2 replicated=1 waiter=0 authpin=1 clientlease=1 0x55923decb680]
2023-08-25T02:26:50.021+0000 7fd02f399700 20 mds.1.cache.strays _eval_stray_remote [dentry #0x101/stray2/1000000053c [2,head] auth NULL (dn xlockdone x=1) (dversion lock w=1 last_client=15222) pv=1041 v=1 ap=2 ino=(nil) state=1342177281|new | request=1 lock=2 authpin=1 0x55923e074c80]
2023-08-25T02:26:50.021+0000 7fd02f399700 10 mds.1.cache.strays reintegrate_stray [dentry #0x101/stray2/1000000053c [2,head] auth NULL (dn xlockdone x=1) (dversion lock w=1 last_client=15222) pv=1041 v=1 ap=2 ino=(nil) state=1342177281|new | request=1 lock=2 authpin=1 0x55923e074c80] to [dentry #0x1/postgres/data/pg_wal/000000010000000000000027 [2,head] auth{0=1} REMOTE(reg) (dn xlockdone l x=1) (dversion lock w=1 last_client=15222) pv=1039 v=1026 ap=2 ino=0x1000000053c state=1073741824 | request=1 lock=2 replicated=1 waiter=0 authpin=1 clientlease=1 0x55923decb680]
2023-08-25T02:26:50.021+0000 7fd02f399700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] send_to--> mds [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] -- client_request(unknown.0:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 -- ?+0 0x55923dfb4000
2023-08-25T02:26:50.021+0000 7fd02f399700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] --> [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] -- client_request(unknown.0:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 -- 0x55923dfb4000 con 0x55923cde2000
2023-08-25T02:26:50.021+0000 7fd038bac700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== osd.5 v2:172.21.15.122:6808/1949734782 17 ==== osd_op_reply(45 201.00000001 [write 83215~6552] v83'282 uv282 ondisk = 0) v8 ==== 156+0+0 (crc 0 0 0) 0x55923de1a6c0 con 0x55923ddb2c00
2023-08-25T02:26:50.021+0000 7fd02f399700 10 mds.1.cache.den(0x10000000002 000000010000000000000027) mark_dirty [dentry #0x1/postgres/data/pg_wal/000000010000000000000027 [2,head] auth{0=1} REMOTE(reg) (dn xlockdone l x=1) (dversion lock w=1 last_client=15222) pv=1039 v=1026 ap=2 ino=0x1000000053c state=1073742080|reintegrating | request=1 lock=2 purging=1 replicated=1 waiter=0 authpin=1 clientlease=1 0x55923decb680]
2023-08-25T02:26:50.021+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923dfb4000 con 0x55923cde2000

But the rename request was retried for several times:

2023-08-25T02:26:50.022+0000 7fd0353a5700  4 mds.1.server handle_client_request client_request(mds.1:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.022+0000 7fd0353a5700  7 mds.1.cache request_start request(mds.1:1 nref=2 cr=0x55923dfb4000)
2023-08-25T02:26:50.022+0000 7fd0353a5700  7 mds.1.server dispatch_client_request client_request(mds.1:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.022+0000 7fd0353a5700  7 mds.1.server handle_client_rename client_request(mds.1:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
...
2023-08-25T02:26:50.022+0000 7fd02f399700 10 mds.1.cache.strays eval_remote [dentry #0x1/postgres/data/pg_wal/000000010000000000000027 [2,head] auth{0=1} REMOTE(reg) (dn sync l) (dversion lock) v=1039 ino=0x1000000053c state=1610612992|reintegrating | request=0 lock=0 purging=1 replicated=1 dirty=1 waiter=0 authpin=0 clientlease=1 0x55923decb680]
2023-08-25T02:26:50.022+0000 7fd02f399700 20 mds.1.cache.strays _eval_stray_remote [dentry #0x101/stray2/1000000053c [2,head] auth NULL (dn xlockdone x=1) (dversion lock w=1 last_client=15222) pv=1041 v=1 ap=2 ino=(nil) state=1342177281|new | request=1 lock=2 authpin=1 0x55923e074c80]
2023-08-25T02:26:50.022+0000 7fd02f399700 10 mds.1.cache.strays reintegrate_stray [dentry #0x101/stray2/1000000053c [2,head] auth NULL (dn xlockdone x=1) (dversion lock w=1 last_client=15222) pv=1041 v=1 ap=2 ino=(nil) state=1342177281|new | request=1 lock=2 authpin=1 0x55923e074c80] to [dentry #0x1/postgres/data/pg_wal/000000010000000000000027 [2,head] auth{0=1} REMOTE(reg) (dn sync l) (dversion lock) v=1039 ino=0x1000000053c state=1610612992|reintegrating | request=0 lock=0 purging=1 replicated=1 dirty=1 waiter=0 authpin=0 clientlease=1 0x55923decb680]
2023-08-25T02:26:50.022+0000 7fd02f399700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] send_to--> mds [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] -- client_request(unknown.0:2 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 -- ?+0 0x55923dfb4300
2023-08-25T02:26:50.022+0000 7fd02f399700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] --> [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] -- client_request(unknown.0:2 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 -- 0x55923dfb4300 con 0x55923cde2000
2023-08-25T02:26:50.022+0000 7fd02f399700 10 MDSIOContextBase::complete: 25C_MDS_unlink_local_finish
2023-08-25T02:26:50.022+0000 7fd0353a5700  4 mds.1.server handle_client_request client_request(mds.1:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.022+0000 7fd0353a5700  7 mds.1.cache request_start request(mds.1:1 nref=2 cr=0x55923dfb4000)
2023-08-25T02:26:50.022+0000 7fd0353a5700  7 mds.1.server dispatch_client_request client_request(mds.1:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.022+0000 7fd0353a5700  7 mds.1.server handle_client_rename client_request(mds.1:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.022+0000 7fd0353a5700 10 mds.1.server rdlock_two_paths_xlock_destdn request(mds.1:1 nref=2 cr=0x55923dfb4000) #0x10000000002/000000010000000000000027 #0x60c/1000000053c
2023-08-25T02:26:50.022+0000 7fd0353a5700  7 mds.1.cache traverse: opening base ino 0x10000000002 snap head
...
2023-08-25T02:26:50.023+0000 7fd02f399700 10 MDSContext::complete: 18C_MDS_RetryRequest
2023-08-25T02:26:50.023+0000 7fd02f399700  7 mds.1.server dispatch_client_request client_request(mds.1:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.023+0000 7fd02f399700  7 mds.1.server handle_client_rename client_request(mds.1:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.023+0000 7fd02f399700 10 mds.1.server rdlock_two_paths_xlock_destdn request(mds.1:1 nref=2 cr=0x55923dfb4000) #0x10000000002/000000010000000000000027 #0x60c/1000000053c
2023-08-25T02:26:50.023+0000 7fd02f399700  7 mds.1.cache traverse: opening base ino 0x10000000002 snap head
...
2023-08-25T02:26:50.023+0000 7fd02f399700 10 mds.1.cache.strays eval_stray [dentry #0x101/stray2/1000000053c [2,head] auth{0=1} (dversion lock) v=1041 ap=1 ino=0x1000000053c state=1879048209|new | request=0 lock=0 inodepin=1 replicated=1 dirty=1 waiter=0 authpin=1 0x55923e074c80]
2023-08-25T02:26:50.023+0000 7fd02f399700 10 mds.1.cache.strays  inode is [inode 0x1000000053c [...2,head] ~mds1/stray2/1000000053c auth{0=2} v1041 snaprealm=0x55923de1ab40 DIRTYPARENT s=16777216 n(v0 rc2023-08-25T02:26:50.017793+0000 b16777216 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={15222=0-37748736@1} caps={15222=pAsxXsxFsxcrwb/pAsxXsxFsxcrwb@4},l=15222 | ptrwaiter=0 request=0 lock=0 importingcaps=0 caps=1 remoteparent=1 dirtyparent=1 replicated=1 dirty=1 waiter=0 authpin=0 0x55923dfc5b80]
2023-08-25T02:26:50.023+0000 7fd02f399700 20 mds.1.cache.strays _eval_stray_remote [dentry #0x101/stray2/1000000053c [2,head] auth{0=1} (dversion lock) v=1041 ap=1 ino=0x1000000053c state=1879048209|new | request=0 lock=0 inodepin=1 replicated=1 dirty=1 waiter=0 authpin=1 0x55923e074c80]
2023-08-25T02:26:50.023+0000 7fd02f399700 10 mds.1.cache.strays reintegrate_stray [dentry #0x101/stray2/1000000053c [2,head] auth{0=1} (dversion lock) v=1041 ap=1 ino=0x1000000053c state=1879048209|new | request=0 lock=0 inodepin=1 replicated=1 dirty=1 waiter=0 authpin=1 0x55923e074c80] to [dentry #0x1/postgres/data/pg_wal/000000010000000000000027 [2,head] auth{0=1} REMOTE(reg) (dn sync->lock g=0 l) (dversion lock) v=1039 ap=2 ino=0x1000000053c state=1610612992|reintegrating | request=0 lock=0 purging=2 replicated=1 dirty=1 waiter=1 authpin=1 clientlease=1 0x55923decb680]
2023-08-25T02:26:50.023+0000 7fd02f399700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] send_to--> mds [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] -- client_request(unknown.0:3 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 -- ?+0 0x55923dfb4600
2023-08-25T02:26:50.023+0000 7fd02f399700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] --> [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] -- client_request(unknown.0:3 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 -- 0x55923dfb4600 con 0x55923cde2000
2023-08-25T02:26:50.023+0000 7fd02f399700 10 MDSIOContextBase::complete: 26C_MDS_ImportDirLoggedStart
2023-08-25T02:26:50.023+0000 7fd0353a5700 20 mds.1.18 get_session have 0x55923dd63400 client.15222 192.168.0.1:0/2337416110 state open
2023-08-25T02:26:50.023+0000 7fd0353a5700  3 mds.1.server handle_client_session client_session(request_flushmdlog seq 104) from client.15222
2023-08-25T02:26:50.023+0000 7fd02f399700 10 MDSContext::complete: 26C_MDS_ImportDirLoggedStart
2023-08-25T02:26:50.023+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:2 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923dfb4300 con 0x55923cde2000
...
2023-08-25T02:26:50.024+0000 7fd0353a5700 10 MDSContext::complete: 18C_MDS_RetryRequest
2023-08-25T02:26:50.024+0000 7fd0353a5700  7 mds.1.server dispatch_client_request client_request(mds.1:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.024+0000 7fd0353a5700  7 mds.1.server handle_client_rename client_request(mds.1:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.024+0000 7fd0353a5700 10 mds.1.server rdlock_two_paths_xlock_destdn request(mds.1:1 nref=2 cr=0x55923dfb4000) #0x10000000002/000000010000000000000027 #0x60c/1000000053c
...
2023-08-25T02:26:50.025+0000 7fd0353a5700 10 MDSContext::complete: 18C_MDS_RetryRequest
2023-08-25T02:26:50.025+0000 7fd0353a5700  7 mds.1.server dispatch_client_request client_request(mds.1:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.025+0000 7fd0353a5700  7 mds.1.server handle_client_rename client_request(mds.1:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.025+0000 7fd0353a5700 10 mds.1.server rdlock_two_paths_xlock_destdn request(mds.1:1 nref=3 cr=0x55923dfb4000) #0x10000000002/000000010000000000000027 #0x60c/1000000053c
...
2023-08-25T02:26:50.025+0000 7fd0353a5700 10 mds.1.cache.strays eval_remote [dentry #0x1/postgres/data/pg_wal/000000010000000000000027 [2,head] auth{0=1} REMOTE(reg) (dn xlock x=1 by 0x55923df91600) (dversion lock w=1) v=1039 ap=3 ino=0x1000000053c state=1610612992|reintegrating | request=1 lock=2 purging=3 replicated=1 dirty=1 waiter=1 authpin=1 clientlease=0 0x55923decb680]
2023-08-25T02:26:50.025+0000 7fd0353a5700 20 mds.1.cache.strays _eval_stray_remote [dentry #0x101/stray2/1000000053c [2,head] auth{0=1} (dn xlock x=1 by 0x55923df91600) (dversion lock w=1) v=1041 ap=2 ino=0x1000000053c state=1879048193|new | request=1 lock=2 inodepin=1 replicated=1 dirty=1 waiter=0 authpin=1 0x55923e074c80]
2023-08-25T02:26:50.025+0000 7fd0353a5700 10 mds.1.cache.strays reintegrate_stray [dentry #0x101/stray2/1000000053c [2,head] auth{0=1} (dn xlock x=1 by 0x55923df91600) (dversion lock w=1) v=1041 ap=2 ino=0x1000000053c state=1879048193|new | request=1 lock=2 inodepin=1 replicated=1 dirty=1 waiter=0 authpin=1 0x55923e074c80] to [dentry #0x1/postgres/data/pg_wal/000000010000000000000027 [2,head] auth{0=1} REMOTE(reg) (dn xlock x=1 by 0x55923df91600) (dversion lock w=1) v=1039 ap=3 ino=0x1000000053c state=1610612992|reintegrating | request=1 lock=2 purging=3 replicated=1 dirty=1 waiter=1 authpin=1 clientlease=0 0x55923decb680]
2023-08-25T02:26:50.025+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] send_to--> mds [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] -- client_request(unknown.0:4 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 -- ?+0 0x55923df8d800
2023-08-25T02:26:50.025+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] --> [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] -- client_request(unknown.0:4 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 -- 0x55923df8d800 con 0x55923cde2000
...
[xiubli@vossi04 7378922]$ zgrep 'rename #0x10000000002/000000010000000000000027' remote/smithi122/log/7eb69934-42ed-11ee-9b3d-001a4aab830c/ceph-mds.i.log.gz |grep '==== client_request\|) client_request'
2023-08-25T02:26:50.021+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923dfb4000 con 0x55923cde2000
2023-08-25T02:26:50.023+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:2 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923dfb4300 con 0x55923cde2000
2023-08-25T02:26:50.024+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:3 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923dfb4600 con 0x55923cde2000
2023-08-25T02:26:50.025+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:4 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923df8d800 con 0x55923cde2000
2023-08-25T02:26:50.025+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:5 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923df8db00 con 0x55923cde2000
2023-08-25T02:26:50.026+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:6 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923dfb4900 con 0x55923cde2000
2023-08-25T02:26:50.032+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:7 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923dfb4f00 con 0x55923cde2000
2023-08-25T02:26:50.034+0000 7fd02f399700  7 mds.1.server reply_client_request 0 ((0) Success) client_request(mds.1:1 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.040+0000 7fd0353a5700  7 mds.1.server reply_client_request -2 ((2) No such file or directory) client_request(mds.1:2 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.043+0000 7fd0353a5700  7 mds.1.server reply_client_request -2 ((2) No such file or directory) client_request(mds.1:3 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.046+0000 7fd0353a5700  7 mds.1.server reply_client_request -2 ((2) No such file or directory) client_request(mds.1:4 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.047+0000 7fd0353a5700  7 mds.1.server reply_client_request -2 ((2) No such file or directory) client_request(mds.1:5 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.048+0000 7fd0353a5700  7 mds.1.server reply_client_request -2 ((2) No such file or directory) client_request(mds.1:6 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:26:50.052+0000 7fd0353a5700  7 mds.1.server reply_client_request -2 ((2) No such file or directory) client_request(mds.1:7 rename #0x10000000002/000000010000000000000027 #0x60c/1000000053c caller_uid=0, caller_gid=0{}) v6

Another case, there are 2208 rename requests:

[xiubli@vossi04 7378922]$ zgrep 'rename #0x10000000002/000000010000000100000007' remote/smithi122/log/7eb69934-42ed-11ee-9b3d-001a4aab830c/ceph-mds.i.log.gz |grep '==== client_request'| wc -l
2208

More detail:

[xiubli@vossi04 7378922]$ zgrep 'rename #0x10000000002/000000010000000100000007' remote/smithi122/log/7eb69934-42ed-11ee-9b3d-001a4aab830c/ceph-mds.i.log.gz |grep '==== client_request\|) client_request'
2023-08-25T02:27:55.558+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:1078 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923e25ac00 con 0x55923cde2000
2023-08-25T02:27:55.559+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:1079 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923e01b200 con 0x55923cde2000
2023-08-25T02:27:55.565+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:1080 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923e63f500 con 0x55923cde2000
2023-08-25T02:27:55.566+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:1081 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923e258000 con 0x55923cde2000
2023-08-25T02:27:55.568+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:1082 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923e63f200 con 0x55923cde2000
2023-08-25T02:27:55.569+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:1083 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923dffb800 con 0x55923cde2000
...
2023-08-25T02:27:57.719+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:3278 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923ee14c00 con 0x55923cde2000
2023-08-25T02:27:57.720+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:3279 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923ee14f00 con 0x55923cde2000
2023-08-25T02:27:57.722+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:3280 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923ee15200 con 0x55923cde2000
2023-08-25T02:27:57.724+0000 7fd0353a5700  1 -- [v2:172.21.15.122:6838/3754052688,v1:172.21.15.122:6839/3754052688] <== mds.1 v2:172.21.15.122:6838/3754052688 0 ==== client_request(mds.1:3281 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6 ==== 0+0+0 (unknown 0 0 0) 0x55923ee15800 con 0x55923cde2000
2023-08-25T02:27:58.149+0000 7fd0353a5700  7 mds.1.server reply_client_request -2 ((2) No such file or directory) client_request(mds.1:3280 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:27:58.302+0000 7fd0353a5700  7 mds.1.server reply_client_request -2 ((2) No such file or directory) client_request(mds.1:3281 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:27:58.455+0000 7fd0353a5700  7 mds.1.server reply_client_request -2 ((2) No such file or directory) client_request(mds.1:1078 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:27:58.613+0000 7fd0353a5700  7 mds.1.server reply_client_request -2 ((2) No such file or directory) client_request(mds.1:1079 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6
...
2023-08-25T02:33:36.600+0000 7fd0353a5700  7 mds.1.server reply_client_request -2 ((2) No such file or directory) client_request(mds.1:1645 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:33:37.067+0000 7fd0353a5700  7 mds.1.server reply_client_request -2 ((2) No such file or directory) client_request(mds.1:1646 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:33:37.539+0000 7fd0353a5700  7 mds.1.server reply_client_request -2 ((2) No such file or directory) client_request(mds.1:1647 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:33:38.022+0000 7fd0353a5700  7 mds.1.server reply_client_request -2 ((2) No such file or directory) client_request(mds.1:1648 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6
2023-08-25T02:33:39.037+0000 7fd0353a5700  7 mds.1.server reply_client_request -2 ((2) No such file or directory) client_request(mds.1:1649 rename #0x10000000002/000000010000000100000007 #0x60c/1000000051f caller_uid=0, caller_gid=0{}) v6

Related issues 4 (2 open2 closed)

Related to CephFS - Backport #63513: pacific: MDS slow requests for the internal 'rename' requestsResolvedXiubo LiActions
Has duplicate CephFS - Bug #62096: mds: infinite rename recursion on itselfDuplicatePatrick Donnelly

Actions
Copied to CephFS - Backport #63261: quincy: MDS slow requests for the internal 'rename' requestsNewXiubo LiActions
Copied to CephFS - Backport #63262: reef: MDS slow requests for the internal 'rename' requestsIn ProgressXiubo LiActions
Actions #1

Updated by Xiubo Li 8 months ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 53280
Actions #2

Updated by Xiubo Li 8 months ago

  • Description updated (diff)
Actions #3

Updated by Patrick Donnelly 8 months ago

  • Related to Bug #62096: mds: infinite rename recursion on itself added
Actions #4

Updated by Venky Shankar 7 months ago

  • Category set to Administration/Usability
  • Status changed from Fix Under Review to Pending Backport
  • Target version set to v19.0.0
  • Backport set to reef,quincy
  • Component(FS) MDS added
Actions #5

Updated by Backport Bot 7 months ago

  • Copied to Backport #63261: quincy: MDS slow requests for the internal 'rename' requests added
Actions #6

Updated by Backport Bot 7 months ago

  • Copied to Backport #63262: reef: MDS slow requests for the internal 'rename' requests added
Actions #7

Updated by Backport Bot 7 months ago

  • Tags set to backport_processed
Actions #8

Updated by Patrick Donnelly 7 months ago

  • Related to deleted (Bug #62096: mds: infinite rename recursion on itself)
Actions #9

Updated by Patrick Donnelly 7 months ago

  • Has duplicate Bug #62096: mds: infinite rename recursion on itself added
Actions #10

Updated by Xiubo Li 6 months ago

  • Related to Backport #63513: pacific: MDS slow requests for the internal 'rename' requests added
Actions

Also available in: Atom PDF