Bug #51734

kclient: delayed_work() sends too many client_caps requests at once and delays some rmdir requests

Added by Xiubo Li over 2 years ago. Updated about 2 years ago.

Status:
Won't Fix
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I enabled the kernel debug logs and hit one case where an rmdir request was issued immediately after the lookup returned, but the rmdir request was delayed and took a long time:

/root/removalC.log:unlinkat(AT_FDCWD, "removalC.test01821", AT_REMOVEDIR) = 0 <2.104413>
/root/removalC.log:unlinkat(AT_FDCWD, "removalC.test01853", AT_REMOVEDIR) = 0 <2.732223>
/root/removalC.log:unlinkat(AT_FDCWD, "removalC.test01869", AT_REMOVEDIR) = 0 <2.064317>
/root/removalC.log:unlinkat(AT_FDCWD, "removalC.test01891", AT_REMOVEDIR) = 0 <2.556657>
/root/removalC.log:unlinkat(AT_FDCWD, "removalC.test01932", AT_REMOVEDIR) = 0 <2.354984>
/root/removalB.log:unlinkat(AT_FDCWD, "removalB.test01740", AT_REMOVEDIR) = 0 <3.605535>
/root/removalB.log:unlinkat(AT_FDCWD, "removalB.test01964", AT_REMOVEDIR) = 0 <3.124542>
/root/removalC.log:unlinkat(AT_FDCWD, "removalC.test01114", AT_REMOVEDIR) = 0 <3.952824>
/root/removalA.log:unlinkat(AT_FDCWD, "removalA.test01646", AT_REMOVEDIR) = 0 <4.890782>
/root/removalA.log:unlinkat(AT_FDCWD, "removalA.test00484", AT_REMOVEDIR) = 0 <17.238886>
/root/removalB.log:unlinkat(AT_FDCWD, "removalB.test00484", AT_REMOVEDIR) = 0 <17.247903>

In mds.a.log we can see that the MDS messenger received the request in time:

2021-07-19 20:18:15.993 14e5d79eb700 5 --1- v1:10.72.47.117:6817/4172579165 >> v1:10.72.47.117:0/267216858 conn(0x55e4e97f7680 0x55e4e973d000 :6817 s=READ_FOOTER_AND_DISPATCH pgs=2 cs=1 l=0). rx client.4275 seq 31878 0x55e4ea1f6300 client_request(client.4275:26660 rmdir #0x1/removalA.test00484 2021-07-19 20:18:15.032590 caller_uid=0, caller_gid=0{}) v2
2021-07-19 20:18:15.993 14e5d79eb700 20 -- v1:10.72.47.117:6817/4172579165 queue 0x55e4ea1f6300 prio 127

But it was dispatched 17 seconds later:

2021-07-19 20:18:32.203 14e5d6de5700 10 -- v1:10.72.47.117:6817/4172579165 dispatch_throttle_release 1952 to dispatch throttler 12178/104857600
2021-07-19 20:18:32.203 14e5d6de5700 20 -- v1:10.72.47.117:6817/4172579165 done calling dispatch on 0x55e4ea5d1860
2021-07-19 20:18:32.203 14e5d6de5700 1 -- v1:10.72.47.117:6817/4172579165 <== client.4275 v1:10.72.47.117:0/267216858 31878 ==== client_request(client.4275:26660 rmdir #0x1/removalA.test00484 2021-07-19 20:18:15.032590 caller_uid=0, caller_gid=0{}) v2 ==== 246+0+0 (unknown 3570435590 0) 0x55e4ea1f6300 con 0x55e4e97f7680

I found that during this time the MDS was flooded with "client_caps" requests from the kclient:

2021-07-19 20:18:32.189 14e5d6de5700 1 -- v1:10.72.47.117:6817/4172579165 <== client.4275 v1:10.72.47.117:0/267216858 31874 ==== client_caps(update ino 0x10000001906 6409 seq 2 caps=pAsLsXsFsx dirty=- wanted=AsLsXsFsx follows 0 size 0/0 mtime 2021-07-19 20:16:51.628613 xattrs(v=1844 6653976102059296 l=0)) v10 ==== 236+0+0 (unknown 1365370666 0 0) 0x55e4ec1c6400 con 0x55e4e97f7680

These were issued by the kclient's delayed_work():

Jul 19 20:18:14 lxbceph1 kernel: ceph: mdsc delayed_work
Jul 19 20:18:14 lxbceph1 kernel: ceph: check_delayed_caps
Jul 19 20:18:14 lxbceph1 kernel: ceph: check_delayed_caps on 00000000b5d50ae1
Jul 19 20:18:14 lxbceph1 kernel: ceph: __cap_delay_cancel 00000000b5d50ae1
Jul 19 20:18:14 lxbceph1 kernel: ceph: __ceph_caps_issued 00000000b5d50ae1 cap 00000000dbfb7a2b issued pAsxLsXsxFsx
Jul 19 20:18:14 lxbceph1 kernel: ceph: check_caps 00000000b5d50ae1 file_want - used - dirty - flushing - issued pAsxLsXsxFsx revoking - retain pAsLsX>
Jul 19 20:18:14 lxbceph1 kernel: ceph: mds0 cap 00000000dbfb7a2b used - issued pAsxLsXsxFsx implemented pAsxLsXsxFsx revoking -
Jul 19 20:18:14 lxbceph1 kernel: ceph: __send_cap 00000000b5d50ae1 cap 00000000dbfb7a2b session 00000000403a5e35 pAsxLsXsxFsx -> pAsLsXsFsx (revoking>
Jul 19 20:18:14 lxbceph1 kernel: ceph: __cap_set_timeouts 00000000b5d50ae1 min 5704 max 60704
Jul 19 20:18:14 lxbceph1 kernel: ceph: send_cap_msg update 151c 1000000151a caps pAsLsXsFsx wanted AsLsXsFsx dirty - seq 4/4 tid 0/0 mseq 0 follows 0>
...

delayed_work() runs periodically every 5 seconds and calls ceph_check_delayed_caps(), and the "mdsc->cap_delay_list" accumulates too many caps that need to be checked.
We need to improve this logic in the kclient.
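
To make the failure mode concrete, below is a minimal user-space C sketch of this accumulate-then-burst pattern; it is not the fs/ceph code, and the 1000-directories-per-second workload rate is only an assumption for illustration:

    /*
     * Inodes touched by the workload are appended to a delay list; the
     * periodic delayed_work() tick flushes the whole list at once, so the
     * MDS receives a burst of client_caps it must dispatch before it gets
     * to the queued rmdir request.
     */
    #include <stdio.h>

    #define DELAYED_WORK_PERIOD 5    /* seconds between delayed_work() runs */
    #define DIRS_PER_SECOND     1000 /* assumed workload rate */

    int main(void)
    {
        int cap_delay_list = 0;  /* inodes currently on the delay list */

        for (int second = 1; second <= 15; second++) {
            cap_delay_list += DIRS_PER_SECOND;  /* workload keeps adding caps */

            if (second % DELAYED_WORK_PERIOD == 0) {
                /* ceph_check_delayed_caps(): one client_caps per inode */
                printf("t=%2ds: flush %d client_caps in one burst\n",
                       second, cap_delay_list);
                cap_delay_list = 0;
            }
        }
        return 0;
    }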

History

#2 Updated by Xiubo Li over 2 years ago

6740078 2021-07-20 11:05:41.660 149df03bb700 20 -- v1:10.72.47.117:6819/926278509 done calling dispatch on 0x558e1ef9a800
6740079 2021-07-20 11:05:41.660 149df0fc1700 20 --1- v1:10.72.47.117:6819/926278509 >> v1:10.72.47.117:0/4215792330 conn(0x558e1c19b680 0x558e1c0f1800 :6819 s=READ_FOOTER_AND_DISPATCH pgs=2 cs=1 l=0).handle_message_footer r=0
6740080 2021-07-20 11:05:41.660 149df0fc1700 10 --1- v1:10.72.47.117:6819/926278509 >> v1:10.72.47.117:0/4215792330 conn(0x558e1c19b680 0x558e1c0f1800 :6819 s=READ_FOOTER_AND_DISPATCH pgs=2 cs=1 l=0).handle_message_footer aborted = 0
6740081 2021-07-20 11:05:41.660 149df0fc1700 20 --1- v1:10.72.47.117:6819/926278509 >> v1:10.72.47.117:0/4215792330 conn(0x558e1c19b680 0x558e1c0f1800 :6819 s=READ_FOOTER_AND_DISPATCH pgs=2 cs=1 l=0).handle_message_footer got 236 + 0 + 0 byte message
6740082 2021-07-20 11:05:41.660 149df03bb700 0 -- v1:10.72.47.117:6819/926278509 lxb ------- queue_length: 3667
6740083 2021-07-20 11:05:41.660 149df03bb700 1 -- v1:10.72.47.117:6819/926278509 <== client.34268 v1:10.72.47.117:0/4215792330 15419 ==== client_caps(update ino 0x10000000756 1880 seq 4 caps=pAsLsXsFsx dirty=- wanted=AsLsXsFsx follows 0 size 0/0 mtime 2021-07-20 11:04:15.122296 xattrs(v=18446653976494865696 l=0)) v10 ==== 236+0+0 (unknown 4111794353 0 0) 0x558e1ef9a400 con 0x558e1c19b680

...

6741104 2021-07-20 11:05:41.664 149df0fc1700  5 --1- v1:10.72.47.117:6819/926278509 >> v1:10.72.47.117:0/4215792330 conn(0x558e1c19b680 0x558e1c0f1800 :6819 s=READ_FOOTER_AND_DISPATCH pgs=2 cs=1 l=0). rx client.34268 seq 19108 0x558e1c185800 client_request(client.34268:15332 rmdir #0x1/removalB.test00553 2021-07-20 11:05:40.806351 caller_uid=0, caller_gid=0{}) v2

...

7624207 2021-07-20 11:05:55.093 149df03bb700 20 -- v1:10.72.47.117:6819/926278509 done calling dispatch on 0x558e1d7e4c00
7624208 2021-07-20 11:05:55.093 149df03bb700 0 -- v1:10.72.47.117:6819/926278509 lxb ------- queue_length: 646
7624209 2021-07-20 11:05:55.093 149df0fc1700 20 --1- v1:10.72.47.117:6819/926278509 >> v1:10.72.47.117:0/4215792330 conn(0x558e1c19b680 0x558e1c0f1800 :6819 s=OPENED pgs=2 cs=1 l=0).write_message signed m=0x558e1d40e840): sig = 0
7624210 2021-07-20 11:05:55.093 149df03bb700 1 -- v1:10.72.47.117:6819/926278509 <== client.34268 v1:10.72.47.117:0/4215792330 19108 ==== client_request(client.34268:15332 rmdir #0x1/removalB.test00553 2021-07-20 11:05:40.806351 caller_uid=0, caller_gid=0{}) v2 ==== 246+0+0 (unknown 3335763886 0 0) 0x558e1c185800 con 0x558e1c19b680
7624211 2021-07-20 11:05:55.093 149df0fc1700 20 --1- v1:10.72.47.117:6819/926278509 >> v1:10.72.47.117:0/4215792330 conn(0x558e1c19b680 0x558e1c0f1800 :6819 s=OPENED pgs=2 cs=1 l=0).write_message sending message type=26 src mds.0 front=673 data=0 off 0

From the above logs we can see that when the "rmdir #0x1/removalB.test00553" request arrived, the dispatch queue already had 3667 items queued.

#3 Updated by Jeff Layton over 2 years ago

Xiubo, does this improve at all with the recent livelock fix from Luis?

commit bf2ba432213fade50dd39f2e348085b758c0726e
Author: Luis Henriques <lhenriques@suse.de>
Date:   Tue Jul 6 14:52:41 2021 +0100

    ceph: reduce contention in ceph_check_delayed_caps()

That should prevent the work from dealing with caps that got added to the list after the work started. Does that alleviate the pressure enough for the call to get through?

If not, then what (specifically) are these calls waiting on? Some mutex or rwsem? If so, then maybe we can just allow the workqueue job to release the locks between each loop iteration to allow other calls to get through?
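
A hedged sketch of both ideas in plain pthreads rather than kernel code; cap_entry, cap_delay_list and flush_one_cap() are illustrative names, not the kernel's. Entries queued after the flush started are left for the next run, and the list lock is dropped around each send so a waiting call can get through between iterations:

    #include <pthread.h>
    #include <stddef.h>
    #include <time.h>

    struct cap_entry {
        struct cap_entry *next;
        time_t queued_at;                 /* when the inode hit the delay list */
    };

    static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
    static struct cap_entry *cap_delay_list;

    static void flush_one_cap(struct cap_entry *e)
    {
        (void)e;                          /* stand-in for sending client_caps */
    }

    void check_delayed_caps(void)
    {
        time_t loop_start = time(NULL);

        pthread_mutex_lock(&list_lock);
        /* Skip entries queued after we started: they wait for the next run. */
        while (cap_delay_list && cap_delay_list->queued_at <= loop_start) {
            struct cap_entry *e = cap_delay_list;
            cap_delay_list = e->next;

            /* Release the lock while sending so other waiters get a turn. */
            pthread_mutex_unlock(&list_lock);
            flush_one_cap(e);
            pthread_mutex_lock(&list_lock);
        }
        pthread_mutex_unlock(&list_lock);
    }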

#4 Updated by Xiubo Li about 2 years ago

  • Status changed from In Progress to Won't Fix

Jeff Layton wrote:

Xiubo, does this improve at all with the recent livelock fix from Luis?

[...]

Sorry I missed this comment.

Not much in my case; I had tried applying this commit at the time.

That should prevent the work from dealing with caps that got added to the list after the work started. Does that alleviate the pressure enough for the call to get through?

Just a little better, but the problem still existed.

If not, then what (specifically) are these calls waiting on? Some mutex or rwsem? If so, then maybe we can just allow the workqueue job to release the locks between each loop iteration to allow other calls to get through?

The problem is that 5 seconds can accumulate a lot of dirty caps when many files are under operation. For example, 5000 caps in the list could all be sent out within about 1 second; the MDS is then overloaded by these requests while holding the mds_lock.

Since the MDS has switched to using a fair mutex lock, this shouldn't be a big issue now.
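
For context on that resolution: a fair (FIFO, ticket-style) mutex serves waiters in arrival order, so a flood of cap-update handlers cannot keep re-acquiring the lock and starve the thread dispatching the rmdir. A minimal ticket-lock illustration in C follows; it is not the MDS mds_lock implementation:

    #include <stdatomic.h>

    /* Toy ticket lock: each waiter takes a ticket and waits for its turn,
     * so the lock is granted strictly in arrival (FIFO) order. */
    struct ticket_lock {
        atomic_uint next_ticket;   /* ticket handed to the next arriver */
        atomic_uint now_serving;   /* ticket currently allowed in */
    };

    void ticket_lock(struct ticket_lock *l)
    {
        unsigned int my = atomic_fetch_add(&l->next_ticket, 1);
        while (atomic_load(&l->now_serving) != my)
            ;                      /* spin until it is our turn */
    }

    void ticket_unlock(struct ticket_lock *l)
    {
        atomic_fetch_add(&l->now_serving, 1);
    }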
