Bug #51722
mds: slow performance on parallel rm operations for multiple kclients
Description
There is another case that can sometimes delay an unlinkat for a long time. For example, removing the "removalC.test01921" directory took almost 2 seconds:
removalC.log:unlinkat(AT_FDCWD, "removalC.test01921", AT_REMOVEDIR) = 0 <1.722029>
In my previous tests the delay could sometimes reach 3 seconds or longer.
From the mds.a.log I can see that the "lookup" request finished very quickly, as expected:
2021-07-16 16:46:36.326 1554aac09700 1 -- v1:10.72.47.117:6817/578524851 <== client.4281 v1:10.72.47.117:0/346661127 29425 ==== client_request(client.4281:23536 lookup #0x1/removalC.test01921 2021-07-16 16:46:36.316309 caller_uid=0, caller_gid=0{0,}) v4 ==== 162+0+0 (unknown 4262505302 0 0) 0x5628efca4600 con 0x5628ef411200
...
2021-07-16 16:46:36.326 1554aac09700 10 mds.0.server reply to stat on client_request(client.4281:23536 lookup #0x1/removalC.test01921 2021-07-16 16:46:36.316309 caller_uid=0, caller_gid=0{0,}) v4
2021-07-16 16:46:36.326 1554aac09700 7 mds.0.server reply_client_request 0 ((0) Success) client_request(client.4281:23536 lookup #0x1/removalC.test01921 2021-07-16 16:46:36.316309 caller_uid=0, caller_gid=0{0,}) v4
And the "rmdir" request arrived very shortly afterwards too:
2021-07-16 16:46:36.350 1554aac09700 1 -- v1:10.72.47.117:6817/578524851 <== client.4281 v1:10.72.47.117:0/346661127 29429 ==== client_request(client.4281:23539 rmdir #0x1/removalC.test01921 2021-07-16 16:46:36.343310 caller_uid=0, caller_gid=0{0,}) v4 ==== 268+0+0 (unknown 203429810 0 0) 0x5628f0fb1b00 con 0x5628ef411200
...
2021-07-16 16:46:36.382 1554aac09700 7 mds.0.server dispatch_client_request client_request(client.4281:23539 rmdir #0x1/removalC.test01921 2021-07-16 16:46:36.343310 caller_uid=0, caller_gid=0{0,}) v4
Then the MDS tried to revoke the "Ls" cap from the kclient, and the "rmdir" request was queued:
2021-07-16 16:46:36.384 1554aac09700 7 mds.0.locker simple_lock on (ilink sync) on [inode 0x10000001693 [...2,head] /removalC.test01921/ auth v11560 ap=2 dirtyparent f() n(v0 1=0+1) (isnap xlock x=1 by 0x5628f1c5d600) (inest lock) (ifile sync r=1) (iversion lock w=1 last_client=4281) caps={4281=pAsLsXsFs/pAsLsXsFsx@2},l=4281 | request=1 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 authpin=1 0x5628f23b7100]
2021-07-16 16:46:36.384 1554aac09700 7 mds.0.locker issue_caps loner client.4281 allowed=pAsXsFscrl, xlocker allowed=pAsXsFscrl, others allowed=pAsXsFscrl on [inode 0x10000001693 [...2,head] /removalC.test01921/ auth v11560 ap=2 dirtyparent f() n(v0 1=0+1) (ilink sync->lock) (isnap xlock x=1 by 0x5628f1c5d600) (inest lock) (ifile sync r=1) (iversion lock w=1 last_client=4281) caps={4281=pAsLsXsFs/pAsLsXsFsx@2},l=4281 | request=1 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 authpin=1 0x5628f23b7100]
2021-07-16 16:46:36.384 1554aac09700 20 mds.0.locker client.4281 pending pAsLsXsFs allowed pAsXsFscrl wanted pAsLsXsFsx
2021-07-16 16:46:36.384 1554aac09700 7 mds.0.locker sending MClientCaps to client.4281 seq 3 new pending pAsXsFs was pAsLsXsFs
2021-07-16 16:46:36.384 1554aac09700 20 mds.0.cache.ino(0x10000001693) encode_cap_message pfile 0 pauth 0 plink 0 pxattr 0 ctime 2021-07-16 16:42:52.406815
2021-07-16 16:46:36.384 1554aac09700 10 mds.0.5 send_message_client_counted client.4281 seq 17280 client_caps(revoke ino 0x10000001693 15513 seq 3 caps=pAsXsFs dirty=- wanted=pAsLsXsFsx follows 0 size 0/0 ts 1/18446744073709551615 mtime 2021-07-16 16:42:52.406815) v11
2021-07-16 16:46:36.384 1554aac09700 1 -- v1:10.72.47.117:6817/578524851 --> v1:10.72.47.117:0/346661127 -- client_caps(revoke ino 0x10000001693 15513 seq 3 caps=pAsXsFs dirty=- wanted=pAsLsXsFsx follows 0 size 0/0 ts 1/18446744073709551615 mtime 2021-07-16 16:42:52.406815) v11 -- 0x5628f1828000 con 0x5628ef411200
2021-07-16 16:46:36.384 1554aac09700 10 mds.0.cache.ino(0x10000001693) auth_pin by 0x5628f23b7688 on [inode 0x10000001693 [...2,head] /removalC.test01921/ auth v11560 ap=3 dirtyparent f() n(v0 1=0+1) (ilink sync->lock) (isnap xlock x=1 by 0x5628f1c5d600) (inest lock) (ifile sync r=1) (iversion lock w=1 last_client=4281) caps={4281=pAsXsFs/pAsLsXsFs/pAsLsXsFsx@3},l=4281 | request=1 lock=3 dirfrag=1 caps=1 dirtyparent=1 dirty=1 authpin=1 0x5628f23b7100] now 3
2021-07-16 16:46:36.384 1554aac09700 15 mds.0.cache.dir(0x1) adjust_nested_auth_pins 1 on [dir 0x1 / [2,head] auth pv=23519 v=23271 cv=1/1 REP dir_auth=0 ap=124+625 state=1610874881|complete f(v0 m2021-07-16 16:46:30.954106 365=0+365) n(v0 rc2021-07-16 16:46:30.954106 365=0+365)/n() hs=365+5635,ss=0+0 dirty=6000 | child=1 subtree=1 dirty=1 authpin=1 0x5628ef33ef00] by 0x5628f23b7100 count now 124/625
2021-07-16 16:46:36.384 1554aac09700 10 mds.0.cache.ino(0x10000001693) add_waiter tag 2000000000600000 0x5628ef6159a0 !ambig 1 !frozen 1 !freezing 1
2021-07-16 16:46:36.384 1554aac09700 15 mds.0.cache.ino(0x10000001693) taking waiter here
And the kclient replied to the "Ls" cap revoke request in a very short time as well:
2021-07-16 16:46:36.385 1554aac09700 1 -- v1:10.72.47.117:6817/578524851 <== client.4281 v1:10.72.47.117:0/346661127 29430 ==== client_caps(update ino 0x10000001693 15513 seq 3 caps=pAsXsFs dirty=- wanted=pAsLsXsFsx follows 0 size 0/0 mtime 2021-07-16 16:42:52.406815 xattrs(v=18446744072645829216 l=0)) v10 ==== 236+0+0 (unknown 3915617700 0 0) 0x5628f1828000 con 0x5628ef411200
But "handle_client_caps()" was not triggered until 1.7 seconds later:
2021-07-16 16:46:38.042 1554aac09700 7 mds.0.locker handle_client_caps on 0x10000001693 tid 0 follows 0 op update flags 0x0
I added some debug logs and could see that it was stuck in "MDSDaemon::ms_dispatch2()", waiting for the "mds_lock":
The code (the "+" lines are the added debug logs):
bool MDSDaemon::ms_dispatch2(const Message::ref &m)
{
+ dout(1) << __func__ << " lxb ---- before mds_lock --> " << m << dendl;
  std::lock_guard l(mds_lock);
+ dout(1) << __func__ << " lxb ---- after mds_lock --> " << m << dendl;
  if (stopping) {
    return false;
  }
The debug logs:
2021-07-16 16:46:36.385 1554aac09700 1 mds.a ms_dispatch2 lxb ---- before mds_lock --> 0x5628f1828000
...
2021-07-16 16:46:38.042 1554aac09700 1 mds.a ms_dispatch2 lxb ---- after mds_lock --> 0x5628f1828000
Checking the logs, during this time the "mds_lock" was continuously and successfully acquired by tens or hundreds of "18C_PurgeStrayLogged" finisher callbacks, which are called when mdlog entries are flushed and replied, while the "MDSDaemon::ms_dispatch2()" thread was starved and never got a chance to be woken up to take the "mds_lock".
Mutex implementations (e.g. std::mutex in C++) do not guarantee fairness: they do not guarantee that threads will acquire the lock in the order in which they called lock(), so one hot path can keep re-acquiring the lock and starve other waiters indefinitely.
Related issues
History
#1 Updated by Xiubo Li over 2 years ago
- Status changed from New to Fix Under Review
- Priority changed from Normal to High
- Pull request ID set to 42394
This is from bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1974882.
#2 Updated by Xiubo Li over 2 years ago
- Pull request ID changed from 42394 to 42620
#3 Updated by Xiubo Li over 2 years ago
The new PR will switch the mds_lock to a fair mutex.
#4 Updated by Patrick Donnelly over 2 years ago
- Status changed from Fix Under Review to Pending Backport
- Source set to Development
- Backport set to pacific
#5 Updated by Backport Bot over 2 years ago
- Copied to Backport #52441: pacific: mds: slow performance on parallel rm operations for multiple kclients added
#6 Updated by Loïc Dachary over 2 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".