Bug #44922
When a large number of small files are written concurrently, the MDS getattr delay blocks on MDLog::flush interval
Description
In a multi-MDS cluster, we ran the vdbench test tool to read and write small files (3000 small files in a single directory). Throughput drops to 0 while the files are being created and written.
Checking perf top statistics, we found that the MDS getattr latency is about 5 seconds. The time is mainly spent acquiring the rdlock:
1. mds.0 receives the getattr request from the client. Under the current lock state it does not have read permission, so mds.0 sends a LOCK_AC_REQRDLOCK request to the authority node of this inode, mds.2, and waits.
2. mds.2 holds wrlocks on this lock and needs to wait for MDLog::flush() before it can call wrlock_finish() on it. In this scenario, nothing else triggers MDLog::flush(), so it has to wait for MDSRankDispatcher::tick (a 5-second interval timer).
Setting mds_early_reply to false can alleviate this problem, but it hurts performance. Is there a way for mds.2, while holding the wrlock on this lock, to notice that mds.0 is waiting for it and then call MDLog::flush() to finish the wrlock? Or is there a better solution? Thanks!
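To make the proposal concrete, here is a minimal, hypothetical C++ model of the suggested behavior. It is not real Ceph code: the types SimpleLockModel and MDLogModel and their members are illustrative stand-ins for the MDS SimpleLock and MDLog machinery. The idea is that when the auth MDS receives a remote rdlock request while it still holds wrlocks, it records the waiter and flushes the log immediately, rather than leaving the flush to the 5-second tick:

```cpp
#include <cassert>

// Illustrative stand-in for MDLog; flush() models MDLog::flush().
struct MDLogModel {
    int flushes = 0;
    void flush() { ++flushes; }
};

// Illustrative stand-in for a SimpleLock on the auth MDS (mds.2 in the
// report). num_wrlock models outstanding wrlocks; rdlock_waiter records
// that a replica (mds.0) sent LOCK_AC_REQRDLOCK and is waiting.
struct SimpleLockModel {
    int num_wrlock = 0;
    bool rdlock_waiter = false;

    // Proposed change: on receiving a remote rdlock request while
    // write-locked, remember the waiter and kick the log flush right
    // away so wrlock_finish() can run without waiting for the tick.
    void handle_reqrdlock(MDLogModel& mdlog) {
        if (num_wrlock > 0) {
            rdlock_waiter = true;
            mdlog.flush();  // expedite, instead of MDSRankDispatcher::tick
        }
    }
};
```

With this change, the flush happens on the message path, so the getattr latency is bounded by log-flush time instead of the tick interval.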
log:
mds.0:
2020-03-17 18:16:49.004496 7fa643eef700 10 mds.0.cache.ino(0x10001788340) add_waiter tag 2000000040000000 0x55e335be0900 !ambig 1 !frozen 1 !freezing 1
2020-03-17 18:16:49.004499 7fa643eef700 15 mds.0.cache.ino(0x10001788340) taking waiter here
2020-03-17 18:16:53.996058 7fa643eef700 7 mds.0.locker handle_file_lock
2020-03-17 18:16:53.996825 7fa643eef700 10 mds.0.server reply to stat on client_request(client.30815695:29015719 getattr pAsLsXsFs #0x10001788340 2020-03-17 18:16:49.003290 caller_uid=0, caller_gid=0{0,}) v4
2020-03-17 18:16:53.996847 7fa643eef700 10 mds.0.server apply_allocated_inos 0x0 / [] / 0x0
2020-03-17 18:16:53.996853 7fa643eef700 20 mds.0.server lat 4.993524
mds.2:
2020-03-17 18:16:49.004532 7f42754ab700 7 mds.2.locker handle_file_lock a=reqrdlock on (ifile lock w=1) from mds.0 [inode 0x10001788340 [...2,head] /nfs2/vdb.1_87.dir/vdb.2_99.dir/ auth{0=1,1=1} v1329780 pv1329784 ap=3+1 f(v1 m2020-03-17 18:16:43.998529 2892=2892+0) n(v1 rc2020-03-17 18:16:48.998173 b189530112 2893=2892+1)/n(v1 rc2020-03-17 18:16:43.998529 b189464576 2893=2892+1) (isnap sync r=1) (inest mix w=1) (ipolicy sync r=1) (ifile lock w=1) (iversion lock w=1 last_client=30815695) caps={30815449=pAsLsXs/-@1,30815695=pAsLsXs/-@1859,30815785=pAsLsXs/-@1,30824662=pAsLsXs/-@1} | dirtyscattered=0 request=0 lock=5 importingcaps=0 importing=0 dirfrag=1 caps=1 stickydirs=0 dirtyrstat=0 replicated=1 dirty=1 waiter=0 authpin=1 0x558f21ecbc00]
2020-03-17 18:16:49.004560 7f42754ab700 7 mds.2.locker handle_reqrdlock got rdlock request on (ifile lock w=1) on [inode 0x10001788340 [...2,head] /nfs2/vdb.1_87.dir/vdb.2_99.dir/ auth{0=1,1=1} v1329780 pv1329784 ap=3+1 f(v1 m2020-03-17 18:16:43.998529 2892=2892+0) n(v1 rc2020-03-17 18:16:48.998173 b189530112 2893=2892+1)/n(v1 rc2020-03-17 18:16:43.998529 b189464576 2893=2892+1) (isnap sync r=1) (inest mix w=1) (ipolicy sync r=1) (ifile lock w=1) (iversion lock w=1 last_client=30815695) caps={30815449=pAsLsXs/-@1,30815695=pAsLsXs/-@1859,30815785=pAsLsXs/-@1,30824662=pAsLsXs/-@1} | dirtyscattered=0 request=0 lock=5 importingcaps=0 importing=0 dirfrag=1 caps=1 stickydirs=0 dirtyrstat=0 replicated=1 dirty=1 waiter=0 authpin=1 0x558f21ecbc00]
2020-03-17 18:16:53.995904 7f426dc9c700 7 mds.2.locker eval_gather finished gather on (ifile lock->sync) on [inode 0x10001788340 [...2,head] /nfs2/vdb.1_87.dir/vdb.2_99.dir/ auth{0=1,1=1} v1329784 ap=3+1 f(v1 m2020-03-17 18:16:49.000388 2893=2893+0) n(v1 rc2020-03-17 18:16:49.000388 b189530112 2894=2893+1) (isnap sync r=1) (inest mix w=1) (ipolicy sync r=1) (ifile lock->sync) (iversion lock) caps={30815449=pAsLsXs/-@1,30815695=pAsLsXs/-@1859,30815785=pAsLsXs/-@1,30824662=pAsLsXs/-@1} | dirtyscattered=0 request=0 lock=3 importingcaps=0 importing=0 dirfrag=1 caps=1 stickydirs=0 dirtyrstat=0 replicated=1 dirty=1 waiter=0 authpin=1 0x558f21ecbc00]
Updated by Greg Farnum about 4 years ago
- Project changed from Ceph to CephFS
- Subject changed from When a large number of small files are written concurrently, the MDS getattr delay is about 5 seconds to When a large number of small files are written concurrently, the MDS getattr delay blocks on MDLog::flush interval
- Category set to Performance/Resource Usage
- Component(FS) MDS added
- Labels (FS) multimds added
Updated by Jeff Layton about 4 years ago
There are some patches in the current kcephfs client testing branch that may help here:
https://github.com/ceph/ceph-client/tree/testing
92575a82886b ceph: request expedited service on session's last cap flush
343b28c38907 ceph: convert mdsc->cap_dirty to a per-session list
You may want to give those patches a try and see whether they help your use case.