Bug #44922

open

When a large number of small files are written concurrently, the MDS getattr delay blocks on MDLog::flush interval

Added by wei qiaomiao about 4 years ago. Updated about 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Performance/Resource Usage
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In a multi-MDS cluster, we ran the vdbench test tool to read and write small files (3000 small files in a single directory). Throughput drops to 0 while the files are being created and written.

Checking perf top statistics, we found that the MDS getattr delay is about 5 seconds. The time is mainly spent acquiring the rdlock:

1. mds.0 receives the getattr request from the client. In the current lock state it does not have rd permission, so it sends a LOCK_AC_REQRDLOCK request to mds.2, the authority node for this inode, and waits.

2. mds.2 holds wrlocks on the inode, so it has to wait for MDLog::flush() before it can wrlock_finish this lock. In this scenario nothing else triggers MDLog::flush(), so the release has to wait for MDSRankDispatcher::tick() (a 5-second interval timer); a standalone illustration of this pattern is sketched below.
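
To make the timing behaviour concrete, the stall can be reproduced with a minimal standalone illustration of this pattern (plain C++, not Ceph code): the "lock" is only released after a "flush", and the flush is only driven by a periodic 5-second tick, so a waiter that arrives just after a tick blocks for almost the whole interval, which matches the ~5 s getattr latency in the logs below.

    // Minimal standalone illustration (not Ceph code) of the latency pattern
    // described above. The 5-second sleep stands in for mds_tick_interval.
    #include <chrono>
    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <thread>

    int main() {
      std::mutex m;
      std::condition_variable cv;
      bool flushed = false;   // stands in for "journal flushed, wrlock released"

      // "mds.2": only flushes on its periodic tick.
      std::thread ticker([&] {
        std::this_thread::sleep_for(std::chrono::seconds(5));
        { std::lock_guard<std::mutex> g(m); flushed = true; }
        cv.notify_all();
      });

      // "mds.0": requests the rdlock right away, but only gets it after the flush.
      auto start = std::chrono::steady_clock::now();
      {
        std::unique_lock<std::mutex> g(m);
        cv.wait(g, [&] { return flushed; });
      }
      std::chrono::duration<double> lat = std::chrono::steady_clock::now() - start;
      std::cout << "getattr-like latency: " << lat.count() << " s\n";  // ~5 s

      ticker.join();
      return 0;
    }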

Setting mds_early_reply to false can alleviate this problem, but it has an impact on performance. Is there a way for mds.2, while it holds the wrlock, to notice that mds.0 is waiting for this lock and then call MDLog::flush() so it can wrlock_finish the lock? Or is there a better solution? Thanks!
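
To make the idea concrete, here is a rough sketch of the kind of change I mean in Locker::handle_reqrdlock(). It is not a tested patch; the surrounding logic and the exact names (cref_t<MLock>, SimpleLock::WAIT_STABLE, C_MDS_RetryMessage) are written from memory and may differ between versions:

    // Sketch only: when a replica asks for a rdlock while the auth MDS still
    // holds wrlocks on the lock, kick the journal instead of leaving the
    // wrlock_finish() to the next MDSRankDispatcher::tick().
    void Locker::handle_reqrdlock(SimpleLock *lock, const cref_t<MLock> &m)
    {
      // ... existing checks: we are auth for this lock and it is not already SYNC ...
      if (lock->is_stable()) {
        simple_sync(lock);   // stable: move to SYNC and grant right away
      } else {
        // Unstable (e.g. wrlocks still held for a journaled update): queue the
        // request until the lock becomes stable, as the code already does.
        lock->add_waiter(SimpleLock::WAIT_STABLE, new C_MDS_RetryMessage(mds, m));

        // Proposed addition: another MDS is now waiting on this lock, so flush
        // the mdlog right away; the pending wrlock_finish() can then run as soon
        // as the log entry is safe instead of at the next 5-second tick.
        mds->mdlog->flush();
      }
    }

With something like this, mds_early_reply could stay at its default, since the extra flush would only be forced when a remote MDS is actually blocked on the lock.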

log:
mds.0:
2020-03-17 18:16:49.004496 7fa643eef700 10 mds.0.cache.ino(0x10001788340) add_waiter tag 2000000040000000 0x55e335be0900 !ambig 1 !frozen 1 !freezing 1
2020-03-17 18:16:49.004499 7fa643eef700 15 mds.0.cache.ino(0x10001788340) taking waiter here

2020-03-17 18:16:53.996058 7fa643eef700 7 mds.0.locker handle_file_lock

2020-03-17 18:16:53.996825 7fa643eef700 10 mds.0.server reply to stat on client_request(client.30815695:29015719 getattr pAsLsXsFs #0x10001788340 2020-03-17 18:16:49.003290 caller_uid=0, caller_gid=0{0,}) v4

2020-03-17 18:16:53.996847 7fa643eef700 10 mds.0.server apply_allocated_inos 0x0 / [] / 0x0
2020-03-17 18:16:53.996853 7fa643eef700 20 mds.0.server lat 4.993524

mds.2:
2020-03-17 18:16:49.004532 7f42754ab700 7 mds.2.locker handle_file_lock a=reqrdlock on (ifile lock w=1) from mds.0 [inode 0x10001788340 [...2,head] /nfs2/vdb.1_87.dir/vdb.2_99.dir/ auth{0=1,1=1} v1329780 pv1329784 ap=3+1 f(v1 m2020-03-17 18:16:43.998529 2892=2892+0) n(v1 rc2020-03-17 18:16:48.998173 b189530112 2893=2892+1)/n(v1 rc2020-03-17 18:16:43.998529 b189464576 2893=2892+1) (isnap sync r=1) (inest mix w=1) (ipolicy sync r=1) (ifile lock w=1) (iversion lock w=1 last_client=30815695) caps={30815449=pAsLsXs/-@1,30815695=pAsLsXs/-@1859,30815785=pAsLsXs/-@1,30824662=pAsLsXs/-@1} | dirtyscattered=0 request=0 lock=5 importingcaps=0 importing=0 dirfrag=1 caps=1 stickydirs=0 dirtyrstat=0 replicated=1 dirty=1 waiter=0 authpin=1 0x558f21ecbc00]

2020-03-17 18:16:49.004560 7f42754ab700 7 mds.2.locker handle_reqrdlock got rdlock request on (ifile lock w=1) on [inode 0x10001788340 [...2,head] /nfs2/vdb.1_87.dir/vdb.2_99.dir/ auth{0=1,1=1} v1329780 pv1329784 ap=3+1 f(v1 m2020-03-17 18:16:43.998529 2892=2892+0) n(v1 rc2020-03-17 18:16:48.998173 b189530112 2893=2892+1)/n(v1 rc2020-03-17 18:16:43.998529 b189464576 2893=2892+1) (isnap sync r=1) (inest mix w=1) (ipolicy sync r=1) (ifile lock w=1) (iversion lock w=1 last_client=30815695) caps={30815449=pAsLsXs/-@1,30815695=pAsLsXs/-@1859,30815785=pAsLsXs/-@1,30824662=pAsLsXs/-@1} | dirtyscattered=0 request=0 lock=5 importingcaps=0 importing=0 dirfrag=1 caps=1 stickydirs=0 dirtyrstat=0 replicated=1 dirty=1 waiter=0 authpin=1 0x558f21ecbc00]

2020-03-17 18:16:53.995904 7f426dc9c700 7 mds.2.locker eval_gather finished gather on (ifile lock->sync) on [inode 0x10001788340 [...2,head] /nfs2/vdb.1_87.dir/vdb.2_99.dir/ auth{0=1,1=1} v1329784 ap=3+1 f(v1 m2020-03-17 18:16:49.000388 2893=2893+0) n(v1 rc2020-03-17 18:16:49.000388 b189530112 2894=2893+1) (isnap sync r=1) (inest mix w=1) (ipolicy sync r=1) (ifile lock->sync) (iversion lock) caps={30815449=pAsLsXs/-@1,30815695=pAsLsXs/-@1859,30815785=pAsLsXs/-@1,30824662=pAsLsXs/-@1} | dirtyscattered=0 request=0 lock=3 importingcaps=0 importing=0 dirfrag=1 caps=1 stickydirs=0 dirtyrstat=0 replicated=1 dirty=1 waiter=0 authpin=1 0x558f21ecbc00]

Actions #1

Updated by Greg Farnum about 4 years ago

  • Project changed from Ceph to CephFS
  • Subject changed from When a large number of small files are written concurrently, the MDS getattr delay is about 5 seconds to When a large number of small files are written concurrently, the MDS getattr delay blocks on MDLog::flush interval
  • Category set to Performance/Resource Usage
  • Component(FS) MDS added
  • Labels (FS) multimds added
Actions #2

Updated by Jeff Layton about 4 years ago

There are some patches in the current kcephfs client testing branch that may help here:

https://github.com/ceph/ceph-client/tree/testing
92575a82886b ceph: request expedited service on session's last cap flush
343b28c38907 ceph: convert mdsc->cap_dirty to a per-session list

You may want to give those patches a try and see if they help your use case.
