Bug #20129: Client syncfs is slow (waits for next MDS tick) - CephFS - Ceph

Actions

Copy link

Bug #20129

closed

Client syncfs is slow (waits for next MDS tick)

Added by dongdong tao almost 7 years ago. Updated over 6 years ago.

Status:

Resolved

Priority:

High

Assignee:

Zheng Yan

Category:

Performance/Resource Usage

Target version:

% Done:

Source:

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Component(FS):

Labels (FS):

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

in function client::unmount there are codes below:
-------------
while (!mds_requests.empty()) {
ldout(cct, 10) << "waiting on " << mds_requests.size() << " requests" << dendl;
mount_cond.Wait(client_lock);
}
------------
here in my situation, there is one "rename" request left, so unmount is waiting for that rename request to be finished, and it takes almost 5 seconds. let me describe what are trying to do here first:
we are testing the performance about cephfs with hadoop. the hadoop job would call a lot ceph mount and unmount, so it takes too much time on unmount
cause almost each unmount would take 5 seconds to wait for the last write operation (rename, create, mkdir ...)

and i have do a lot analysis here:
1 mds would reply 2 times for write operation:
-> first time is the early reply which is unsafe reply for the client, this is the time when the memory is ready and prepare to submit the journal event, and when the client receive the unsafe reply, client would go ahead and start next request, and don't need to wait untill flushing the journal
-> second reply is the safe reply for the client, this is the time when the journal has been flushed, and my issue is that client::unmount wait for this reply for almost 5 seconds.

so, through some debugging, i have found that mds would call mdlog->flush() to flush the journal.
and i also found that if a write operation is the session's last operation, mds would only flush the journal of that write operation when reach a
mds tick interval which is 5 seconds. so that explains why my unmount always take seconds to finish. and i have try to set the mds tick interval to
2 seconds, and it worked. and we can not just set that mds tick interval to 2 to solve our issue

so my question is, how can we solve the issue completely by modify the source code ? would that be safe if we add mdlog->flush() at the end of handle_client_reply handle_client_mkdir and so on. could you please give the suggestion here first, i believe this is a bug.

Regards,
Dongdong

Actions

Copy link

Updated by John Spray almost 7 years ago

Subject changed from client wait about 5 second to close the session to Client syncfs is slow (waits for next MDS tick)

This is the same behaviour you would see if you were running "sync" on a filesystem.

We handle this for fsync (on an inode) but not for sync (on a whole filsystem):

commit 826b30aa28478650755ffe613a7bfc1074778599
Author: Jeff Layton <jlayton@redhat.com>
Date:   Wed Nov 9 09:36:07 2016 -0500

    client: wire up the CHECK_CAPS_SYNCHRONOUS flag

    Ensure that the client will request an immediate journal flush from the
    MDS when we'll end up waiting on the flush response. This patch should
    fix the fsync codepath, but we may need something similar for syncfs.

    Signed-off-by: Jeff Layton <jlayton@redhat.com>

So I guess we should fix this.

Actions

Copy link

Updated by Zheng Yan almost 7 years ago

please try below patch. If it still doesn't work for you(there is no dirty caps), you need to implement mechanism that explicitly asks mds to flush mdlog (modify both client and mds)

diff --git a/src/client/Client.cc b/src/client/Client.cc
index 0a55e73977..aae1a49bd3 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -5796,6 +5796,7 @@ void Client::unmount()

   ldout(cct, 2) << "unmounting" << dendl;
   unmounting = true;
+  flush_caps_sync();

   while (!mds_requests.empty()) {
     ldout(cct, 10) << "waiting on " << mds_requests.size() << " requests" << dendl;

Actions

Copy link

Updated by dongdong tao almost 7 years ago

Zheng Yan wrote:

please try below patch. If it still doesn't work for you(there is no dirty caps), you need to implement mechanism that explicitly asks mds to flush mdlog (modify both client and mds)

[...]

Hi Yan,

I have applied your patch and looks like it worked. would you please explain to me why your solution would work ? cause 
i thought "flush_caps_sync" would only flush the caps, why it would also flush the journal? 
   would this be a final solution for this issue ?

Regards,
Dongdong

Actions

Copy link

Updated by Zheng Yan almost 7 years ago

the last cap message has CHECK_CAPS_SYNCHRONOUS flag. mds flushes mdlog when it sees the flag. The patch isn't perfect. It does not work for 'no dirty caps' case.

Actions

Copy link

Updated by dongdong tao almost 7 years ago

Zheng Yan wrote:

the last cap message has CHECK_CAPS_SYNCHRONOUS flag. mds flushes mdlog when it sees the flag. The patch isn't perfect. It does not work for 'no dirty caps' case.

Thanks yan for your quick reply, i think the perfect solution is to implement a mechanism that explicitly asks mds to flush mdlog
as you told me before. below is my code to implement that mechanism in client side, please take a look and kindly give your advise:

void Client::flush_mdlog_sync()
{
+
+ for (map<mds_rank_t,MetaSession*>::iterator p = mds_sessions.begin();
+ p != mds_sessions.end();
+ +p) {
MetaSession *s = p->second;
+ flush_mdlog(s);
}

void Client::flush_mdlog(MetaSession *session)
{
+ MClientSession *m = new MClientSession(CEPH_SESSION_REQUEST_FLUSH_MDLOG);
+ session->con->send_message(m);
}

void Client::unmount() {
Mutex::Locker lock(client_lock);
@ -5702,9 +5718,10 @ void Client::unmount()

ldout(cct, 2) << "unmounting" << dendl;
   unmounting = true;
-
+ 
   while (!mds_requests.empty()) {
     ldout(cct, 10) << "waiting on " << mds_requests.size() << " requests" << dendl;
+    flush_mdlog_sync(); 
     mount_cond.Wait(client_lock);
   }

as for the mds side:

diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index 1dc5ca2..dea22a3 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@ -390,6 +390,10 @ void Server::handle_client_session(MClientSession *m)
finish_flush_session(session, m->get_seq());
break;

+ case CEPH_SESSION_REQUEST_FLUSH_MDLOG:
+ mdlog->flush();
+ break;
+
default:
ceph_abort();
}

Regards
Dongdong

Actions

Copy link

Updated by Zheng Yan almost 7 years ago

dongdong tao wrote:

Zheng Yan wrote:

the last cap message has CHECK_CAPS_SYNCHRONOUS flag. mds flushes mdlog when it sees the flag. The patch isn't perfect. It does not work for 'no dirty caps' case.

Thanks yan for your quick reply, i think the perfect solution is to implement a mechanism that explicitly asks mds to flush mdlog
as you told me before. below is my code to implement that mechanism in client side, please take a look and kindly give your advise:

void Client::flush_mdlog_sync()
{
+
+ for (map<mds_rank_t,MetaSession*>::iterator p = mds_sessions.begin();
+ p != mds_sessions.end();
+ +p) {
MetaSession *s = p->second;
+ flush_mdlog(s);
}

void Client::flush_mdlog(MetaSession *session)
{
+ MClientSession *m = new MClientSession(CEPH_SESSION_REQUEST_FLUSH_MDLOG);
+ session->con->send_message(m);
}

void Client::unmount() {
Mutex::Locker lock(client_lock);
@ -5702,9 +5718,10 @ void Client::unmount()

ldout(cct, 2) << "unmounting" << dendl;
unmounting = true;
-
+
while (!mds_requests.empty()) {
ldout(cct, 10) << "waiting on " << mds_requests.size() << " requests" << dendl;
+ flush_mdlog_sync();
mount_cond.Wait(client_lock);
}

as for the mds side:

diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index 1dc5ca2..dea22a3 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@ -390,6 +390,10 @ void Server::handle_client_session(MClientSession *m)
finish_flush_session(session, m->get_seq());
break;

+ case CEPH_SESSION_REQUEST_FLUSH_MDLOG:
+ mdlog->flush();
+ break;
+
default:
ceph_abort();
}

Regards
Dongdong