Project

General

Profile

Bug #20129

Client syncfs is slow (waits for next MDS tick)

Added by dongdong tao 4 months ago. Updated 3 months ago.

Status:
New
Priority:
High
Assignee:
Category:
Performance/Resource Usage
Target version:
-
Start date:
06/01/2017
Due date:
06/30/2017
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Component(FS):
Needs Doc:
No

Description

in function client::unmount there are codes below:
-------------
while (!mds_requests.empty()) {
ldout(cct, 10) << "waiting on " << mds_requests.size() << " requests" << dendl;
mount_cond.Wait(client_lock);
}
------------
here in my situation, there is one "rename" request left, so unmount is waiting for that rename request to be finished, and it takes almost 5 seconds. let me describe what are trying to do here first:
we are testing the performance about cephfs with hadoop. the hadoop job would call a lot ceph mount and unmount, so it takes too much time on unmount
cause almost each unmount would take 5 seconds to wait for the last write operation (rename, create, mkdir ...)

and i have do a lot analysis here:
1 mds would reply 2 times for write operation:
-> first time is the early reply which is unsafe reply for the client, this is the time when the memory is ready and prepare to submit the journal event, and when the client receive the unsafe reply, client would go ahead and start next request, and don't need to wait untill flushing the journal
-> second reply is the safe reply for the client, this is the time when the journal has been flushed, and my issue is that client::unmount wait for this reply for almost 5 seconds.

so, through some debugging, i have found that mds would call mdlog->flush() to flush the journal.
and i also found that if a write operation is the session's last operation, mds would only flush the journal of that write operation when reach a
mds tick interval which is 5 seconds. so that explains why my unmount always take seconds to finish. and i have try to set the mds tick interval to
2 seconds, and it worked. and we can not just set that mds tick interval to 2 to solve our issue

so my question is, how can we solve the issue completely by modify the source code ? would that be safe if we add mdlog->flush() at the end of handle_client_reply handle_client_mkdir and so on. could you please give the suggestion here first, i believe this is a bug.

Regards,
Dongdong

History

#1 Updated by John Spray 4 months ago

  • Subject changed from client wait about 5 second to close the session to Client syncfs is slow (waits for next MDS tick)

This is the same behaviour you would see if you were running "sync" on a filesystem.

We handle this for fsync (on an inode) but not for sync (on a whole filsystem):

commit 826b30aa28478650755ffe613a7bfc1074778599
Author: Jeff Layton <jlayton@redhat.com>
Date:   Wed Nov 9 09:36:07 2016 -0500

    client: wire up the CHECK_CAPS_SYNCHRONOUS flag

    Ensure that the client will request an immediate journal flush from the
    MDS when we'll end up waiting on the flush response. This patch should
    fix the fsync codepath, but we may need something similar for syncfs.

    Signed-off-by: Jeff Layton <jlayton@redhat.com>

So I guess we should fix this.

#2 Updated by Zheng Yan 4 months ago

please try below patch. If it still doesn't work for you(there is no dirty caps), you need to implement mechanism that explicitly asks mds to flush mdlog (modify both client and mds)

diff --git a/src/client/Client.cc b/src/client/Client.cc
index 0a55e73977..aae1a49bd3 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -5796,6 +5796,7 @@ void Client::unmount()

   ldout(cct, 2) << "unmounting" << dendl;
   unmounting = true;
+  flush_caps_sync();

   while (!mds_requests.empty()) {
     ldout(cct, 10) << "waiting on " << mds_requests.size() << " requests" << dendl;

#3 Updated by dongdong tao 4 months ago

Zheng Yan wrote:

please try below patch. If it still doesn't work for you(there is no dirty caps), you need to implement mechanism that explicitly asks mds to flush mdlog (modify both client and mds)

[...]

Hi Yan,

I have applied your patch and looks like it worked. would you please explain to me why your solution would work ? cause 
i thought "flush_caps_sync" would only flush the caps, why it would also flush the journal?
would this be a final solution for this issue ?

Regards,
Dongdong

#4 Updated by Zheng Yan 4 months ago

the last cap message has CHECK_CAPS_SYNCHRONOUS flag. mds flushes mdlog when it sees the flag. The patch isn't perfect. It does not work for 'no dirty caps' case.

#5 Updated by dongdong tao 4 months ago

Zheng Yan wrote:

the last cap message has CHECK_CAPS_SYNCHRONOUS flag. mds flushes mdlog when it sees the flag. The patch isn't perfect. It does not work for 'no dirty caps' case.

Thanks yan for your quick reply, i think the perfect solution is to implement a mechanism that explicitly asks mds to flush mdlog
as you told me before. below is my code to implement that mechanism in client side, please take a look and kindly give your advise:

void Client::flush_mdlog_sync()
{
+
+ for (map<mds_rank_t,MetaSession*>::iterator p = mds_sessions.begin();
+ p != mds_sessions.end();
+ +p) {
MetaSession *s = p->second;
+ flush_mdlog(s);
}

void Client::flush_mdlog(MetaSession *session)
{
+ MClientSession *m = new MClientSession(CEPH_SESSION_REQUEST_FLUSH_MDLOG);
+ session->con->send_message(m);
}

void Client::unmount() {
Mutex::Locker lock(client_lock);
@ -5702,9 +5718,10 @ void Client::unmount()

ldout(cct, 2) << "unmounting" << dendl;
unmounting = true;
-
+
while (!mds_requests.empty()) {
ldout(cct, 10) << "waiting on " << mds_requests.size() << " requests" << dendl;
+ flush_mdlog_sync();
mount_cond.Wait(client_lock);
}

as for the mds side:

diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index 1dc5ca2..dea22a3 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@ -390,6 +390,10 @ void Server::handle_client_session(MClientSession *m)
finish_flush_session(session, m->get_seq());
break;

+ case CEPH_SESSION_REQUEST_FLUSH_MDLOG:
+ mdlog->flush();
+ break;
+
default:
ceph_abort();
}

Regards
Dongdong

#6 Updated by Zheng Yan 4 months ago

dongdong tao wrote:

Zheng Yan wrote:

the last cap message has CHECK_CAPS_SYNCHRONOUS flag. mds flushes mdlog when it sees the flag. The patch isn't perfect. It does not work for 'no dirty caps' case.

Thanks yan for your quick reply, i think the perfect solution is to implement a mechanism that explicitly asks mds to flush mdlog
as you told me before. below is my code to implement that mechanism in client side, please take a look and kindly give your advise:

void Client::flush_mdlog_sync()
{
+
+ for (map<mds_rank_t,MetaSession*>::iterator p = mds_sessions.begin();
+ p != mds_sessions.end();
+ +p) {
MetaSession *s = p->second;
+ flush_mdlog(s);
}

void Client::flush_mdlog(MetaSession *session)
{
+ MClientSession *m = new MClientSession(CEPH_SESSION_REQUEST_FLUSH_MDLOG);
+ session->con->send_message(m);
}

void Client::unmount() {
Mutex::Locker lock(client_lock);
@ -5702,9 +5718,10 @ void Client::unmount()

ldout(cct, 2) << "unmounting" << dendl;
unmounting = true;
-
+
while (!mds_requests.empty()) {
ldout(cct, 10) << "waiting on " << mds_requests.size() << " requests" << dendl;
+ flush_mdlog_sync();
mount_cond.Wait(client_lock);
}

as for the mds side:

diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index 1dc5ca2..dea22a3 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@ -390,6 +390,10 @ void Server::handle_client_session(MClientSession *m)
finish_flush_session(session, m->get_seq());
break;

+ case CEPH_SESSION_REQUEST_FLUSH_MDLOG:
+ mdlog->flush();
+ break;
+
default:
ceph_abort();
}

Regards
Dongdong

It should work.

#7 Updated by John Spray 4 months ago

dongdong tao -- could you please open a pull request with your code change once it is working for you?

#8 Updated by dongdong tao 4 months ago

John Spray wrote:

dongdong tao -- could you please open a pull request with your code change once it is working for you?

ok, John, i will pull a request ASAP.

#10 Updated by dongdong tao 3 months ago

John Spray wrote:

dongdong tao -- could you please open a pull request with your code change once it is working for you?

John, is there any thing else i need to do here for this PR?

Also available in: Atom PDF