Bug #22547

Client session goes missing from the active MDS

Added by wei jin about 6 years ago. Updated about 6 years ago.

Status: Need More Info
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Our use case: Docker containers under k8s mount CephFS using the CephFS kernel client.

If we do not use the mounted directory for a while, the client's session goes missing from the active MDS. We check this with the command:
ceph daemon mds.FOO session ls

However, on the Docker host, the mount still shows up in df -hT, and we can cd into the directory and operate on it. When we then check with the command above, the session is back. After a while, it goes missing again.

ceph.log shows the following message every time we operate on the mounted directory:
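
The check described above can be scripted. The following is a minimal sketch, assuming that `session ls` prints a JSON array in which each session object carries a numeric `id` field; that output shape is an assumption here, and the helper names `session_ids` and `has_session` are hypothetical.

```python
import json
import subprocess


def session_ids(session_ls_json):
    """Return the set of client ids found in `session ls` JSON output.

    Assumes a JSON array of objects that each have a numeric "id" field.
    """
    sessions = json.loads(session_ls_json)
    return {int(s["id"]) for s in sessions}


def has_session(mds_name, client_id):
    """Run `ceph daemon mds.<name> session ls` and look for client_id."""
    out = subprocess.run(
        ["ceph", "daemon", f"mds.{mds_name}", "session", "ls"],
        check=True, capture_output=True, text=True,
    ).stdout
    return client_id in session_ids(out)


# Made-up sample in the assumed shape; on an MDS host one would
# call has_session("FOO", 2359121) instead.
sample = '[{"id": 2359121, "state": "open"}, {"id": 4242, "state": "open"}]'
print(2359121 in session_ids(sample))
```

Running this periodically from the MDS host would show when the client's session disappears and when it comes back.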
2017-12-28 12:18:31.625165 mds.0 10.11.67.226:6800/2738461306 340 : cluster [INF] denied reconnect attempt (mds is up:active) from client.2359121 10.11.70.66:0/3075221752 after 44973.678531 (allowed interval 100)
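
To compare the elapsed time in this message with the allowed interval, the line can be parsed mechanically. This is a sketch only: the regular expression is written against the single line quoted above, the function name `parse_denied_reconnect` is hypothetical, and treating both numbers as seconds is an assumption.

```python
import re

LINE = (
    "2017-12-28 12:18:31.625165 mds.0 10.11.67.226:6800/2738461306 340 : "
    "cluster [INF] denied reconnect attempt (mds is up:active) from "
    "client.2359121 10.11.70.66:0/3075221752 after 44973.678531 "
    "(allowed interval 100)"
)

# Pattern derived from the one log line above; other releases may word it differently.
PATTERN = re.compile(
    r"denied reconnect attempt \(mds is (?P<state>[^)]+)\) "
    r"from client\.(?P<client>\d+) (?P<addr>\S+) "
    r"after (?P<elapsed>[\d.]+) \(allowed interval (?P<allowed>[\d.]+)\)"
)


def parse_denied_reconnect(line):
    """Extract the fields of a 'denied reconnect attempt' line, or return None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "mds_state": m["state"],
        "client_id": int(m["client"]),
        "client_addr": m["addr"],
        "elapsed": float(m["elapsed"]),   # assumed to be seconds
        "allowed": float(m["allowed"]),   # assumed to be seconds
    }


info = parse_denied_reconnect(LINE)
# 44973.678531 s is about 12.5 hours, far beyond the allowed interval of 100.
print(info["client_id"], round(info["elapsed"] / 3600, 1), "hours")
```

For the line above this prints `2359121 12.5 hours`.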

History

#1 Updated by Zheng Yan about 6 years ago

Please set debug_mds=10 and check why the MDS evicted the client. It's likely that the Docker host went to sleep or that there was a connection problem between the Docker host and the MDS.

#2 Updated by wei jin about 6 years ago

Ok. I will do it soon.

This happened after I restarted the MDS daemon last night. There was also another crash (bug 22548) after I rebooted the MDS.

#3 Updated by wei jin about 6 years ago

wei jin wrote:

Ok. I will do it soon.

I cannot reproduce it after enabling the log, and keeping the log enabled affects our online service because of the long latency it causes in the MDS daemon. I may try again later.

#4 Updated by dongdong tao about 6 years ago

Zheng, if a client has been evicted by the MDS, the client should still think the connection is available,
and when that client sends its next request to the MDS, it will certainly be blocked because it will never get a reply from the MDS, right?
But from Wei's description, it looks like the client reconnected when making its next request.

I think the above is the behavior of the libcephfs-based client; I'm not sure whether the kernel client does that too.

#5 Updated by dongdong tao about 6 years ago

By "evicted", I mean evicted due to the auto_close_timeout.

#6 Updated by Jos Collin about 6 years ago

  • Status changed from New to Need More Info

#7 Updated by Zheng Yan about 6 years ago

dongdong tao wrote:

Zheng, if a client has been evicted by the MDS, the client should still think the connection is available,
and when that client sends its next request to the MDS, it will certainly be blocked because it will never get a reply from the MDS, right?
But from Wei's description, it looks like the client reconnected when making its next request.

I think the above is the behavior of the libcephfs-based client; I'm not sure whether the kernel client does that too.

No. The MDS sends a session close message to the client when evicting it, so the client knows its connection has been evicted.

#8 Updated by dongdong tao about 6 years ago

Zheng Yan wrote:

dongdong tao wrote:

Zheng, if a client has been evicted by the MDS, the client should still think the connection is available,
and when that client sends its next request to the MDS, it will certainly be blocked because it will never get a reply from the MDS, right?
But from Wei's description, it looks like the client reconnected when making its next request.

I think the above is the behavior of the libcephfs-based client; I'm not sure whether the kernel client does that too.

No. The MDS sends a session close message to the client when evicting it, so the client knows its connection has been evicted.

Sorry, Zheng, I tried searching the code but couldn't find that logic.
I believe the code path is find_idle_sessions -> kill_session -> journal_close_session. After the journal is flushed, _session_logged is called. Since the session state is STATE_KILLING, the MDS only marks the connection down and cleans it up.

Please correct me if I have misunderstood any part.

#9 Updated by Zheng Yan about 6 years ago

Sorry. The whole process is:

The MDS closes the client connection
The client's remote_reset callback gets called
The client sends a reconnect message to the MDS
The MDS denies the reconnect message and sends a session close message to the client
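
The four steps above can be walked through with a toy model. This is only an illustrative sketch and not Ceph code: the classes `ToyMDS` and `ToyKernelClient` and all of their methods are hypothetical, and the rule "deny reconnect while up:active" is taken from the log message in the description.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKernelClient:
    client_id: int
    events: list = field(default_factory=list)

    def on_remote_reset(self, mds):
        # Step 2: the connection reset is noticed by the client.
        self.events.append("remote_reset")
        # Step 3: the client reacts by sending a reconnect message.
        self.events.append("send_reconnect")
        mds.handle_reconnect(self)

    def on_session_close(self):
        # The client learns about the eviction only here.
        self.events.append("session_closed")


@dataclass
class ToyMDS:
    state: str = "up:active"
    sessions: set = field(default_factory=set)

    def evict(self, client):
        # Step 1: drop the session and close the connection.
        self.sessions.discard(client.client_id)
        client.on_remote_reset(self)

    def handle_reconnect(self, client):
        # Step 4: an active MDS denies the reconnect and sends a session close.
        if self.state == "up:active":
            client.on_session_close()
            return False
        self.sessions.add(client.client_id)
        return True


mds = ToyMDS(sessions={2359121})
client = ToyKernelClient(2359121)
mds.evict(client)
print(client.events)
```

The printed event list is `['remote_reset', 'send_reconnect', 'session_closed']`, which mirrors the order of the steps: the session close reaches the client only as the answer to its denied reconnect.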

#10 Updated by dongdong tao about 6 years ago

Zheng Yan wrote:

Sorry. The whole process is:

The MDS closes the client connection
The client's remote_reset callback gets called
The client sends a reconnect message to the MDS
The MDS denies the reconnect message and sends a session close message to the client

Thanks, Zheng, for your detailed explanation. I found the code in the kernel.
By the way, the behavior of the libcephfs-based client is different here:
it does not try to reconnect to the MDS, it just calls _closed_mds_session.
