Bug #22547
open
Active MDS session missing for client
Added by wei jin over 6 years ago.
Updated over 6 years ago.
Description
Our use case: Kubernetes (k8s) Docker hosts mount CephFS using the CephFS kernel client.
If we do not use the mounted directory for a while, the client's session disappears from the active MDS. We check this with the command:
ceph daemon mds.FOO session ls
However, on the Docker host the mount is still listed by df -hT, and we can cd into that directory and operate on it. After that, the command above shows the session again. After a while, it disappears again.
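For repeated checks like this, the presence test can be scripted. A minimal sketch, assuming that session ls prints a JSON array of session objects, each with a numeric "id" field holding the client's id (e.g. 2359121 for client.2359121). The helper names are hypothetical; fetch_session_ls only wraps the admin-socket command quoted above, so it has to run on the host where the MDS daemon lives.

```python
import json
import subprocess
from typing import List


def fetch_session_ls(mds_name: str) -> str:
    """Run `ceph daemon mds.<name> session ls` via the local admin socket."""
    result = subprocess.run(
        ["ceph", "daemon", f"mds.{mds_name}", "session", "ls"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def session_ids(session_ls_json: str) -> List[int]:
    """Return the client ids in the output (assumes each entry has an "id")."""
    return [s["id"] for s in json.loads(session_ls_json) if "id" in s]


def session_present(session_ls_json: str, client_id: int) -> bool:
    """True if the given client id still has a session on this MDS."""
    return client_id in session_ids(session_ls_json)
```

On the MDS host, session_present(fetch_session_ls("FOO"), 2359121) would report whether client.2359121 still has a session; polling it periodically would show roughly when the session drops out.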
Every time we operate on the mounted directory, ceph.log shows the following message:
2017-12-28 12:18:31.625165 mds.0 10.11.67.226:6800/2738461306 340 : cluster [INF] denied reconnect attempt (mds is up:active) from client.2359121 10.11.70.66:0/3075221752 after 44973.678531 (allowed interval 100)
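The numbers in this message can be pulled out mechanically. A minimal sketch that parses the line with a regular expression built from this one sample (the pattern and helper names are hypothetical, not a Ceph tool, and other Ceph versions may format the line differently). It assumes the "after" value is the time since the MDS opened its reconnect window, e.g. at the restart mentioned later in this report; under that reading the client tried to reconnect about 12.5 hours (44973.68 s / 3600) after the window opened, against an allowed interval of 100 s, by which time the MDS was already up:active.

```python
import re
from typing import NamedTuple, Optional

# Field layout taken from the single log line quoted in this report.
DENIED_RE = re.compile(
    r"denied reconnect attempt \(mds is (?P<state>\S+)\) "
    r"from (?P<client>client\.\d+) (?P<addr>\S+) "
    r"after (?P<elapsed>[\d.]+) \(allowed interval (?P<allowed>[\d.]+)\)"
)


class DeniedReconnect(NamedTuple):
    mds_state: str
    client: str
    addr: str
    elapsed_s: float
    allowed_s: float


def parse_denied(line: str) -> Optional[DeniedReconnect]:
    """Extract the fields of a 'denied reconnect attempt' message, or None."""
    m = DENIED_RE.search(line)
    if m is None:
        return None
    return DeniedReconnect(
        mds_state=m["state"],
        client=m["client"],
        addr=m["addr"],
        elapsed_s=float(m["elapsed"]),
        allowed_s=float(m["allowed"]),
    )


SAMPLE = (
    "2017-12-28 12:18:31.625165 mds.0 10.11.67.226:6800/2738461306 340 : "
    "cluster [INF] denied reconnect attempt (mds is up:active) from "
    "client.2359121 10.11.70.66:0/3075221752 after 44973.678531 "
    "(allowed interval 100)"
)

rec = parse_denied(SAMPLE)
if rec is not None:
    # Compare the elapsed time (in hours) with the allowed window (in seconds).
    print(f"{rec.client}: reconnect after {rec.elapsed_s / 3600:.2f} h, "
          f"allowed window {rec.allowed_s:.0f} s")
```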
Please set debug_mds=10 and check why the MDS evicted the client. It's likely that the Docker host went to sleep or that there was a connection problem between the Docker host and the MDS.
Ok. I will do it soon.
This happened after I restarted the MDS daemon last night. There was also another crash (bug #22548) after I rebooted the MDS.
wei jin wrote:
> Ok. I will do it soon.
I cannot reproduce it after enabling the debug log, and the logging would impact our online service because it makes the MDS daemon's latency much longer. I may try it later.
Zheng, if a client has been evicted by the MDS, the client should still think the connection is available, and when that client sends its next request to the MDS, the request will certainly block because it will never get a reply from the MDS, right?
But from wei's description, it looks like the client reconnected when making its next request.
I think the above is the behavior of the libcephfs-based client; I'm not sure whether the kernel client does the same.
By "evicted", I mean evicted due to the auto_close_timeout.
- Status changed from New to Need More Info
dongdong tao wrote:
> Zheng, if a client has been evicted by the MDS, the client should still think the connection is available, and when that client sends its next request to the MDS, the request will certainly block because it will never get a reply from the MDS, right?
> But from wei's description, it looks like the client reconnected when making its next request.
> I think the above is the behavior of the libcephfs-based client; I'm not sure whether the kernel client does the same.
No. The MDS sends a session close message to the client when evicting it, so the client knows it has been evicted.
Zheng Yan wrote:
> No. The MDS sends a session close message to the client when evicting it, so the client knows it has been evicted.
Sorry, Zheng, I searched the code but couldn't find where that happens.
I believe the code path is find_idle_sessions -> kill_session -> journal_close_session. After the journal is flushed, _session_logged is called. Since the session state is STATE_KILLING, the MDS only marks the connection down and cleans it up.
Please correct me if I have misunderstood any part.
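The code path described above can be modelled as a small state machine. This is a simplified Python sketch, not Ceph's C++ implementation: the method names mirror the ones mentioned (find_idle_sessions, kill_session, journal_close_session, _session_logged, STATE_KILLING) and auto_close_timeout follows the earlier comment, while the Session and MDSModel classes, the in-memory journal list and the explicit flush_journal step are hypothetical stand-ins. It also assumes that "cleaning up" means removing the session from the session map, which would be consistent with the session disappearing from session ls. Note that on this path no session close message is sent to the client.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class State(Enum):
    OPEN = "open"
    KILLING = "killing"  # stands in for STATE_KILLING


@dataclass
class Session:
    client: str
    last_renew: float  # time of the client's last session renewal
    state: State = State.OPEN
    connection_up: bool = True


@dataclass
class MDSModel:
    auto_close_timeout: float  # idle time after which a session is killed
    sessions: Dict[str, Session] = field(default_factory=dict)
    journal: List[Session] = field(default_factory=list)  # unflushed close events

    def find_idle_sessions(self, now: float) -> None:
        # Kill every open session that has not renewed within the timeout.
        for s in list(self.sessions.values()):
            if s.state is State.OPEN and now - s.last_renew > self.auto_close_timeout:
                self.kill_session(s)

    def kill_session(self, s: Session) -> None:
        s.state = State.KILLING
        self.journal_close_session(s)

    def journal_close_session(self, s: Session) -> None:
        # The close only takes effect once the journal entry is flushed.
        self.journal.append(s)

    def flush_journal(self) -> None:
        pending, self.journal = self.journal, []
        for s in pending:
            self._session_logged(s)

    def _session_logged(self, s: Session) -> None:
        if s.state is State.KILLING:
            # Tear down the connection and drop the session;
            # no session close message is sent to the client here.
            s.connection_up = False
            del self.sessions[s.client]
```

The two-phase design (mark as killing, then act after the journal flush) is why the session can still be listed briefly after the idle timeout fires.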
Sorry. The whole process is:
1. The MDS closes the client connection.
2. The client's remote_reset callback gets called.
3. The client sends a reconnect message to the MDS.
4. The MDS denies the reconnect and sends a session close message to the client.
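The four steps above can be simulated with two message queues. A minimal sketch under simplifying assumptions: the "reconnect" and "session_close" strings and the class names are hypothetical, not the real Ceph wire messages, and only the denial path discussed here is modelled (the MDS is no longer in its reconnect state). The run ends with the client marking its session closed, which is how it learns it was evicted.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple

Message = Tuple[str, str]  # (message kind, client name)


@dataclass
class KernelClientModel:
    name: str
    session_open: bool = True
    outbox: Deque[Message] = field(default_factory=deque)

    def remote_reset(self) -> None:
        # Step 2: the peer reset the connection, so try to reconnect.
        self.outbox.append(("reconnect", self.name))

    def handle(self, msg: Message) -> None:
        if msg[0] == "session_close":
            self.session_open = False  # the client now knows it was evicted


@dataclass
class MDSReconnectModel:
    state: str  # e.g. "up:active" or "up:reconnect"
    outbox: Deque[Message] = field(default_factory=deque)
    log: List[str] = field(default_factory=list)

    def close_connection(self, client: KernelClientModel) -> None:
        # Step 1: the MDS closes the connection; the client sees a remote reset.
        client.remote_reset()

    def handle(self, msg: Message) -> None:
        kind, who = msg
        if kind == "reconnect" and self.state != "up:reconnect":
            # Step 4: deny the reconnect and send a session close.
            self.log.append(f"denied reconnect attempt (mds is {self.state}) from {who}")
            self.outbox.append(("session_close", who))


def run(mds: MDSReconnectModel, client: KernelClientModel) -> None:
    mds.close_connection(client)
    while client.outbox or mds.outbox:
        while client.outbox:  # Step 3: the reconnect reaches the MDS
            mds.handle(client.outbox.popleft())
        while mds.outbox:
            client.handle(mds.outbox.popleft())
```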
Zheng Yan wrote:
> Sorry. The whole process is:
> 1. The MDS closes the client connection.
> 2. The client's remote_reset callback gets called.
> 3. The client sends a reconnect message to the MDS.
> 4. The MDS denies the reconnect and sends a session close message to the client.
Thanks, Zheng, for your detailed explanation. I found the code in the kernel client.
By the way, the libcephfs-based client behaves differently here:
it does not try to reconnect to the MDS; it just calls _closed_mds_session.
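The difference between the two clients can be summarised as a tiny decision function. This is an illustrative sketch only: _closed_mds_session is the libcephfs function named above, but the ClientKind enum, the step strings and the assumption that a reconnect is accepted while the MDS is still in up:reconnect are hypothetical simplifications, not the actual kernel or libcephfs code.

```python
from enum import Enum
from typing import List


class ClientKind(Enum):
    KERNEL = "kernel"
    LIBCEPHFS = "libcephfs"


def on_remote_reset(kind: ClientKind, mds_state: str) -> List[str]:
    """Return the steps a client takes after its MDS connection is reset."""
    if kind is ClientKind.LIBCEPHFS:
        # libcephfs does not attempt a reconnect; it closes the session locally.
        return ["_closed_mds_session"]
    steps = ["send reconnect"]
    if mds_state != "up:reconnect":
        # The MDS denies a late reconnect and replies with a session close.
        steps += ["receive session close", "mark session closed"]
    else:
        # Assumed outcome while the MDS is still accepting reconnects.
        steps.append("reconnect accepted")
    return steps
```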