Bug #22547
active mds session miss for client
Description
Our use case: Docker containers under k8s mount CephFS using the CephFS kernel client.
If we do not use the mounted directory for a while, the client's session disappears from the active MDS. We check this with the command:
ceph daemon mds.FOO session ls
However, on the Docker host we can still see the mount with df -hT, and we can cd into the directory and operate on it. When we then check with the command above, the session is back. After a while, it disappears again.
ceph.log shows the following message every time we operate on the mounted directory:
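A minimal sketch of automating the check described above, assuming the output of `ceph daemon mds.FOO session ls` is a JSON array in which each session carries an integer `id` field (the `state` and `inst` fields in the sample are likewise assumptions about the output format, and the sample string is made up for illustration):

```python
import json


def find_session(session_ls_json: str, client_id: int):
    """Return the session entry for client_id, or None if the MDS lists no such session."""
    sessions = json.loads(session_ls_json)
    for session in sessions:
        # "id" is assumed to be the numeric part of client.<id>
        if session.get("id") == client_id:
            return session
    return None


# Hypothetical sample output, not captured from a real cluster.
sample = (
    '[{"id": 2359121, "state": "open", '
    '"inst": "client.2359121 10.11.70.66:0/3075221752"}]'
)

if __name__ == "__main__":
    print(find_session(sample, 2359121))
```

Feeding it the real command output at intervals would show when the session for a given client drops out of the list and when it returns.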
2017-12-28 12:18:31.625165 mds.0 10.11.67.226:6800/2738461306 340 : cluster [INF] denied reconnect attempt (mds is up:active) from client.2359121 10.11.70.66:0/3075221752 after 44973.678531 (allowed interval 100)
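To pull the interesting numbers out of such a line, something like the following could be used. It is only a sketch: the regular expression is written against the single message quoted above, and treating both numbers as seconds is an assumption.

```python
import re

# Pattern written against the one log line quoted in this report.
DENIED_RE = re.compile(
    r"denied reconnect attempt \(mds is (?P<state>\S+)\) "
    r"from client\.(?P<client>\d+) (?P<addr>\S+) "
    r"after (?P<elapsed>[\d.]+) \(allowed interval (?P<allowed>[\d.]+)\)"
)


def parse_denied_reconnect(line: str):
    """Return a dict with the client id, MDS state and both intervals, or None."""
    m = DENIED_RE.search(line)
    if m is None:
        return None
    return {
        "state": m.group("state"),
        "client": int(m.group("client")),
        "addr": m.group("addr"),
        "elapsed": float(m.group("elapsed")),
        "allowed": float(m.group("allowed")),
    }


LOG_LINE = (
    "2017-12-28 12:18:31.625165 mds.0 10.11.67.226:6800/2738461306 340 : "
    "cluster [INF] denied reconnect attempt (mds is up:active) from "
    "client.2359121 10.11.70.66:0/3075221752 after 44973.678531 "
    "(allowed interval 100)"
)

if __name__ == "__main__":
    print(parse_denied_reconnect(LOG_LINE))
```

If the values are seconds, the reported 44973.678531 is roughly 12.5 hours, far beyond the allowed interval of 100.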
History
#1 Updated by Zheng Yan about 6 years ago
Please set debug_mds=10 and check why the MDS evicted the client. It's likely that the Docker host went to sleep or that there was a connection problem between the Docker host and the MDS.
#2 Updated by wei jin about 6 years ago
OK. I will do it soon.
This happened after I restarted the MDS daemon last night. There was also another crash (bug 22548) after I rebooted the MDS.
#3 Updated by wei jin about 6 years ago
wei jin wrote:
Ok. I will do it soon.
I cannot reproduce it after enabling the debug log, and the logging affects our online service because it adds long latency to the MDS daemon. I may try again later.
#4 Updated by dongdong tao about 6 years ago
Zheng, if a client has been evicted by the MDS, the client should still think the connection is available,
and when that client sends its next request to the MDS, it will certainly block because it will never get a reply from the MDS, right?
But from Wei's description, it looks like the client reconnected when making its next request.
I think the above is the behavior of a libcephfs-based client; I'm not sure whether the kernel client does that too?
#5 Updated by dongdong tao about 6 years ago
By evicted, I mean evicted due to the auto_close_timeout.
#6 Updated by Jos Collin about 6 years ago
- Status changed from New to Need More Info
#7 Updated by Zheng Yan about 6 years ago
dongdong tao wrote:
Zheng, if a client has been evicted by the MDS, the client should still think the connection is available,
and when that client sends its next request to the MDS, it will certainly block because it will never get a reply from the MDS, right?
But from Wei's description, it looks like the client reconnected when making its next request. I think the above is the behavior of a libcephfs-based client; I'm not sure whether the kernel client does that too?
No. The MDS sends a session close message to the client when evicting it, so the client knows its connection has been evicted.
#8 Updated by dongdong tao about 6 years ago
Zheng Yan wrote:
dongdong tao wrote:
Zheng, if a client has been evicted by the MDS, the client should still think the connection is available,
and when that client sends its next request to the MDS, it will certainly block because it will never get a reply from the MDS, right?
But from Wei's description, it looks like the client reconnected when making its next request. I think the above is the behavior of a libcephfs-based client; I'm not sure whether the kernel client does that too?
No. The MDS sends a session close message to the client when evicting it, so the client knows its connection has been evicted.
Sorry, Zheng, I tried searching the code but couldn't find it.
I believe the code path is find_idle_sessions->kill_session->journal_close_session. After the journal is flushed, _session_logged is called. Since the session state is STATE_KILLING, the MDS only marks the connection down and cleans it up.
Please correct me if I misunderstand some part.
#9 Updated by Zheng Yan about 6 years ago
Sorry. The whole process is:
The MDS closes the client connection
The client's remote_reset callback gets called
The client sends a reconnect message to the MDS
The MDS denies the reconnect message and sends a session close message to the client
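
The four steps above can be sketched as a toy model. This is not Ceph code: `ToyMds`, `evict_and_reconnect` and the event names are all hypothetical, and the rule that an MDS in up:active always denies a reconnect is a simplification taken from the log message in the description.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMds:
    """Hypothetical stand-in for an MDS that tracks open client sessions."""
    sessions: set = field(default_factory=set)
    state: str = "up:active"

    def evict(self, client_id: int) -> str:
        # Step 1: drop the session and close the connection.
        self.sessions.discard(client_id)
        return "connection_closed"

    def handle_reconnect(self, client_id: int) -> str:
        # Step 4: an active MDS denies the reconnect and replies with a
        # session close (simplified assumption for this sketch).
        if self.state == "up:active":
            return "session_close"
        self.sessions.add(client_id)
        return "reconnect_accepted"


def evict_and_reconnect(mds: ToyMds, client_id: int) -> list:
    """Replay the eviction sequence and return the ordered list of events."""
    events = [("mds", mds.evict(client_id))]                  # step 1
    events.append(("client", "remote_reset"))                 # step 2
    events.append(("client", "send_reconnect"))               # step 3
    events.append(("mds", mds.handle_reconnect(client_id)))   # step 4
    return events


if __name__ == "__main__":
    mds = ToyMds(sessions={2359121})
    for actor, event in evict_and_reconnect(mds, 2359121):
        print(actor, event)
```

In this model the client ends up without a session and has been told so explicitly by the session close message in the last step.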
#10 Updated by dongdong tao about 6 years ago
Zheng Yan wrote:
Sorry. The whole process is:
The MDS closes the client connection
The client's remote_reset callback gets called
The client sends a reconnect message to the MDS
The MDS denies the reconnect message and sends a session close message to the client
Thanks, Zheng, for your detailed explanation. I found the code in the kernel.
By the way, the behavior of a libcephfs-based client is different here:
it does not try to reconnect to the MDS; it just calls _closed_mds_session.