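Comparing the logs, I noticed that the MDS clock is ~30s behind the client's; ntpd was dead on one of the test servers. I try to compensate for this in the notes: times marked with "~" are MDS timestamps shifted by +30s to match the client clock.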
~08:10:56 MDS decides client is stale:
2017-02-02 08:10:26.538843 7f3b0042f700 10 mds.0.server new stale session client.4151 10.194.0.100:0/4006638222 last 2017-02-02 08:09:21.880119
~08:14:56 client session is closed:
2017-02-02 08:14:26.545466 7f3b0042f700 0 log_channel(cluster) log [INF] : closing stale session client.4151 10.194.0.100:0/4006638222 after 304.665332
2017-02-02 08:14:26.545476 7f3b0042f700 10 mds.0.server autoclosing stale session client.4151 10.194.0.100:0/4006638222 last 2017-02-02 08:09:21.880119
~08:16:10 new session is created, RESETSESSION sent:
2017-02-02 08:15:41.931194 7f3afdb29700 10 mds.client.cephfs new session 0x55c844270680 for client.4151 10.194.0.100:0/4006638222 con 0x55c8445b8180
<..>
2017-02-02 08:15:41.931248 7f3afdb29700 0 -- 10.194.0.189:6816/31695 >> 10.194.0.100:0/4006638222 pipe(0x55c84441e800 sd=18 :6816 s=0 pgs=0 cs=0 l=0 c=0x55c8445b8180).accept we reset (peer sent cseq 2), sending RESETSESSION
08:16:10 client gets the session reset and marks its session as stale:
2017-02-02 08:16:10.810857 7f8cc4ff9700 0 client.4151 ms_handle_remote_reset on 10.194.0.189:6816/31695
2017-02-02 08:16:10.810869 7f8cc4ff9700 1 client.4151 reset from mds we were open; mark session as stale
08:16:31 client requests a caps renewal, which is ignored because its session is closed on the MDS. This happens multiple times.
client:
2017-02-02 08:16:31.016380 7f8ccc5ef700 10 -- 10.194.0.100:0/4006638222 >> 10.194.0.189:6816/31695 pipe(0x5616c998ca10 sd=0 :33785 s=2 pgs=7 cs=1 l=0 c=0x5616c998ad60).reader got ack seq 742216328 >= 742216328 on 0x7f8c9c018040 client_session(request_renewcaps seq 26) v1
MDS:
2017-02-02 08:16:01.932965 7f3b02d35700 3 mds.0.server handle_client_session client_session(request_renewcaps seq 26) v1 from client.4151
2017-02-02 08:16:01.932969 7f3b02d35700 10 mds.0.server ignoring renewcaps on non open|stale session (closed)
Looking at src/client/Client.cc, I think the "MetaSession::STATE_OPEN" case of the state switch in ms_handle_remote_reset() should reopen the session instead of marking it stale. I could try making a diff that moves this case next to MetaSession::STATE_OPENING, since that case looks like it already does this. What do you think? Would closing and reopening the session have other side effects for a fuse client whose session is in the open state?
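To make the proposal a bit more concrete, here is a toy C++ model of the two sides as I understand them from the logs above. This is not the Ceph source: the enums, on_remote_reset() and mds_accepts_renewcaps() are names I invented for illustration, and the real ms_handle_remote_reset() does more than this.

```cpp
// Hypothetical, simplified model; none of these names come from Ceph.
enum class ClientSessionState { Opening, Open };
enum class ResetAction { Reopen, MarkStale };
enum class MdsSessionState { Open, Stale, Closed };

// What the client does with its session when the MDS resets the connection.
// reopen_when_open = false models the behaviour seen in the client log
// ("reset from mds we were open; mark session as stale").
// reopen_when_open = true models the proposed change.
ResetAction on_remote_reset(ClientSessionState state, bool reopen_when_open) {
    switch (state) {
    case ClientSessionState::Opening:
        return ResetAction::Reopen;
    case ClientSessionState::Open:
        return reopen_when_open ? ResetAction::Reopen : ResetAction::MarkStale;
    }
    return ResetAction::Reopen;  // not reached; keeps compilers quiet
}

// Models the MDS log line "ignoring renewcaps on non open|stale session":
// a renewcaps request is only honoured for open or stale sessions.
bool mds_accepts_renewcaps(MdsSessionState state) {
    return state == MdsSessionState::Open || state == MdsSessionState::Stale;
}
```

In this model the current behaviour leaves the client stuck: it marks its session stale and keeps sending renewcaps, while the MDS side is closed and ignores them. With the change, the client would go through a fresh session open instead.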