Jewel ceph-fuse does not recover after lost connection to MDS
After ceph-fuse loses connection to MDS for few minutes, it does not recover - accessing mountpoint hangs processes.
Replicated using 10.2.3 and 10.2.5 clients. Didn't try on Kraken.
periodically poke mount:
while [ 1 ]; do date; ls -l cephfs/test ; sleep 10; done
then firewall connection to MDS:
iptables -A INPUT -p tcp --dport 6816 -j DROP
after pokes stop wait few minutes (5?). If we wait for too short time, it recovers, if long enough it hangs. Attaching client logs with enabled debug options:
debug_mds = 20/20
debug_mds_balancer = 20/20
debug_mds_log = 20/20
debug_objecter = 20/20
debug_rados = 20/20
debug_client = 20/20
debug_ms = 20/20
debug_fuse = 20/20
and a thread backtrace.
#1 Updated by John Spray about 2 years ago
- Status changed from New to Rejected
Clients which lose connectivity to the MDS are evicted after a timeout given by the "mds session timeout" setting. Evicting a client may cause the client to get stuck, as it can't necessarily get back into a healthy state.
Presumably you are reproducing this artificially because you encountered it in practice somewhere; the solution in practice is to either increase that interval, or fix whatever is causing the loss of contact to begin with.
There are changes coming in the future that will make this functionality a bit friendlier to make the client cope more gracefully with being evicted, only evict clients when they're holding up other clients.
#2 Updated by Henrik Korkuc about 2 years ago
Yes, I am reproducing it artificially after short network blip caused permanent mount point hangs of multiple servers, sending load avg of these servers to the sky by periodic mountpoint stats (monitoring system).
If ceph-fuse cannot handle session reset and only way to recover is to manually SIGKILL ceph-fuse and remount, why doesn't ceph-fuse just kill itself? In my opinion erroring out a user would be much more friendlier than just hanging it permanently.
#3 Updated by John Spray about 2 years ago
- Status changed from Rejected to New
Hmm, now that I actually read the log (like a reasonable person :-)) it is a little bit strange that the server is sending the client client_session(stale) messages (not close messages) but then apparently dropping the client's getattrs.
Could you upload the MDS log from the same period, and mark the times at which the connection was interrupted and then restored?
#4 Updated by Henrik Korkuc about 2 years ago
Attaching client and MDS logs. This time I was mounting from another server and firewalling both input and output to make sure everything is isolated. Final outcome didn't change
#6 Updated by Henrik Korkuc about 2 years ago
Comparing logs I noticed that MDS clock is ~30s behind client. ntpd was dead on one of test servers... Will try to compensate in notes
~08:10:56 MDS decides client is stale:
2017-02-02 08:10:26.538843 7f3b0042f700 10 mds.0.server new stale session client.4151 10.194.0.100:0/4006638222 last 2017-02-02 08:09:21.880119
~08:14:56 client session is closed:
2017-02-02 08:14:26.545466 7f3b0042f700 0 log_channel(cluster) log [INF] : closing stale session client.4151 10.194.0.100:0/4006638222 after 304.665332 2017-02-02 08:14:26.545476 7f3b0042f700 10 mds.0.server autoclosing stale session client.4151 10.194.0.100:0/4006638222 last 2017-02-02 08:09:21.880119
~08:16:10 new session is created, resetsession sent:
2017-02-02 08:15:41.931194 7f3afdb29700 10 mds.client.cephfs new session 0x55c844270680 for client.4151 10.194.0.100:0/4006638222 con 0x55c8445b8180 <..> 2017-02-02 08:15:41.931248 7f3afdb29700 0 -- 10.194.0.189:6816/31695 >> 10.194.0.100:0/4006638222 pipe(0x55c84441e800 sd=18 :6816 s=0 pgs=0 cs=0 l=0 c=0x55c8445b8180).accept we re set (peer sent cseq 2), sending RESETSESSION
client get's reset session, marks it's session as stale.
2017-02-02 08:16:10.810857 7f8cc4ff9700 0 client.4151 ms_handle_remote_reset on 10.194.0.189:6816/31695 2017-02-02 08:16:10.810869 7f8cc4ff9700 1 client.4151 reset from mds we were open; mark session as stale
08:16:31 client requests for caps, get's ignored because it's session is closed. Happens multiple times.
2017-02-02 08:16:31.016380 7f8ccc5ef700 10 -- 10.194.0.100:0/4006638222 >> 10.194.0.189:6816/31695 pipe(0x5616c998ca10 sd=0 :33785 s=2 pgs=7 cs=1 l=0 c=0x5616c998ad60).reader got ack seq 742216328 >= 742216328 on 0x7f8c9c018040 client_session(request_renewcaps seq 26) v1
2017-02-02 08:16:01.932965 7f3b02d35700 3 mds.0.server handle_client_session client_session(request_renewcaps seq 26) v1 from client.4151 2017-02-02 08:16:01.932969 7f3b02d35700 10 mds.0.server ignoring renewcaps on non open|stale session (closed)
Looking at src/client/Client.cc I think that ms_handle_remote_reset() "MetaSession::STATE_OPEN" case of state switch should reopen session instead of marking it stale. I could try making a diff moving case to be together with MetaSession::STATE_OPENING as it looks like it does the same. What do you think? Would session close/open have some other side effects for fuse client in open state?
#9 Updated by Henrik Korkuc about 2 years ago
I updated PR to do _closed_mds_session(s).
As for config option, I would expect client to reconnect automagically after connection loss (it is self-healing after all).
On the other hand, I can do config option too if it is considered a better way to handle this change.