client: hangs on umount if it had an MDS session evicted
Seen like this with fuse client:
* Start 2 active MDSs
* Do some activity such that sessions are open with both MDSs
* ceph daemon mds.b session evict <client id>
* Umount client
Client does this:
2015-02-19 10:20:38.580641 7f0e2b7c87c0 2 client.4121 _close_mds_session mds.0 seq 4
2015-02-19 10:20:38.580657 7f0e2b7c87c0 2 client.4121 _close_mds_session mds.1 seq 3
2015-02-19 10:20:38.580664 7f0e2b7c87c0 2 client.4121 waiting for 2 mds sessions to close
2015-02-19 10:20:38.591696 7f0e227fc700 10 client.4121 handle_client_session client_session(close) v1 from mds.0
2015-02-19 10:20:38.591771 7f0e227fc700 10 client.4121 remove_session_caps mds.0
2015-02-19 10:20:38.591783 7f0e227fc700 10 client.4121 kick_requests_closed for mds.0
2015-02-19 10:20:38.591792 7f0e227fc700 10 client.4121 unmounting: trim pass, size was 0+0
2015-02-19 10:20:38.591794 7f0e227fc700 20 client.4121 trim_cache size 0 max 0
2015-02-19 10:20:38.591796 7f0e227fc700 10 client.4121 unmounting: trim pass, size still 0+0
2015-02-19 10:20:38.591801 7f0e2b7c87c0 2 client.4121 waiting for 1 mds sessions to close
But the MDS where we evicted the session ignored it:
2015-02-19 10:20:36.922299 7f2266057700 1 -- 172.16.79.251:6813/35691 <== client.4120 172.16.79.251:0/35776 2093931643 ==== client_session(request_close seq 2) v1 ==== 28+0+0 (817140596 0 0) 0x4680000 con 0x4a27e80
2015-02-19 10:20:36.922319 7f2266057700 20 mds.1.server get_session have 0x4a7a700 client.4120 172.16.79.251:0/35776 state closed
2015-02-19 10:20:36.922323 7f2266057700 3 mds.1.server handle_client_session client_session(request_close seq 2) v1 from client.4120
2015-02-19 10:20:36.922326 7f2266057700 10 mds.1.server already closed|closing|killing, dropping this req
I suppose we should always acknowledge client request_close messages, even when the session is already closed on the MDS side, so that the client can terminate itself.
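To make the proposal concrete, here is a toy model of the idea in Python. It is not the Ceph MDS code: `ToyMDS`, `Session`, `SessionState` and the `sent` list are all hypothetical names invented for illustration, and the sketch treats every non-open state the same way, which is a simplification.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class SessionState(Enum):
    OPEN = auto()
    CLOSING = auto()
    CLOSED = auto()
    KILLING = auto()


@dataclass
class Session:
    client_id: int
    state: SessionState = SessionState.OPEN


@dataclass
class ToyMDS:
    """Hypothetical stand-in for the MDS session handling, not real Ceph code."""
    sessions: dict = field(default_factory=dict)
    sent: list = field(default_factory=list)  # (client_id, message) pairs we "sent"

    def evict(self, client_id):
        # Eviction leaves the session object around in state CLOSED,
        # matching the "state closed" line in the MDS log above.
        self.sessions[client_id].state = SessionState.CLOSED

    def handle_request_close(self, client_id):
        session = self.sessions.get(client_id)
        if session is None or session.state is not SessionState.OPEN:
            # Observed behaviour: the request is dropped here.
            # Proposed behaviour: acknowledge anyway so the client can exit.
            self.sent.append((client_id, "close"))
            return
        # Normal path: close the session, then acknowledge.
        session.state = SessionState.CLOSED
        self.sent.append((client_id, "close"))


mds = ToyMDS()
mds.sessions[4120] = Session(4120)
mds.evict(4120)
mds.handle_request_close(4120)
print(mds.sent)
```

With the acknowledgement sent on both paths, the evicted client would see a close from every MDS it asked and would stop waiting.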
#1 Updated by Greg Farnum about 4 years ago
Mmmm, that should be a pretty easy change MDS-side; I'm trying to figure out if it could get us in trouble though. And do we really want the client to be clean if we evicted it? There's probably going to be dirty data... Actually, I think if there is dirty data it will block on that rather than on not getting a close.
On the other hand, we might also want the client to be able to shut down happily if a server or the network goes away but it has no dirty data. I don't think there's much harm cluster-side to the client disappearing in that case, so maybe it should time out the close session request and just exit?
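A rough sketch of the client-side timeout idea, again as a toy model rather than the real client code. `ToyClient`, `handle_session_close` and `unmount` are hypothetical names, the timeout value is arbitrary, and the sketch ignores the dirty-data question entirely, assuming there is nothing left to flush.

```python
import threading


class ToyClient:
    """Hypothetical model of a client waiting for MDS sessions to close."""

    def __init__(self, mds_ranks):
        self._open = set(mds_ranks)          # ranks we sent request_close to
        self._cond = threading.Condition()

    def handle_session_close(self, rank):
        # Called when a client_session(close) arrives from an MDS.
        with self._cond:
            self._open.discard(rank)
            self._cond.notify_all()

    def unmount(self, timeout):
        # Wait until every session is acknowledged, but give up after
        # `timeout` seconds instead of blocking forever.
        with self._cond:
            all_closed = self._cond.wait_for(lambda: not self._open,
                                             timeout=timeout)
            unacked = set(self._open)
            self._open.clear()               # abandon sessions that never answered
            return all_closed, unacked


client = ToyClient([0, 1])
client.handle_session_close(0)               # mds.0 answers, mds.1 stays silent
print(client.unmount(timeout=0.1))
```

In this model the unmount returns after the timeout and reports which ranks never answered, rather than hanging on "waiting for 1 mds sessions to close".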