Bug #56116
closed
mds: handle deferred client request core when mds reboot
100%
Description
When the MDS reboots, the client sends its `mds_requests` and a `client_reconnect` message to the MDS.
If the MDS does not receive the `client_reconnect` message within `mds_reconnect_timeout`, it kills the client session and moves on to the next phase (reconnect -> rejoin).
The MDS then handles the client requests it has already received once its state changes to active.
But if MDCache is not ready yet, these messages are pushed onto the mdcache->waiting_for_root queue.
Meanwhile, the client still has unfinished mds_requests, so it tries to rebuild its session with the MDS even though the MDS has already killed the old one. To do this, the client sends request_open to the MDS.
If the MDS handles this client session message before MDCache is ready, the new session is added to the MDS's sessionmap.
Once MDCache is ready, the MDS retries the queued client requests and crashes, because it mistakes the client's new session for an imported session:
1: (()+0xf100) [0x7f1033c30100]
2: (Mutex::lock(bool)+0x9) [0x7f1035e33cf9]
3: (MDSRank::get_session(boost::intrusive_ptr<Message const> const&)+0x92a) [0x7f103ee8f27a]
4: (Server::handle_client_request(boost::intrusive_ptr<MClientRequest const> const&)+0x504) [0x7f103ef0c9c4]
5: (Server::dispatch(boost::intrusive_ptr<Message const> const&)+0x122) [0x7f103ef18162]
6: (MDSRank::handle_deferrable_message(boost::intrusive_ptr<Message const> const&)+0x6dc) [0x7f103ee8be8c]
7: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x7fa) [0x7f103ee8e2fa]
8: (MDSRank::retry_dispatch(boost::intrusive_ptr<Message const> const&)+0x12) [0x7f103ee8e942]
9: (MDSContext::complete(int)+0x74) [0x7f103f0ff4b4]
2022-06-07T03:50:44.372+0800 7fffe6b2c700 5 mds.beacon.a set_want_state: up:replay -> up:reconnect
2022-06-07T03:50:46.524+0800 7fffee33b700 3 mds.0.server not active yet, waiting
2022-06-07T03:50:53.860+0800 7fffecb38700 10 mds.0.server kill_session 0x55555b4e2300
2022-06-07T03:50:53.860+0800 7fffecb38700 5 mds.beacon.a set_want_state: up:reconnect -> up:rejoin
2022-06-07T03:50:54.869+0800 7fffee33b700 5 mds.beacon.a set_want_state: up:rejoin -> up:active
2022-06-07T03:51:19.690+0800 7fffee33b700 10 MDSContext::complete: 18C_MDS_RetryMessage
2022-06-07T03:51:19.690+0800 7fffee33b700 5 mds.0.server waiting for root
2022-06-07T03:51:19.915+0800 7fffee33b700 10 mds.0.sessionmap add_session s=0x55555b57e000 name=client.4445
2022-06-07T03:51:38.962+0800 7fffe832f700 10 mds.0.cache populate_mydir done
2022-06-07T03:51:38.962+0800 7fffe9b32700 10 MDSContext::complete: 18C_MDS_RetryMessage
2022-06-07T03:51:38.962+0800 7fffe9b32700 10 mds.0.215 get_session replacing connection bootstrap session 0x55555b4e2300 with imported session 0x55555b57e000
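
To make the ordering in the description and log easier to follow, here is a minimal toy model of the sequence. It is not Ceph code: `ToySession`, `ToyRequest`, `ToyMds`, `RetryResult` and `replay_bug_sequence` are hypothetical names invented for illustration, and the stale-session check in `retry_one` is only an assumed way to detect the mismatch, not necessarily the fix that was applied for this bug.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy model only: none of these types are Ceph's real classes.
struct ToySession {
  std::uint64_t id = 0;
  std::string client;
  bool killed = false;
};

struct ToyRequest {
  std::string client;
  // Session that was bound to the connection when the request arrived.
  std::shared_ptr<ToySession> conn_session;
};

enum class RetryResult { Dispatched, StillWaiting, StaleSession };

class ToyMds {
 public:
  bool cache_ready = false;  // stands in for "MDCache root is available"

  // Models request_open: register a fresh session for the client.
  std::shared_ptr<ToySession> open_session(const std::string& client) {
    auto s = std::make_shared<ToySession>();
    s->id = next_id_++;
    s->client = client;
    sessionmap_[client] = s;
    return s;
  }

  // Models the reconnect timeout: the session is marked dead and dropped.
  void kill_session(const std::string& client) {
    auto it = sessionmap_.find(client);
    if (it == sessionmap_.end()) return;
    it->second->killed = true;
    sessionmap_.erase(it);
  }

  // Models pushing a message onto the waiting_for_root queue.
  void defer(ToyRequest req) { waiting_for_root_.push_back(std::move(req)); }

  // Models the retry of one deferred request once the cache is ready.
  RetryResult retry_one() {
    if (waiting_for_root_.empty() || !cache_ready) return RetryResult::StillWaiting;
    ToyRequest req = std::move(waiting_for_root_.front());
    waiting_for_root_.pop_front();
    assert(req.conn_session);
    auto it = sessionmap_.find(req.client);
    const bool mismatch =
        it == sessionmap_.end() || it->second != req.conn_session;
    // Assumed guard: a request that arrived on a killed or replaced
    // session is reported as stale instead of being dispatched.
    if (req.conn_session->killed || mismatch) return RetryResult::StaleSession;
    return RetryResult::Dispatched;
  }

 private:
  std::uint64_t next_id_ = 1;
  std::map<std::string, std::shared_ptr<ToySession>> sessionmap_;
  std::deque<ToyRequest> waiting_for_root_;
};

// Replays the order of events from the log excerpt in the toy model.
RetryResult replay_bug_sequence() {
  ToyMds mds;
  auto old_session = mds.open_session("client.4445");  // session from before the reboot
  mds.defer(ToyRequest{"client.4445", old_session});   // request deferred, waiting for root
  mds.kill_session("client.4445");                     // no client_reconnect in time
  mds.open_session("client.4445");                     // client sends request_open
  mds.cache_ready = true;                              // populate_mydir done
  return mds.retry_one();                              // deferred request is retried
}
```

In this model, `replay_bug_sequence()` returns `RetryResult::StaleSession`: the deferred request still points at the killed session while the session map already holds a different session for the same client. That is the same pairing of two session addresses that the final `get_session` log line reports.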