The connection cleanup thread is stuck waiting on the cephfs shutdown() call after initiating a session disconnect:
2020-02-25 09:24:51.854 7f85ed4bd700 4 mgr[volumes] disconnecting from cephfs 'cephfs'
2020-02-25 09:24:51.854 7f85ed4bd700 1 -- 172.21.15.121:0/959936343 --> [v2:172.21.15.121:6834/987573718,v1:172.21.15.121:6835/987573718] -- client_session(request_close seq 11) v3 -- 0x56404a91bd40 con 0x56404ac9bc00
Normally, the manager and the MDS exchange messages to terminate the connection, as in this earlier, successful disconnect:
2020-02-25 09:08:51.835 7f85ed4bd700 4 mgr[volumes] disconnecting from cephfs 'cephfs'
2020-02-25 09:08:51.835 7f85ed4bd700 1 -- 172.21.15.121:0/593434532 --> [v2:172.21.15.121:6834/1785088858,v1:172.21.15.121:6835/1785088858] -- client_session(request_close seq 120) v3 -- 0x56404ae68480 con 0x56405498ec00
2020-02-25 09:08:51.839 7f859dbde700 1 -- 172.21.15.121:0/593434532 <== mds.0 v2:172.21.15.121:6834/1785088858 314 ==== client_session(close) v1 ==== 28+0+0 (crc 0 0 0) 0x56404ae68480 con 0x56405498ec00
2020-02-25 09:08:51.839 7f859dbde700 1 -- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:6834/1785088858,v1:172.21.15.121:6835/1785088858] conn(0x56405498ec00 msgr2=0x56405495f600 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).mark_down
2020-02-25 09:08:51.839 7f859dbde700 1 --2- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:6834/1785088858,v1:172.21.15.121:6835/1785088858] conn(0x56405498ec00 0x56405495f600 crc :-1 s=READY pgs=16 cs=0 l=0 rx=0 tx=0).stop
2020-02-25 09:08:51.840 7f85ed4bd700 1 -- 172.21.15.121:0/593434532 >> [v2:172.21.15.46:6808/26678,v1:172.21.15.46:6809/26678] conn(0x56405498f000 msgr2=0x564054960100 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2020-02-25 09:08:51.840 7f85ed4bd700 1 --2- 172.21.15.121:0/593434532 >> [v2:172.21.15.46:6808/26678,v1:172.21.15.46:6809/26678] conn(0x56405498f000 0x564054960100 crc :-1 s=READY pgs=70 cs=0 l=1 rx=0 tx=0).stop
2020-02-25 09:08:51.840 7f85ed4bd700 1 -- 172.21.15.121:0/593434532 >> [v2:172.21.15.46:6800/26680,v1:172.21.15.46:6801/26680] conn(0x56404cde5c00 msgr2=0x564054979700 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2020-02-25 09:08:51.840 7f85ed4bd700 1 --2- 172.21.15.121:0/593434532 >> [v2:172.21.15.46:6800/26680,v1:172.21.15.46:6801/26680] conn(0x56404cde5c00 0x564054979700 crc :-1 s=READY pgs=56 cs=0 l=1 rx=0 tx=0).stop
2020-02-25 09:08:51.840 7f85ed4bd700 1 -- 172.21.15.121:0/593434532 >> [v2:172.21.15.46:6816/26679,v1:172.21.15.46:6817/26679] conn(0x56404ce3fc00 msgr2=0x564054978100 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2020-02-25 09:08:51.840 7f85ed4bd700 1 --2- 172.21.15.121:0/593434532 >> [v2:172.21.15.46:6816/26679,v1:172.21.15.46:6817/26679] conn(0x56404ce3fc00 0x564054978100 crc :-1 s=READY pgs=56 cs=0 l=1 rx=0 tx=0).stop
2020-02-25 09:08:51.840 7f86161a6700 1 -- 172.21.15.121:0/593434532 reap_dead start
2020-02-25 09:08:51.840 7f85ed4bd700 1 -- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:6802/27401,v1:172.21.15.121:6803/27401] conn(0x56404cd54800 msgr2=0x564054979180 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2020-02-25 09:08:51.840 7f85ed4bd700 1 --2- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:6802/27401,v1:172.21.15.121:6803/27401] conn(0x56404cd54800 0x564054979180 crc :-1 s=READY pgs=62 cs=0 l=1 rx=0 tx=0).stop
2020-02-25 09:08:51.840 7f86161a6700 1 -- 172.21.15.121:0/593434532 reap_dead start
2020-02-25 09:08:51.840 7f85ed4bd700 1 -- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:6810/27402,v1:172.21.15.121:6811/27402] conn(0x564054962800 msgr2=0x564054a04000 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2020-02-25 09:08:51.840 7f85ed4bd700 1 --2- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:6810/27402,v1:172.21.15.121:6811/27402] conn(0x564054962800 0x564054a04000 crc :-1 s=READY pgs=60 cs=0 l=1 rx=0 tx=0).stop
2020-02-25 09:08:51.840 7f85ed4bd700 1 -- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:6817/27403,v1:172.21.15.121:6821/27403] conn(0x56404c74b000 msgr2=0x564054978c00 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2020-02-25 09:08:51.840 7f85ed4bd700 1 --2- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:6817/27403,v1:172.21.15.121:6821/27403] conn(0x56404c74b000 0x564054978c00 crc :-1 s=READY pgs=59 cs=0 l=1 rx=0 tx=0).stop
2020-02-25 09:08:51.840 7f85ed4bd700 1 -- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:6813/27404,v1:172.21.15.121:6815/27404] conn(0x56404ce3e000 msgr2=0x564054978680 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2020-02-25 09:08:51.840 7f85ed4bd700 1 --2- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:6813/27404,v1:172.21.15.121:6815/27404] conn(0x56404ce3e000 0x564054978680 crc :-1 s=READY pgs=56 cs=0 l=1 rx=0 tx=0).stop
2020-02-25 09:08:51.840 7f85ed4bd700 1 -- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:3300/0,v1:172.21.15.121:6789/0] conn(0x5640556c2400 msgr2=0x56405495fb80 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2020-02-25 09:08:51.840 7f85ed4bd700 1 --2- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:3300/0,v1:172.21.15.121:6789/0] conn(0x5640556c2400 0x56405495fb80 crc :-1 s=READY pgs=3142 cs=0 l=1 rx=0 tx=0).stop
2020-02-25 09:08:51.840 7f85ed4bd700 1 -- 172.21.15.121:0/593434532 shutdown_connections
2020-02-25 09:08:51.840 7f85ed4bd700 1 --2- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:6810/27402,v1:172.21.15.121:6811/27402] conn(0x564054962800 0x564054a04000 unknown :-1 s=CLOSED pgs=60 cs=0 l=1 rx=0 tx=0).stop
2020-02-25 09:08:51.840 7f85ed4bd700 1 --2- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:6813/27404,v1:172.21.15.121:6815/27404] conn(0x56404ce3e000 0x564054978680 unknown :-1 s=CLOSED pgs=56 cs=0 l=1 rx=0 tx=0).stop
2020-02-25 09:08:51.840 7f85ed4bd700 1 --2- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:3300/0,v1:172.21.15.121:6789/0] conn(0x5640556c2400 0x56405495fb80 unknown :-1 s=CLOSED pgs=3142 cs=0 l=1 rx=0 tx=0).stop
2020-02-25 09:08:51.840 7f85ed4bd700 1 --2- 172.21.15.121:0/593434532 >> [v2:172.21.15.121:6817/27403,v1:172.21.15.121:6821/27403] conn(0x56404c74b000 0x564054978c00 unknown :-1 s=CLOSED pgs=59 cs=0 l=1 rx=0 tx=0).stop
2020-02-25 09:08:51.840 7f85ed4bd700 1 -- 172.21.15.121:0/593434532 shutdown_connections
2020-02-25 09:08:51.841 7f85ed4bd700 1 -- 172.21.15.121:0/593434532 wait complete.
2020-02-25 09:08:51.841 7f85ed4bd700 1 -- 172.21.15.121:0/593434532 >> 172.21.15.121:0/593434532 conn(0x564054963000 msgr2=0x564054959000 unknown :-1 s=STATE_NONE l=0).mark_down
The cleanup thread holds the connection pool lock, so commands are blocked until shutdown() returns. No exception is logged, and cephfs shutdown() is a blocking call, so essentially everything in mgr/volumes comes to a standstill.
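To illustrate the locking hazard described above, here is a minimal, self-contained sketch. It is not the mgr/volumes code: `ConnectionPool`, `cleanup_holding_lock`, `cleanup_releasing_lock`, `run_command` and `command_succeeds_during` are hypothetical names, and the blocking shutdown is simulated with a `threading.Event`. It only shows that a command needing the pool lock stalls while a blocking shutdown runs under that lock, and that unlinking the connection under the lock but shutting it down outside would avoid that particular stall.

```python
import threading


class ConnectionPool:
    """Hypothetical stand-in for a connection pool guarded by one lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.connections = {}

    def cleanup_holding_lock(self, fs_name, shutdown):
        # Pattern described above: the blocking shutdown runs under the lock.
        with self.lock:
            conn = self.connections.pop(fs_name, None)
            if conn is not None:
                shutdown(conn)

    def cleanup_releasing_lock(self, fs_name, shutdown):
        # Alternative: unlink under the lock, shut down after releasing it.
        with self.lock:
            conn = self.connections.pop(fs_name, None)
        if conn is not None:
            shutdown(conn)

    def run_command(self, timeout):
        # A command needs the pool lock; report False if it cannot get it.
        if not self.lock.acquire(timeout=timeout):
            return False
        try:
            return True
        finally:
            self.lock.release()


def command_succeeds_during(cleanup_name):
    """Run a command while a (simulated) hung shutdown is in progress."""
    pool = ConnectionPool()
    pool.connections["cephfs"] = object()
    started = threading.Event()
    release = threading.Event()

    def blocking_shutdown(conn):
        started.set()
        release.wait()  # simulates a shutdown() that does not return

    cleanup = getattr(pool, cleanup_name)
    worker = threading.Thread(target=cleanup, args=("cephfs", blocking_shutdown))
    worker.start()
    started.wait()
    ok = pool.run_command(timeout=0.2)
    release.set()  # let the simulated shutdown finish so the thread exits
    worker.join()
    return ok
```

Calling `command_succeeds_during("cleanup_holding_lock")` reproduces the stall, while `command_succeeds_during("cleanup_releasing_lock")` lets the command through. Moving the shutdown outside the lock does not make the hung shutdown itself return; it only keeps other commands from queuing behind it.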
On the MDS side, mds.b (with which the client has a session) respawned just before the client initiated the session close:
2020-02-25 09:24:50.993 7f3582f29700 10 mds.b my gid is 14360
2020-02-25 09:24:50.993 7f3582f29700 10 mds.b map says I am mds.-1.-1 state null
2020-02-25 09:24:50.993 7f3582f29700 10 mds.b msgr says i am [v2:172.21.15.121:6834/987573718,v1:172.21.15.121:6835/987573718]
2020-02-25 09:24:50.993 7f3582f29700 1 mds.b Map removed me (mds.-1 gid:14360) from cluster due to lost contact; respawning
2020-02-25 09:24:50.993 7f3582f29700 1 mds.b respawn!
After the respawn, the only log entries for client "172.21.15.121:0/959936343" are the ones below, which appear just after the dump of events:
2020-02-25 09:24:50.993 7f3585f2f700 1 -- [v2:172.21.15.121:6834/987573718,v1:172.21.15.121:6835/987573718] >> 172.21.15.121:0/959936343 conn(0x5613c0689400 msgr2=0x5613c0451700 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 35
2020-02-25 09:24:50.993 7f3585f2f700 1 -- [v2:172.21.15.121:6834/987573718,v1:172.21.15.121:6835/987573718] >> 172.21.15.121:0/959936343 conn(0x5613c0689400 msgr2=0x5613c0451700 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed
2020-02-25 09:24:50.993 7f3585f2f700 1 --2- [v2:172.21.15.121:6834/987573718,v1:172.21.15.121:6835/987573718] >> 172.21.15.121:0/959936343 conn(0x5613c0689400 0x5613c0451700 crc :-1 s=READY pgs=4 cs=0 l=0 rx=0 tx=0).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
2020-02-25 09:24:50.993 7f3585f2f700 1 --2- [v2:172.21.15.121:6834/987573718,v1:172.21.15.121:6835/987573718] >> 172.21.15.121:0/959936343 conn(0x5613c0689400 0x5613c0451700 unknown :-1 s=READY pgs=4 cs=0 l=0 rx=0 tx=0)._fault with nothing to send, going to standby
mds.b then goes through the boot sequence (boot->standby->creating->active) and enters normal operation. It never saw the "request_close" message from this client.
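A quick way to spot this condition in a manager log is to pair each outgoing client_session(request_close ...) with an incoming client_session(close) for the same client address. The sketch below is a rough helper, not part of Ceph: `unanswered_close_requests` is a hypothetical name, and the two regular expressions are assumptions based only on the shape of the messenger log lines quoted in this comment.

```python
import re

# Patterns assumed from the log lines quoted above; other log formats
# or debug levels may need different expressions.
SENT = re.compile(r" -- (\S+) --> .*client_session\(request_close ")
RECV = re.compile(r" -- (\S+) <== .*client_session\(close\)")


def unanswered_close_requests(lines):
    """Return client addresses that sent request_close and got no close."""
    pending = []
    for line in lines:
        sent = SENT.search(line)
        if sent:
            if sent.group(1) not in pending:
                pending.append(sent.group(1))
            continue
        received = RECV.search(line)
        if received and received.group(1) in pending:
            pending.remove(received.group(1))
    return pending
```

Fed the manager-side lines from this comment, the helper would be expected to report only "172.21.15.121:0/959936343", the client whose request_close was never answered.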