Bug #56435
octopus: cluster [WRN] evicting unresponsive client smithi115: (124139), after waiting 49.5313 seconds during MDS startup
% Done:
0%
ceph-qa-suite:
Component(FS):
Labels (FS):
qa
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
/a/yuriw-2022-06-24_16:58:58-rados-wip-yuri-testing-2022-06-24-0817-octopus-distro-default-smithi/6897870
The unresponsive client was from mgr.y:
2022-06-24T19:58:11.040+0000 7f6f01e98700 0 [volumes DEBUG mgr_util] CephFS mounting... 2022-06-24T19:58:11.040+0000 7f6f2d3c1700 1 --1- >> v1:172.21.15.115:6789/0 conn(0x561e646a7c00 0x561e6473e800 :-1 s=CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=0).handle_server_banner_and_identify peer v1:172.21.15.115:6789/0 says I am v1:172.21.15.115:37556/0 (socket says 172.21.15.115:37556) 2022-06-24T19:58:11.040+0000 7f6f2d3c1700 1 -- 172.21.15.115:0/560103558 learned_addr learned my addr 172.21.15.115:0/560103558 (peer_addr_for_me v1:172.21.15.115:0/0)
History
#1 Updated by Xiubo Li over 1 year ago
The clients have been unregistered at 2022-06-24T20:00:11:
2022-06-24T20:00:11.038+0000 7f6f02699700 7 mgr unregister_client unregistering msgr client handle v2:172.21.15.115:0/560103558 2022-06-24T20:00:11.038+0000 7f6effe94700 1 -- 172.21.15.115:0/1833290781 shutdown_connections 2022-06-24T20:00:11.038+0000 7f6effe94700 1 -- 172.21.15.115:0/1833290781 wait complete. 2022-06-24T20:00:11.038+0000 7f6effe94700 7 mgr unregister_client unregistering msgr client handle v2:172.21.15.115:0/1833290781 2022-06-24T20:00:11.042+0000 7f6ef2639700 10 cephx: validate_tickets want 53 have 53 need 0 2022-06-24T20:00:11.042+0000 7f6ef2639700 20 cephx client: need_tickets: want=53 have=53 need=0 2022-06-24T20:00:11.042+0000 7f6ef2639700 10 auth: dump_rotating:
But the mds.a was trying to reconnect them *39 seconds later:
2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.a my gid is 124169 2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.a map says I am mds.0.8 state up:reconnect 2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.a msgr says I am v1:172.21.15.31:6816/4221084591 2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.a handle_mds_map: handling map as rank 0 2022-06-24T20:00:51.329+0000 7f557885b700 1 mds.0.8 handle_mds_map i am now mds.0.8 2022-06-24T20:00:51.329+0000 7f557885b700 1 mds.0.8 handle_mds_map state change up:replay --> up:reconnect 2022-06-24T20:00:51.329+0000 7f557885b700 1 mds.0.8 reconnect_start 2022-06-24T20:00:51.329+0000 7f557885b700 1 mds.0.8 reopen_log 2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.0.cache rollback_uncommitted_fragments: 0 pending 2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.0.server apply_blacklist: killed 0 2022-06-24T20:00:51.329+0000 7f557885b700 4 mds.0.8 apply_blacklist: killed 0 blacklisted sessions (2 blacklist entries, 2) 2022-06-24T20:00:51.329+0000 7f557885b700 1 mds.0.server reconnect_clients -- 2 sessions 2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.0.sessionmap dump 2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.0.sessionmap client.124142 0x5629aa274a00 state open completed {} prealloc_inos [] delegated_inos [] used_inos [] 2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.0.sessionmap client.124139 0x5629aa274500 state open completed {} prealloc_inos [] delegated_inos [] used_inos []
Not sure why the mds.a was still holding the unregistered client sessions.
#2 Updated by Venky Shankar over 1 year ago
- Status changed from New to Triaged
- Assignee set to Venky Shankar