Project

General

Profile

Bug #56435

octopus: cluster [WRN] evicting unresponsive client smithi115: (124139), after waiting 49.5313 seconds during MDS startup

Added by Xiubo Li over 1 year ago. Updated over 1 year ago.

Status:
Triaged
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
qa
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2022-06-24_16:58:58-rados-wip-yuri-testing-2022-06-24-0817-octopus-distro-default-smithi/6897870

The unresponsive client was from mgr.y:

2022-06-24T19:58:11.040+0000 7f6f01e98700  0 [volumes DEBUG mgr_util] CephFS mounting...
2022-06-24T19:58:11.040+0000 7f6f2d3c1700  1 --1-  >> v1:172.21.15.115:6789/0 conn(0x561e646a7c00 0x561e6473e800 :-1 s=CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=0).handle_server_banner_and_identify peer v1:172.21.15.115:6789/0 says I am v1:172.21.15.115:37556/0 (socket says 172.21.15.115:37556)
2022-06-24T19:58:11.040+0000 7f6f2d3c1700  1 -- 172.21.15.115:0/560103558 learned_addr learned my addr 172.21.15.115:0/560103558 (peer_addr_for_me v1:172.21.15.115:0/0)

History

#1 Updated by Xiubo Li over 1 year ago

The clients have been unregistered at 2022-06-24T20:00:11:

2022-06-24T20:00:11.038+0000 7f6f02699700  7 mgr unregister_client unregistering msgr client handle v2:172.21.15.115:0/560103558
2022-06-24T20:00:11.038+0000 7f6effe94700  1 -- 172.21.15.115:0/1833290781 shutdown_connections
2022-06-24T20:00:11.038+0000 7f6effe94700  1 -- 172.21.15.115:0/1833290781 wait complete.
2022-06-24T20:00:11.038+0000 7f6effe94700  7 mgr unregister_client unregistering msgr client handle v2:172.21.15.115:0/1833290781
2022-06-24T20:00:11.042+0000 7f6ef2639700 10 cephx: validate_tickets want 53 have 53 need 0
2022-06-24T20:00:11.042+0000 7f6ef2639700 20 cephx client: need_tickets: want=53 have=53 need=0
2022-06-24T20:00:11.042+0000 7f6ef2639700 10 auth: dump_rotating:

But the mds.a was trying to reconnect them *39 seconds later:

2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.a my gid is 124169
2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.a map says I am mds.0.8 state up:reconnect
2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.a msgr says I am v1:172.21.15.31:6816/4221084591
2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.a handle_mds_map: handling map as rank 0
2022-06-24T20:00:51.329+0000 7f557885b700  1 mds.0.8 handle_mds_map i am now mds.0.8
2022-06-24T20:00:51.329+0000 7f557885b700  1 mds.0.8 handle_mds_map state change up:replay --> up:reconnect
2022-06-24T20:00:51.329+0000 7f557885b700  1 mds.0.8 reconnect_start
2022-06-24T20:00:51.329+0000 7f557885b700  1 mds.0.8 reopen_log
2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.0.cache rollback_uncommitted_fragments: 0 pending
2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.0.server apply_blacklist: killed 0
2022-06-24T20:00:51.329+0000 7f557885b700  4 mds.0.8 apply_blacklist: killed 0 blacklisted sessions (2 blacklist entries, 2)
2022-06-24T20:00:51.329+0000 7f557885b700  1 mds.0.server reconnect_clients -- 2 sessions
2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.0.sessionmap dump
2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.0.sessionmap client.124142 0x5629aa274a00 state open completed {} prealloc_inos [] delegated_inos [] used_inos []
2022-06-24T20:00:51.329+0000 7f557885b700 10 mds.0.sessionmap client.124139 0x5629aa274500 state open completed {} prealloc_inos [] delegated_inos [] used_inos []

Not sure why the mds.a was still holding the unregistered client sessions.

#2 Updated by Venky Shankar over 1 year ago

  • Status changed from New to Triaged
  • Assignee set to Venky Shankar

Also available in: Atom PDF