Project

General

Profile

Bug #53911

client: client session state stuck in opening and hang all the time

Added by Ivan Guan 11 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
% Done:

100%

Source:
Tags:
Backport:
quincy, pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
upgrade/nautilus-x
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Version: 14.2.5

MDS: single MDS

Description:
Recently,there has an inconsistent session state between the client and the server in my environment.
Client is opening and mds is open, so the request can’t send to mds and report “waiting for session
to mds.0 to open”.

Service Log:
client log:
2022-01-08 00:30:27.947098 7ffa3be87700 1 --2- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff955671f00 0x7ff965ea4b70 crc :-1 s=SESSION_RECONNECTING pgs=6811 cs=1 l=0 rx=0 tx=0).handle_session_reset received session reset full=1
2022-01-08 00:30:27.947105 7ffa3be87700 1 --2- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff955671f00 0x7ff965ea4b70 crc :-1 s=SESSION_RECONNECTING pgs=6811 cs=1 l=0 rx=0 tx=0).reset_session

2022-01-08 00:30:27.947188 7ffa3be87700 1 --2- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff955671f00 0x7ff965ea4b70 crc :-1 s=READY pgs=6812 cs=0 l=0 rx=0 tx=0).ready entity=mds.0 client_cookie=8675584eea1e494d server_cookie=2d65b720ac1d42c2 in_seq=0 out_seq=0
2022-01-08 00:30:27.949789 7ffa0affd700 1 -- 10.61.152.103:0/258451310 --> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] -- client_session(request_open) v3 -- 0x7ffa104f24e0 con 0x7ff955671f00

2022-01-08 00:30:27.949857 7ffa0affd700 10 client.74577766.objecter ms_handle_connect 0x7ff9bd1a2e30
2022-01-08 00:30:27.949861 7ffa0affd700 10 client.74577766 ms_handle_connect on v2:10.61.152.1:6960/3912678747
2022-01-08 00:30:27.949875 7ffa0affd700 10 client.74577766.objecter ms_handle_connect 0x7ff955671f00
2022-01-08 00:30:27.949876 7ffa0affd700 10 client.74577766 ms_handle_connect on v2:10.61.152.1:6960/3912678747

mds: handle request open and send reply to client
2022-01-08 00:31:45.963681 7fbf1d7e0700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 0x7fc0bfadf700 crc :-1 s=THROTTLE_DONE pgs=745 cs=0 l=0 rx=0 tx=0).handle_message got 314 + 0 + 0 byte message. envelope type=22 src client.74577766 off 0
2022-01-08 00:31:45.963731 7fbf18fd7700 1 -- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] <== client.74577766 10.61.152.103:0/258451310 1 ==== client_session(request_open) v3 ==== 314+0+0 (crc 0 0 0) 0x7fc0b1f298c0 con 0x7fc0717c4800 queue size / msize 0/314
2022-01-08 00:31:45.963748 7fbf18fd7700 3 mds.0.server handle_client_session client_session(request_open) v3 from client.74577766
2022-01-08 00:31:45.963765 7fbf18fd7700 10 mds.0.sessionmap add_session s=0x7fc03f240000 name=client.74577766
2022-01-08 00:31:45.963774 7fbf18fd7700 10 mds.0.sessionmap touch_session s=0x7fc03f240000 name=client.74577766
2022-01-08 00:31:45.963826 7fbf10fc7700 5 mds.0.log _submit_thread 893993519170~385 : ESession client.74577766 10.61.152.103:0/258451310 open cmapv 22300515

2022-01-08 00:31:45.965994 7fbf127ca700 10 mds.0.server _session_logged client.74577766 10.61.152.103:0/258451310 state_seq 1 open 22300515
2022-01-08 00:31:45.966005 7fbf127ca700 10 mds.0.sessionmap touch_session s=0x7fc03f240000 name=client.74577766
2022-01-08 00:31:45.966011 7fbf127ca700 10 mds.0.182824 send_message_client client.74577766 10.61.152.103:0/258451310 client_session(open) v3
2022-01-08 00:31:45.966032 7fbf127ca700 1 -- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] --> client.7457776610.61.152.103:0/258451310 -- client_session(open) v3 -- 0x7fc0b0971b00 con 0x7fc0717c4800
2022-01-08 00:31:45.966054 7fbf127ca700 2 mds.0.server New client session: addr="10.61.152.103:0/258451310",elapsed=0.002368,throttled=0.000025,status="ACCEPTED",root=“/"

client log: did not receive request open reply from mds because the connection has been marked down
2022-01-08 00:30:27.949880 7ffa0affd700 0 client.74577766 ms_handle_remote_reset on v2:10.61.152.1:6960/3912678747
2022-01-08 00:30:27.983376 7ffa0affd700 1 client.74577766 reset from mds we were opening; retrying
2022-01-08 00:30:27.983379 7ffa0affd700 5 client.74577766 _closed_mds_session mds.0 seq 0
2022-01-08 00:30:27.983381 7ffa0affd700 1 -- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff955671f00 msgr2=0x7ff965ea4b70 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).mark_down
2022-01-08 00:30:27.983397 7ffa0affd700 1 --2- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff955671f00 0x7ff965ea4b70 crc :-1 s=READY pgs=6812 cs=0 l=0 rx=0 tx=0).stop
2022-01-08 00:30:27.983429 7ffa0affd700 10 client.74577766 remove_session_caps mds.0
2022-01-08 00:30:27.983431 7ffa0affd700 10 client.74577766 kick_requests_closed for mds.0
#send new connect to mds
2022-01-08 00:30:28.033498 7ffa0affd700 1 --2- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff966245b00 0x7ff966088fc0 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).connect
2022-01-08 00:30:28.033949 7ffa3be87700 1 --2- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff966245b00 0x7ff966088fc0 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0

mds log:
2022-01-08 00:31:45.984781 7fbf1d7e0700 1 -- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 msgr2=0x7fc0bfadf700 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 67
2022-01-08 00:31:45.984792 7fbf1d7e0700 1 -- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 msgr2=0x7fc0bfadf700 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed
2022-01-08 00:31:45.984795 7fbf1d7e0700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 0x7fc0bfadf700 crc :-1 s=READY pgs=745 cs=0 l=0 rx=0 tx=0).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
2022-01-08 00:31:45.984837 7fbf1d7e0700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 0x7fc0bfadf700 unknown :-1 s=READY pgs=745 cs=0 l=0 rx=0 tx=0)._fault with nothing to send, going to standby

#received new connect request, trigger reusing old connection
2022-01-08 00:31:46.035100 7fbf1f7e4700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> unknown.0 conn(0x7fbf9e888000 0x7fc046fbf600 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).accept
2022-01-08 00:31:46.035193 7fbf1f7e4700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> unknown.0 conn(0x7fbf9e888000 0x7fc046fbf600 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2022-01-08 00:31:46.035299 7fbf1f7e4700 10 mds.xt1 parse_caps: parsing auth_cap_str='allow *'
2022-01-08 00:31:46.035407 7fbf1f7e4700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fbf9e888000 0x7fc046fbf600 crc :-1 s=SESSION_ACCEPTING pgs=746 cs=0 l=0 rx=0 tx=0).handle_existing_connection found previous session existing=0x7fc0717c4800, peer must have reseted.
2022-01-08 00:31:46.035418 7fbf1f7e4700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 0x7fc0bfadf700 unknown :-1 s=STANDBY pgs=745 cs=0 l=0 rx=0 tx=0).reset_session
#new connection has been marked down
2022-01-08 00:31:46.035428 7fbf1f7e4700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fbf9e888000 0x7fc046fbf600 crc :-1 s=SESSION_ACCEPTING pgs=746 cs=0 l=0 rx=0 tx=0).stop
2022-01-08 00:31:46.035434 7fbf18fd7700 5 mds.xt1 ms_handle_remote_reset on 10.61.152.103:0/258451310
2022-01-08 00:31:46.035445 7fbf18fd7700 5 mds.xt1 ms_handle_reset on 10.61.152.103:0/258451310
2022-01-08 00:31:46.035447 7fbf18fd7700 1 -- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fbf9e888000 msgr2=0x7fc046fbf600 unknown :-1 s=STATE_CLOSED l=0).mark_down
2022-01-08 00:31:46.035454 7fbf18fd7700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fbf9e888000 0x7fc046fbf600 unknown :-1 s=CLOSED pgs=746 cs=0 l=0 rx=0 tx=0).stop
2022-01-08 00:31:46.035460 7fbf18fd7700 10 mds.xt1 existing session 0x7fc03f240000 for client.74577766 10.61.152.103:0/258451310 existing con 0x7fc0717c4800, new/authorizing con 0x7fc0717c4800

2022-01-08 00:31:46.035473 7fbf18fd7700 10 mds.xt1 ms_handle_accept 10.61.152.103:0/258451310 con 0x7fc0717c4800 session 0x7fc03f240000
2022-01-08 00:31:46.035471 7fbf1f7e4700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 0x7fc0bfadf700 crc :-1 s=READY pgs=746 cs=0 l=0 rx=0 tx=0).ready entity=client.74577766 client_cookie=c326e3fb4feab038 server_cookie=a9b0e8ac9436429d in_seq=0 out_seq=0

client log:sent request_open to mds again
2022-01-08 00:30:28.034228 7ffa3be87700 1 --2- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff966245b00 0x7ff966088fc0 crc :-1 s=READY pgs=6813 cs=0 l=0 rx=0 tx=0).ready entity=mds.0 client_cookie=c326e3fb4feab038 server_cookie=a9b0e8ac9436429d in_seq=0 out_seq=0
022-01-08 00:30:28.047087 7ffa0affd700 1 -- 10.61.152.103:0/258451310 --> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] -- client_session(request_open) v3 -- 0x7ffa12757520 con 0x7ff966245b00
2022-01-08 00:30:28.047130 7ffa0affd700 10 client.74577766.objecter ms_handle_connect 0x7ff955671f00
2022-01-08 00:30:28.047134 7ffa0affd700 10 client.74577766 ms_handle_connect on v2:10.61.152.1:6960/3912678747
2022-01-08 00:30:28.047147 7ffa0affd700 10 client.74577766.objecter ms_handle_connect 0x7ff966245b00
2022-01-08 00:30:28.047148 7ffa0affd700 10 client.74577766 ms_handle_connect on v2:10.61.152.1:6960/3912678747

mds log: dropped the request because its state was open already
2022-01-08 00:31:46.048538 7fbf1f7e4700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 0x7fc0bfadf700 crc :-1 s=THROTTLE_DONE pgs=746 cs=0 l=0 rx=0 tx=0).handle_message got 314 + 0 + 0 byte message. envelope type=22 src client.74577766 off 0
2022-01-08 00:31:46.048636 7fbf18fd7700 1 -- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] <== client.74577766 10.61.152.103:0/258451310 1 ==== client_session(request_open) v3 ==== 314+0+0 (crc 0 0 0) 0x7fc0097b4240 con 0x7fc0717c4800 queue size / msize 0/314
2022-01-08 00:31:46.048668 7fbf18fd7700 3 mds.0.server handle_client_session client_session(request_open) v3 from client.74577766
2022-01-08 00:31:46.048673 7fbf18fd7700 10 mds.0.server currently open|opening|stale|killing, dropping this req

client log: failed to send request to mds
2022-01-08 00:31:03.767331 7ff9a7fff700 8 client.74577766 _ll_forget 0x1 1
2022-01-08 00:31:03.798409 7ffa01ffb700 8 client.74577766 _ll_getattr 0x1.head
2022-01-08 00:31:03.798426 7ffa01ffb700 10 client.74577766 _getattr mask pAsLsXsFs issued=0
2022-01-08 00:31:03.805751 7ffa01ffb700 10 client.74577766 did not get mds through better means, so chose random mds 0
2022-01-08 00:31:03.805758 7ffa01ffb700 10 client.74577766 waiting for session to mds.0 to open


Related issues

Copied to CephFS - Backport #54216: quincy: client: client session state stuck in opening and hang all the time Resolved
Copied to CephFS - Backport #54217: pacific: client: client session state stuck in opening and hang all the time Resolved

History

#1 Updated by Ivan Guan 11 months ago

It seems like that, if the client missed the request_open reply message, it will never be recovered from STATE::OPENING
to STATE::OPEN. Thus, the client will not has any response from mds except renew caps reply.

#3 Updated by Venky Shankar 11 months ago

  • Project changed from RADOS to CephFS
  • Target version changed from v14.2.23 to v17.0.0
  • Backport set to quincy, pacific
  • Pull request ID set to 44638
  • Component(FS) MDS added

#4 Updated by Venky Shankar 10 months ago

Formatting for readability

Version: 14.2.5

MDS: single MDS

Description:
Recently,there has an inconsistent session state between the client and the server in my environment.
Client is opening and mds is open, so the request can’t send to mds and report “waiting for session
to mds.0 to open”.

Service Log:
client log:

2022-01-08 00:30:27.947098 7ffa3be87700 1 --2- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff955671f00 0x7ff965ea4b70 crc :-1 s=SESSION_RECONNECTING pgs=6811 cs=1 l=0 rx=0 tx=0).handle_session_reset received session reset full=1
2022-01-08 00:30:27.947105 7ffa3be87700 1 --2- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff955671f00 0x7ff965ea4b70 crc :-1 s=SESSION_RECONNECTING pgs=6811 cs=1 l=0 rx=0 tx=0).reset_session

2022-01-08 00:30:27.947188 7ffa3be87700 1 --2- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff955671f00 0x7ff965ea4b70 crc :-1 s=READY pgs=6812 cs=0 l=0 rx=0 tx=0).ready entity=mds.0 client_cookie=8675584eea1e494d server_cookie=2d65b720ac1d42c2 in_seq=0 out_seq=0
2022-01-08 00:30:27.949789 7ffa0affd700 1 -- 10.61.152.103:0/258451310 --> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] -- client_session(request_open) v3 -- 0x7ffa104f24e0 con 0x7ff955671f00

2022-01-08 00:30:27.949857 7ffa0affd700 10 client.74577766.objecter ms_handle_connect 0x7ff9bd1a2e30
2022-01-08 00:30:27.949861 7ffa0affd700 10 client.74577766 ms_handle_connect on v2:10.61.152.1:6960/3912678747
2022-01-08 00:30:27.949875 7ffa0affd700 10 client.74577766.objecter ms_handle_connect 0x7ff955671f00
2022-01-08 00:30:27.949876 7ffa0affd700 10 client.74577766 ms_handle_connect on v2:10.61.152.1:6960/3912678747

mds: handle request open and send reply to client

2022-01-08 00:31:45.963681 7fbf1d7e0700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 0x7fc0bfadf700 crc :-1 s=THROTTLE_DONE pgs=745 cs=0 l=0 rx=0 tx=0).handle_message got 314 + 0 + 0 byte message. envelope type=22 src client.74577766 off 0
2022-01-08 00:31:45.963731 7fbf18fd7700 1 -- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] <== client.74577766 10.61.152.103:0/258451310 1 ==== client_session(request_open) v3 ==== 314+0+0 (crc 0 0 0) 0x7fc0b1f298c0 con 0x7fc0717c4800 queue size / msize 0/314
2022-01-08 00:31:45.963748 7fbf18fd7700 3 mds.0.server handle_client_session client_session(request_open) v3 from client.74577766
2022-01-08 00:31:45.963765 7fbf18fd7700 10 mds.0.sessionmap add_session s=0x7fc03f240000 name=client.74577766
2022-01-08 00:31:45.963774 7fbf18fd7700 10 mds.0.sessionmap touch_session s=0x7fc03f240000 name=client.74577766
2022-01-08 00:31:45.963826 7fbf10fc7700 5 mds.0.log _submit_thread 893993519170~385 : ESession client.74577766 10.61.152.103:0/258451310 open cmapv 22300515

2022-01-08 00:31:45.965994 7fbf127ca700 10 mds.0.server _session_logged client.74577766 10.61.152.103:0/258451310 state_seq 1 open 22300515
2022-01-08 00:31:45.966005 7fbf127ca700 10 mds.0.sessionmap touch_session s=0x7fc03f240000 name=client.74577766
2022-01-08 00:31:45.966011 7fbf127ca700 10 mds.0.182824 send_message_client client.74577766 10.61.152.103:0/258451310 client_session(open) v3
2022-01-08 00:31:45.966032 7fbf127ca700 1 -- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] --> client.7457776610.61.152.103:0/258451310 -- client_session(open) v3 -- 0x7fc0b0971b00 con 0x7fc0717c4800
2022-01-08 00:31:45.966054 7fbf127ca700 2 mds.0.server New client session: addr="10.61.152.103:0/258451310",elapsed=0.002368,throttled=0.000025,status="ACCEPTED",root=“/" 

client log: did not receive request open reply from mds because the connection has been marked down

2022-01-08 00:30:27.949880 7ffa0affd700 0 client.74577766 ms_handle_remote_reset on v2:10.61.152.1:6960/3912678747
2022-01-08 00:30:27.983376 7ffa0affd700 1 client.74577766 reset from mds we were opening; retrying
2022-01-08 00:30:27.983379 7ffa0affd700 5 client.74577766 _closed_mds_session mds.0 seq 0
2022-01-08 00:30:27.983381 7ffa0affd700 1 -- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff955671f00 msgr2=0x7ff965ea4b70 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).mark_down
2022-01-08 00:30:27.983397 7ffa0affd700 1 --2- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff955671f00 0x7ff965ea4b70 crc :-1 s=READY pgs=6812 cs=0 l=0 rx=0 tx=0).stop
2022-01-08 00:30:27.983429 7ffa0affd700 10 client.74577766 remove_session_caps mds.0
2022-01-08 00:30:27.983431 7ffa0affd700 10 client.74577766 kick_requests_closed for mds.0
#send new connect to mds
2022-01-08 00:30:28.033498 7ffa0affd700 1 --2- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff966245b00 0x7ff966088fc0 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).connect
2022-01-08 00:30:28.033949 7ffa3be87700 1 --2- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff966245b00 0x7ff966088fc0 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0

mds log:

2022-01-08 00:31:45.984781 7fbf1d7e0700 1 -- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 msgr2=0x7fc0bfadf700 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 67
2022-01-08 00:31:45.984792 7fbf1d7e0700 1 -- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 msgr2=0x7fc0bfadf700 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed
2022-01-08 00:31:45.984795 7fbf1d7e0700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 0x7fc0bfadf700 crc :-1 s=READY pgs=745 cs=0 l=0 rx=0 tx=0).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
2022-01-08 00:31:45.984837 7fbf1d7e0700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 0x7fc0bfadf700 unknown :-1 s=READY pgs=745 cs=0 l=0 rx=0 tx=0)._fault with nothing to send, going to standby

#received new connect request, trigger reusing old connection
2022-01-08 00:31:46.035100 7fbf1f7e4700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> unknown.0 conn(0x7fbf9e888000 0x7fc046fbf600 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).accept
2022-01-08 00:31:46.035193 7fbf1f7e4700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> unknown.0 conn(0x7fbf9e888000 0x7fc046fbf600 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2022-01-08 00:31:46.035299 7fbf1f7e4700 10 mds.xt1 parse_caps: parsing auth_cap_str='allow *'
2022-01-08 00:31:46.035407 7fbf1f7e4700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fbf9e888000 0x7fc046fbf600 crc :-1 s=SESSION_ACCEPTING pgs=746 cs=0 l=0 rx=0 tx=0).handle_existing_connection found previous session existing=0x7fc0717c4800, peer must have reseted.
2022-01-08 00:31:46.035418 7fbf1f7e4700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 0x7fc0bfadf700 unknown :-1 s=STANDBY pgs=745 cs=0 l=0 rx=0 tx=0).reset_session
#new connection has been marked down
2022-01-08 00:31:46.035428 7fbf1f7e4700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fbf9e888000 0x7fc046fbf600 crc :-1 s=SESSION_ACCEPTING pgs=746 cs=0 l=0 rx=0 tx=0).stop
2022-01-08 00:31:46.035434 7fbf18fd7700 5 mds.xt1 ms_handle_remote_reset on 10.61.152.103:0/258451310
2022-01-08 00:31:46.035445 7fbf18fd7700 5 mds.xt1 ms_handle_reset on 10.61.152.103:0/258451310
2022-01-08 00:31:46.035447 7fbf18fd7700 1 -- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fbf9e888000 msgr2=0x7fc046fbf600 unknown :-1 s=STATE_CLOSED l=0).mark_down
2022-01-08 00:31:46.035454 7fbf18fd7700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fbf9e888000 0x7fc046fbf600 unknown :-1 s=CLOSED pgs=746 cs=0 l=0 rx=0 tx=0).stop
2022-01-08 00:31:46.035460 7fbf18fd7700 10 mds.xt1 existing session 0x7fc03f240000 for client.74577766 10.61.152.103:0/258451310 existing con 0x7fc0717c4800, new/authorizing con 0x7fc0717c4800

2022-01-08 00:31:46.035473 7fbf18fd7700 10 mds.xt1 ms_handle_accept 10.61.152.103:0/258451310 con 0x7fc0717c4800 session 0x7fc03f240000
2022-01-08 00:31:46.035471 7fbf1f7e4700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 0x7fc0bfadf700 crc :-1 s=READY pgs=746 cs=0 l=0 rx=0 tx=0).ready entity=client.74577766 client_cookie=c326e3fb4feab038 server_cookie=a9b0e8ac9436429d in_seq=0 out_seq=0

client log:sent request_open to mds again

2022-01-08 00:30:28.034228 7ffa3be87700 1 --2- 10.61.152.103:0/258451310 >> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] conn(0x7ff966245b00 0x7ff966088fc0 crc :-1 s=READY pgs=6813 cs=0 l=0 rx=0 tx=0).ready entity=mds.0 client_cookie=c326e3fb4feab038 server_cookie=a9b0e8ac9436429d in_seq=0 out_seq=0
022-01-08 00:30:28.047087 7ffa0affd700 1 -- 10.61.152.103:0/258451310 --> [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] -- client_session(request_open) v3 -- 0x7ffa12757520 con 0x7ff966245b00
2022-01-08 00:30:28.047130 7ffa0affd700 10 client.74577766.objecter ms_handle_connect 0x7ff955671f00
2022-01-08 00:30:28.047134 7ffa0affd700 10 client.74577766 ms_handle_connect on v2:10.61.152.1:6960/3912678747
2022-01-08 00:30:28.047147 7ffa0affd700 10 client.74577766.objecter ms_handle_connect 0x7ff966245b00
2022-01-08 00:30:28.047148 7ffa0affd700 10 client.74577766 ms_handle_connect on v2:10.61.152.1:6960/3912678747

mds log: dropped the request because its state was open already

2022-01-08 00:31:46.048538 7fbf1f7e4700 1 --2- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] >> 10.61.152.103:0/258451310 client.74577766 conn(0x7fc0717c4800 0x7fc0bfadf700 crc :-1 s=THROTTLE_DONE pgs=746 cs=0 l=0 rx=0 tx=0).handle_message got 314 + 0 + 0 byte message. envelope type=22 src client.74577766 off 0
2022-01-08 00:31:46.048636 7fbf18fd7700 1 -- [v2:10.61.152.1:6960/3912678747,v1:10.61.152.1:6961/3912678747] <== client.74577766 10.61.152.103:0/258451310 1 ==== client_session(request_open) v3 ==== 314+0+0 (crc 0 0 0) 0x7fc0097b4240 con 0x7fc0717c4800 queue size / msize 0/314
2022-01-08 00:31:46.048668 7fbf18fd7700 3 mds.0.server handle_client_session client_session(request_open) v3 from client.74577766
2022-01-08 00:31:46.048673 7fbf18fd7700 10 mds.0.server currently open|opening|stale|killing, dropping this req

client log: failed to send request to mds

2022-01-08 00:31:03.767331 7ff9a7fff700 8 client.74577766 _ll_forget 0x1 1
2022-01-08 00:31:03.798409 7ffa01ffb700 8 client.74577766 _ll_getattr 0x1.head
2022-01-08 00:31:03.798426 7ffa01ffb700 10 client.74577766 _getattr mask pAsLsXsFs issued=0
2022-01-08 00:31:03.805751 7ffa01ffb700 10 client.74577766 did not get mds through better means, so chose random mds 0
2022-01-08 00:31:03.805758 7ffa01ffb700 10 client.74577766 waiting for session to mds.0 to open

#5 Updated by Venky Shankar 10 months ago

  • Status changed from New to Pending Backport

#6 Updated by Backport Bot 10 months ago

  • Copied to Backport #54216: quincy: client: client session state stuck in opening and hang all the time added

#7 Updated by Backport Bot 10 months ago

  • Copied to Backport #54217: pacific: client: client session state stuck in opening and hang all the time added

#8 Updated by Xiubo Li 9 months ago

This fixing will introduce new bug in https://tracker.ceph.com/issues/54461.

#9 Updated by Venky Shankar 7 months ago

  • Pull request ID changed from 44638 to 45307

#10 Updated by Backport Bot 4 months ago

  • Tags set to backport_processed

#11 Updated by Konstantin Shalygin 2 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100
  • Tags deleted (backport_processed)

Also available in: Atom PDF