Bug #41551

kclient: mds session hang in opening state

Added by Erqi Chen over 4 years ago. Updated over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
Category:
fs/ceph
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

The client's MDS session was evicted several times for not responding to cap revokes; afterwards the session hung in the opening state.

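For context, the kernel client tracks each MDS session with a small per-session state machine. The sketch below is a heavily simplified, hypothetical illustration (the enum, the function name and the transitions are assumptions, not the actual fs/ceph kernel code) of how a session that is closed by the MDS and then has its reconnect denied can end up waiting in the "opening" state, with nothing left to move it forward if the session-open reply never arrives.

/*
 * Heavily simplified illustration of a client-side MDS session state
 * machine; all names and transitions here are assumptions for
 * illustration and are not the actual fs/ceph kernel code.
 */
#include <stdio.h>

enum session_state {
    S_NEW,          /* no session yet */
    S_OPENING,      /* open request sent, waiting for the MDS reply */
    S_OPEN,         /* session established */
    S_RECONNECTING, /* trying to resume after a connection reset */
    S_CLOSED,       /* MDS closed (evicted) the session */
};

static const char *state_name(enum session_state s)
{
    switch (s) {
    case S_NEW:          return "new";
    case S_OPENING:      return "opening";
    case S_OPEN:         return "open";
    case S_RECONNECTING: return "reconnecting";
    case S_CLOSED:       return "closed";
    }
    return "?";
}

int main(void)
{
    enum session_state state = S_OPEN;

    state = S_CLOSED;       /* MDS evicts the client and closes the session */
    state = S_RECONNECTING; /* client sees the reset and tries to reconnect */
    state = S_OPENING;      /* reconnect denied, client asks for a new session */

    /*
     * If the session-open reply never arrives, for example because the
     * new request is lost in the reset/replace churn seen in the MDS log,
     * nothing transitions the session out of this state and the debugfs
     * mds_sessions file keeps reporting "opening".
     */
    printf("mds.0 %s\n", state_name(state));
    return 0;
}
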
Client log:
[Fri Aug 9 14:30:30 2019] libceph: mds0 10.48.175.48:6800 socket closed (con state OPEN)
[Fri Aug 9 14:30:30 2019] libceph: mds0 10.48.175.48:6800 connection reset
[Fri Aug 9 14:30:30 2019] libceph: reset on mds0
[Fri Aug 9 14:30:30 2019] ceph: mds0 closed our session
[Fri Aug 9 14:30:30 2019] ceph: mds0 reconnect start
[Fri Aug 9 14:30:30 2019] ceph: mds0 reconnect denied
[Fri Aug 9 14:30:30 2019] libceph: mds0 10.48.175.48:6800 socket closed (con state NEGOTIATING)
[Fri Aug 9 14:31:10 2019] libceph: mds0 10.48.175.48:6800 socket closed (con state OPEN)
[Fri Aug 9 14:31:11 2019] libceph: mds0 10.48.175.48:6800 connection reset
[Fri Aug 9 14:31:11 2019] libceph: reset on mds0
[Fri Aug 9 14:31:11 2019] ceph: mds0 closed our session
[Fri Aug 9 14:31:11 2019] ceph: mds0 reconnect start
[Fri Aug 9 14:31:11 2019] ceph: mds0 reconnect denied
[Fri Aug 9 14:31:11 2019] libceph: mds0 10.48.175.48:6800 socket closed (con state NEGOTIATING)
[Fri Aug 9 14:33:20 2019] libceph: mds0 10.48.175.48:6800 socket closed (con state OPEN)
[Fri Aug 9 14:33:25 2019] libceph: mds0 10.48.175.48:6800 connection reset
[Fri Aug 9 14:33:25 2019] libceph: reset on mds0
[Fri Aug 9 14:33:25 2019] ceph: mds0 closed our session
[Fri Aug 9 14:33:25 2019] ceph: mds0 reconnect start
[Fri Aug 9 14:33:25 2019] ceph: mds0 reconnect denied
[Fri Aug 9 14:33:25 2019] libceph: mds0 10.48.175.48:6800 socket closed (con state OPEN)

MDS log:
2019-08-09 14:30:33.235767 7fd343325700 0 -- 10.48.175.48:6800/4291268221 >> 10.48.242.157:0/3323861414 conn(0x56377b8f1000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept we reset (peer sent cseq 1), sending RESETSESSION
2019-08-09 14:30:33.240824 7fd3413ac700 0 mds.0.server handle_client_reconnect evcit all session(1), so close session(client.2645334)
2019-08-09 14:30:33.241027 7fd343325700 0 -- 10.48.175.48:6800/4291268221 >> 10.48.242.157:0/3323861414 conn(0x565c1901a000 :6800 s=STATE_OPEN pgs=25930 cs=1 l=0).fault server, going to standby
2019-08-09 14:30:33.249688 7fd344327700 0 -- 10.48.175.48:6800/4291268221 >> 10.48.242.157:0/3323861414 conn(0x56a7cabd8000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=1 existing_state=STATE_STANDBY
2019-08-09 14:30:33.249698 7fd344327700 0 -- 10.48.175.48:6800/4291268221 >> 10.48.242.157:0/3323861414 conn(0x56a7cabd8000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept peer reset, then tried to connect to us, replacing
2019-08-09 14:32:47.375737 7fd343325700 0 -- 10.48.175.48:6800/4291268221 >> 10.48.242.157:0/3323861414 conn(0x565fdd4c2800 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept we reset (peer sent cseq 2), sending RESETSESSION
2019-08-09 14:32:47.405132 7fd3413ac700 0 mds.0.server handle_client_reconnect evcit all session(1), so close session(client.2645334)
2019-08-09 14:32:47.406168 7fd343325700 0 -- 10.48.175.48:6800/4291268221 >> 10.48.242.157:0/3323861414 conn(0x56aab7b4d000 :6800 s=STATE_OPEN pgs=25971 cs=1 l=0).fault server, going to standby
2019-08-09 14:32:47.421539 7fd343325700 0 -- 10.48.175.48:6800/4291268221 >> 10.48.242.157:0/3323861414 conn(0x56a7cabd8000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=1 existing_state=STATE_STANDBY
2019-08-09 14:32:47.421549 7fd343325700 0 -- 10.48.175.48:6800/4291268221 >> 10.48.242.157:0/3323861414 conn(0x56a7cabd8000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept peer reset, then tried to connect to us, replacing
2019-08-09 14:33:37.572548 7fd343325700 0 -- 10.48.175.48:6800/4291268221 >> 10.48.153.12:0/1830753615 conn(0x568d211b9800 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept we reset (peer sent cseq 2), sending RESETSESSION
2019-08-09 14:33:37.983820 7fd3413ac700 0 mds.0.server handle_client_reconnect evcit all session(1), so close session(client.2360700)
2019-08-09 14:33:37.983903 7fd343325700 0 -- 10.48.175.48:6800/4291268221 >> 10.48.153.12:0/1830753615 conn(0x565cbc8b9800 :6800 s=STATE_OPEN pgs=1124467 cs=1 l=0).fault server, going to standby
2019-08-09 14:33:37.984423 7fd344327700 0 -- 10.48.175.48:6800/4291268221 >> 10.48.153.12:0/1830753615 conn(0x56ab17f1f000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=1 existing_state=STATE_STANDBY
2019-08-09 14:33:37.984434 7fd344327700 0 -- 10.48.175.48:6800/4291268221 >> 10.48.153.12:0/1830753615 conn(0x56ab17f1f000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept peer reset, then tried to connect to us, replacing

cat /sys/kernel/debug/ceph/de0707c1-402e-4e63-8144-4ccd7f2d8204.client2645334/mds_sessions
global_id 2645334
name "user directory"
mds.0 opening
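
A session can legitimately pass through "opening" briefly, so one way to spot the hang described here is to poll the debugfs file above and check whether the state stays at "opening". The small userspace check below is only a sketch, not part of Ceph; the hard-coded path is the one from this report and would need adjusting for other mounts.

/*
 * Minimal userspace check, not part of Ceph: read the debugfs
 * mds_sessions file shown above and print any MDS session that is still
 * in the "opening" state.  The hard-coded path is the one from this
 * report; adjust the fsid/client id for other mounts.
 */
#include <stdio.h>
#include <string.h>

#define MDS_SESSIONS \
    "/sys/kernel/debug/ceph/de0707c1-402e-4e63-8144-4ccd7f2d8204.client2645334/mds_sessions"

int main(void)
{
    char line[256];
    FILE *f = fopen(MDS_SESSIONS, "r");

    if (!f) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        /* per-session lines look like "mds.0 opening" or "mds.0 open" */
        if (strncmp(line, "mds.", 4) == 0 && strstr(line, " opening"))
            printf("session still opening: %s", line);
    }
    fclose(f);
    return 0;
}

Running this a few times over a minute or two and seeing the same session printed each time matches the hang reported here; on a healthy mount the state should move to "open".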

History

#1 Updated by Jeff Layton over 4 years ago

Patch merged into ceph/testing and should make v5.4.

#2 Updated by Jeff Layton over 4 years ago

  • Assignee set to Erqi Chen
