Bug #47723
corrupted ceph_msg_connect message
Status: Closed
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
2020-09-29T21:42:02.184+0000 7fb6424e8700 5 mds.beacon.c set_want_state: up:replay -> up:reconnect
...
2020-09-29T21:42:02.593+0000 7fb649cf7700 10 mds.c existing session 0x55dc7480d600 for client.4556 v1:172.21.15.116:0/2729232545 existing con 0, new/authorizing con 0x55dc75597800
2020-09-29T21:42:02.593+0000 7fb649cf7700 10 mds.c parse_caps: parsing auth_cap_str='allow'
2020-09-29T21:42:02.593+0000 7fb649cf7700 10 mds.c ms_handle_accept v1:172.21.15.116:0/2729232545 con 0x55dc75597800 session 0x55dc7480d600
2020-09-29T21:42:02.593+0000 7fb649cf7700 10 mds.c session connection 0 -> 0x55dc75597800
...
2020-09-29T21:42:02.594+0000 7fb649cf7700 1 -- [v2:172.21.15.45:6827/4195693115,v1:172.21.15.45:6829/4195693115] <== client.4556 v1:172.21.15.116:0/2729232545 2 ==== client_caps(flush ino 0x10000001a9b 1 seq 0 tid 7157 caps=pAsLsXsFsc dirty=Fxw wanted=Fc follows 1 size 2312/0 mtime 2005-06-03T16:32:07.000000+0000 tws 1 xattrs(v=18446618842528883768 l=0)) v10 ==== 236+0+0 (unknown 3718027454 0 0) 0x55dc747e1b00 con 0x55dc75597800
...
2020-09-29T21:42:02.598+0000 7fb64c4fc700 0 -- [v2:172.21.15.45:6827/4195693115,v1:172.21.15.45:6829/4195693115] >> v1:172.21.15.116:0/2729232545 conn(0x55dc75597800 legacy=0x55dc753f5800 unknown :6829 s=STATE_CONNECTION_ESTABLISHED l=0).read_until injecting socket failure
...
2020-09-29T21:42:02.600+0000 7fb64c4fc700 0 -- [v2:172.21.15.45:6827/4195693115,v1:172.21.15.45:6829/4195693115] >> v1:172.21.15.116:0/2729232545 conn(0x55dc75597800 legacy=0x55dc753f5800 unknown :6829 s=STATE_CONNECTION_ESTABLISHED l=0).read_until injecting socket failure
...
2020-09-29T21:42:05.960+0000 7fb64d4fe700 10 -- [v2:172.21.15.45:6827/4195693115,v1:172.21.15.45:6829/4195693115] accept_conn 0x55dc75597800 v1:172.21.15.116:0/2729232545
2020-09-29T21:42:05.960+0000 7fb64d4fe700 10 -- [v2:172.21.15.45:6827/4195693115,v1:172.21.15.45:6829/4195693115] >> v1:172.21.15.116:0/2729232545 conn(0x55dc75597800 legacy=0x55dc753f5800 unknown :6829 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 62 remaining bytes 0
2020-09-29T21:42:05.960+0000 7fb649cf7700 10 mds.c existing session 0x55dc7480d600 for client.4556 v1:172.21.15.116:0/2729232545 existing con 0x55dc75597800, new/authorizing con 0x55dc75597800
2020-09-29T21:42:05.960+0000 7fb649cf7700 10 mds.c parse_caps: parsing auth_cap_str='allow'
2020-09-29T21:42:05.960+0000 7fb649cf7700 10 mds.c ms_handle_accept v1:172.21.15.116:0/2729232545 con 0x55dc75597800 session 0x55dc7480d600
...
2020-09-29T21:42:15.176+0000 7fb649cf7700 1 -- [v2:172.21.15.45:6827/4195693115,v1:172.21.15.45:6829/4195693115] <== client.4556 v1:172.21.15.116:0/2729232545 2 ==== client_session(request_renewcaps seq 11) ==== 28+0+0 (unknown 2291222804 0 0) 0x55dc74878fc0 con 0x55dc75597800
...
2020-09-29T21:42:51.263+0000 7fb6484f4700 7 mds.0.server reconnect_tick: last seen 2.29396 seconds ago, extending reconnect interval
From: /ceph/teuthology-archive/pdonnell-2020-09-29_05:26:41-multimds-wip-pdonnell-testing-20200929.022151-distro-basic-smithi/5480231/remote/smithi045/log/ceph-mds.c.log.gz
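The problem appears to be that the injected socket failures prevent client.4556 from completing its reconnect: the MDS never receives the client's final client_reconnect message, and reconnect_tick instead logs that it is extending the reconnect interval.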
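To check this diagnosis against a full MDS log, one could count the injected socket failures and tally the message types the MDS received from each client. The following is a minimal sketch: `summarize`, `missing_reconnect`, and `scan_file` are hypothetical helpers, and the regular expressions are assumptions derived only from the log lines quoted above, not from a documented Ceph log format.

```python
import gzip
import re
from collections import Counter

# Assumed patterns, taken from the log excerpt in this report.
INJECT_RE = re.compile(r"injecting socket failure")
# Incoming message lines look like:
#   ... <== client.4556 v1:172.21.15.116:0/2729232545 2 ==== client_caps(...
INCOMING_RE = re.compile(r"<== (client\.\d+) \S+ \d+ ==== (\w+)\(")


def summarize(lines):
    """Return (number of injected socket failures, Counter of (client, msg type))."""
    injected = 0
    received = Counter()
    for line in lines:
        if INJECT_RE.search(line):
            injected += 1
        match = INCOMING_RE.search(line)
        if match:
            # Key is the tuple (client name, message type), e.g. ("client.4556", "client_caps").
            received[match.groups()] += 1
    return injected, received


def missing_reconnect(lines, client):
    """True if no client_reconnect message from `client` appears in `lines`."""
    _, received = summarize(lines)
    return received[(client, "client_reconnect")] == 0


def scan_file(path):
    """Read a gzip-compressed MDS log (e.g. ceph-mds.c.log.gz) as text lines."""
    with gzip.open(path, "rt", errors="replace") as f:
        return f.readlines()
```

For example, `missing_reconnect(scan_file("ceph-mds.c.log.gz"), "client.4556")` would return True if the log never records a client_reconnect from that client. Matching on the `<== client.N ... ==== type(` shape keeps the scan independent of the debug level prefix on each line.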
Symptom:
2020-09-29T21:48:04.946 ERROR:tasks.mds_thrash.fs.[cephfs]:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200929.022151/qa/tasks/mds_thrash.py", line 124, in _run
    self.do_thrash()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200929.022151/qa/tasks/mds_thrash.py", line 310, in do_thrash
    status = self.wait_for_stable(rank, gid)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200929.022151/qa/tasks/mds_thrash.py", line 215, in wait_for_stable
    raise RuntimeError('timeout waiting for cluster to stabilize')
RuntimeError: timeout waiting for cluster to stabilize
2020-09-29T21:48:07.907 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.fs.[cephfs] failed
From: /ceph/teuthology-archive/pdonnell-2020-09-29_05:26:41-multimds-wip-pdonnell-testing-20200929.022151-distro-basic-smithi/5480231/teuthology.log
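The symptom follows from the stuck reconnect: a check that polls for a stable state and gives up after a deadline can only end in a RuntimeError if the MDS never leaves up:reconnect. As a hypothetical illustration (the name `wait_for_state`, the `get_state` callback, the target state, and the timeout values are all assumptions), such a loop might look like the sketch below; it is not the `wait_for_stable` code from qa/tasks/mds_thrash.py.

```python
import time


def wait_for_state(get_state, want="up:active", timeout=10.0, interval=0.5):
    """Poll get_state() until it returns `want`; raise once `timeout` seconds pass.

    Hypothetical model of a "wait until stable" check, not mds_thrash.py's code.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state == want:
            return state
        if time.monotonic() >= deadline:
            # Same message as the thrasher traceback, reused for illustration.
            raise RuntimeError("timeout waiting for cluster to stabilize")
        time.sleep(interval)
```

With `get_state=lambda: "up:reconnect"`, the loop never sees the target state and raises after `timeout` seconds, which mirrors the thrasher failure reported here. Using `time.monotonic()` keeps the deadline unaffected by wall-clock adjustments.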