Bug #37544
mds: reconnect of client during thrashing fails
Status: closed
% Done: 0%
Source: Q/A
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2018-11-29 11:29:46.056 7f13ebdd7700 1 mds.a-s Updating MDS map to version 84 from mon.2
...
2018-11-29 11:29:46.056 7f13ebdd7700 1 mds.0.83 handle_mds_map state change up:replay --> up:reconnect
...
2018-11-29 11:29:46.060 7f13ee5dc700 0 -- 172.21.15.98:6817/607685025 >> 172.21.15.189:56294/184203095 conn(0x2cea480 legacy :6817 s=STATE_CONNECTION_ESTABLISHED l=0).read_until injecting socket failure
2018-11-29 11:29:46.060 7f13ee5dc700 1 -- 172.21.15.98:6817/607685025 >> 172.21.15.189:56294/184203095 conn(0x2cea480 legacy :6817 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send send error: (32) Broken pipe
2018-11-29 11:29:46.060 7f13ee5dc700 1 -- 172.21.15.98:6817/607685025 >> 172.21.15.189:56294/184203095 conn(0x2cea480 legacy :6817 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0). write connect message reply failed
2018-11-29 11:29:46.060 7f13ebdd7700 5 mds.a-s ms_handle_reset on 172.21.15.189:56294/184203095
2018-11-29 11:29:46.060 7f13ebdd7700 1 -- 172.21.15.98:6817/607685025 >> 172.21.15.189:56294/184203095 conn(0x2cea480 legacy :6817 s=STATE_CLOSED l=0).mark_down
2018-11-29 11:29:46.060 7f13ee5dc700 1 -- 172.21.15.98:6817/607685025 >> - conn(0x391b200 legacy :6817 s=ACCEPTING pgs=0 cs=0 l=0).send_server_banner sd=33 172.21.15.189:41588/0
2018-11-29 11:29:46.060 7f13ee5dc700 10 mds.a-s existing session 0x3785c00 for client.4418 172.21.15.189:56294/184203095 existing con 0, new/authorizing con 0x391b200
2018-11-29 11:29:46.060 7f13ee5dc700 10 mds.a-s ms_handle_authentication: parsing auth_cap_str='allow'
2018-11-29 11:29:46.060 7f13ee5dc700 0 -- 172.21.15.98:6817/607685025 >> 172.21.15.189:56294/184203095 conn(0x391b200 legacy :6817 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept we reset (peer sent cseq 1), sending RESETSESSION
2018-11-29 11:29:46.060 7f13ee5dc700 10 mds.a-s existing session 0x3785c00 for client.4418 172.21.15.189:56294/184203095 existing con 0, new/authorizing con 0x391b200
2018-11-29 11:29:46.060 7f13ee5dc700 10 mds.a-s ms_handle_authentication: parsing auth_cap_str='allow'
2018-11-29 11:29:46.060 7f13ebdd7700 10 mds.a-s ms_handle_accept 172.21.15.189:56294/184203095 con 0x391b200 session 0x3785c00
2018-11-29 11:29:46.060 7f13ebdd7700 10 mds.a-s session connection 0 -> 0x391b200
...
2018-11-29 11:30:32.627 7f13e95d2700 1 mds.0.server reconnect gives up on client.4418 172.21.15.189:56294/184203095
From: /ceph/teuthology-archive/pdonnell-2018-11-29_06:44:45-fs-wip-pdonnell-testing-20181129.042324-distro-basic-smithi/3291698/remote/smithi098/log/ceph-mds.a-s.log.gz
Seems to happen reliably: /ceph/teuthology-archive/pdonnell-2018-12-07_01:03:31-fs-wip-pdonnell-testing-20181129.042324-distro-basic-smithi/3312640/teuthology.log
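The injected socket failure (the "injecting socket failure" entry in the log above, logged about 4 ms after the transition to up:reconnect) is probably related.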
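For anyone triaging similar runs, it can help to measure how long the MDS sat in up:reconnect before giving up on the client. The snippet below is only a rough sketch, not part of the Ceph QA tooling: the helper names parse_ts and reconnect_wait_seconds are made up for illustration, and the two sample lines are copied from the log excerpt above. It assumes the log keeps the "date time thread level message" layout shown in this ticket.

```python
from datetime import datetime

# Two lines copied verbatim from the MDS log excerpt in this ticket.
LOG = """\
2018-11-29 11:29:46.056 7f13ebdd7700 1 mds.0.83 handle_mds_map state change up:replay --> up:reconnect
2018-11-29 11:30:32.627 7f13e95d2700 1 mds.0.server reconnect gives up on client.4418 172.21.15.189:56294/184203095
"""


def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm' timestamp of a log line."""
    date, time = line.split()[:2]
    return datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S.%f")


def reconnect_wait_seconds(log_text):
    """Return seconds from entering up:reconnect to the first give-up message.

    Returns None if either event is missing from log_text.
    """
    start = None
    for line in log_text.splitlines():
        if start is None and "--> up:reconnect" in line:
            start = parse_ts(line)
        elif start is not None and "reconnect gives up on" in line:
            return (parse_ts(line) - start).total_seconds()
    return None


print(reconnect_wait_seconds(LOG))
```

Running the same function over the full ceph-mds log (after decompressing it) would show whether the wait is consistent across the failing runs.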