Bug #36183

closed

[objecter] client socket failure leads to hung connection

Added by Jason Dillaman over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
High
Assignee:
Jason Dillaman
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
luminous,mimic
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Objecter
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

During an rbd-mirror thrash test run, the process failed to shut down cleanly because it was stuck in a librados read operation for over 10 minutes. The OSD logs show that the client attempted to connect but disconnected because of the inject socket failure option. The client made no further attempts to reconnect after the initial failure.

2018-09-25 00:09:14.227 7f82794b1700  1 -- 172.21.15.31:6808/12957 >> - conn(0x23a27480 legacy :6808 s=ACCEPTING pgs=0 cs=0 l=0).send_server_banner sd=57 172.21.15.31:33298/0
2018-09-25 00:09:14.227 7f82794b1700  1 -- 172.21.15.31:6808/12957 >> 172.21.15.31:33298/0 conn(0x23a27480 legacy :6808 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 57
2018-09-25 00:09:14.227 7f82794b1700  1 -- 172.21.15.31:6808/12957 >> 172.21.15.31:33298/0 conn(0x23a27480 legacy :6808 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed
2018-09-25 00:09:14.227 7f82794b1700  1 -- 172.21.15.31:6808/12957 >> - conn(0x23a27480 legacy :6808 s=ACCEPTING pgs=0 cs=0 l=0).handle_client_banner read peer banner and addr failed

The client log only shows that it injected a socket failure:

2018-09-25 00:09:14.227 7efbd8d99700  0 -- 172.21.15.31:47216/4186053723 >> 172.21.15.31:6808/12957 conn(0x2dfa850 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=1)._try_send injecting socket failure

... but there is no corresponding log from Objecter's "ms_handle_reset" callback. In Objecter, adding a "sleep(1)" between "messenger->connect_to_osd(osdmap->get_addrs(osd));" and "s->con->set_priv(RefCountedPtr{s});" reproduces the issue when the connection is dropped during the handshake.
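
To make the suspected window easier to reason about, here is a minimal toy model of it. It is not Ceph code: "Session", "Connection", "handle_reset" and "open_session_hangs" are hypothetical stand-ins, and the model assumes that the reset callback can only act on a session it finds through the connection's priv pointer. Under that assumption, a reset delivered between the connect call and the set_priv call is dropped without a log line, and the session never learns that its connection failed.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT Ceph's real classes.
struct Session {
  int osd = 0;
  bool reset_seen = false;  // set when the reset handler reaches this session
};

struct Connection {
  std::shared_ptr<Session> priv;  // models the pointer stored via set_priv()
};

// Models a reset callback that (by assumption) locates the session through
// the connection's priv pointer. Returns true if the reset was handled.
bool handle_reset(Connection& con, std::vector<std::string>& log) {
  std::shared_ptr<Session> s = con.priv;
  if (!s) {
    return false;  // nothing attached yet: reset is dropped, nothing is logged
  }
  s->reset_seen = true;
  log.push_back("ms_handle_reset osd." + std::to_string(s->osd));
  return true;
}

// Opens a session in two steps (connect, then attach priv) and delivers one
// reset either inside the gap between the steps or after both have finished.
// Returns true if the session is left unaware of the failure (the "hung" case).
bool open_session_hangs(bool reset_before_attach) {
  std::vector<std::string> log;
  auto session = std::make_shared<Session>();

  Connection con;  // step 1: stands in for connect_to_osd()
  if (reset_before_attach) {
    handle_reset(con, log);  // socket failure lands in the gap (the sleep(1) window)
  }
  con.priv = session;  // step 2: stands in for set_priv()
  if (!reset_before_attach) {
    handle_reset(con, log);  // failure after attach is seen and logged
  }
  return !session->reset_seen;
}
```

In this model, open_session_hangs(true) reports a hung session and open_session_hangs(false) does not, which matches the missing "ms_handle_reset" log line in the client output above. The model says nothing about how the actual fix closes the window.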

http://qa-proxy.ceph.com/teuthology/jdillaman-2018-09-24_18:10:36-rbd-wip-rbd-mirror-distro-basic-smithi/3066290/teuthology.log


Related issues 2 (0 open, 2 closed)

Copied to RADOS - Backport #36295: luminous: [objecter] client socket failure leads to hung connection - Resolved - Prashant D
Copied to RADOS - Backport #36296: mimic: [objecter] client socket failure leads to hung connection - Resolved - Prashant D
Actions #1

Updated by Jason Dillaman over 5 years ago

  • Description updated (diff)
Actions #2

Updated by Jason Dillaman over 5 years ago

  • Status changed from In Progress to Fix Under Review
Actions #3

Updated by Jason Dillaman over 5 years ago

  • Subject changed from [async msgr] client socket failure leads to hung connection to [objecter] client socket failure leads to hung connection
Actions #4

Updated by Jason Dillaman over 5 years ago

  • Component(RADOS) Objecter added
Actions #5

Updated by Neha Ojha over 5 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #6

Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #36295: luminous: [objecter] client socket failure leads to hung connection added
Actions #7

Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #36296: mimic: [objecter] client socket failure leads to hung connection added
Actions #8

Updated by Nathan Cutler over 5 years ago

  • Status changed from Pending Backport to Resolved