Bug #36183

Updated by Jason Dillaman 9 months ago

During an rbd-mirror thrash test run, the process failed to shut down cleanly because it was stuck in a librados read operation for over 10 minutes. The OSD logs show that the client attempted to connect but disconnected because of the inject socket failure option. The client made no further attempts to reconnect after the initial failure.

<pre>
2018-09-25 00:09:14.227 7f82794b1700 1 -- 172.21.15.31:6808/12957 >> - conn(0x23a27480 legacy :6808 s=ACCEPTING pgs=0 cs=0 l=0).send_server_banner sd=57 172.21.15.31:33298/0
2018-09-25 00:09:14.227 7f82794b1700 1 -- 172.21.15.31:6808/12957 >> 172.21.15.31:33298/0 conn(0x23a27480 legacy :6808 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 57
2018-09-25 00:09:14.227 7f82794b1700 1 -- 172.21.15.31:6808/12957 >> 172.21.15.31:33298/0 conn(0x23a27480 legacy :6808 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed
2018-09-25 00:09:14.227 7f82794b1700 1 -- 172.21.15.31:6808/12957 >> - conn(0x23a27480 legacy :6808 s=ACCEPTING pgs=0 cs=0 l=0).handle_client_banner read peer banner and addr failed
</pre>

The client log only shows that it injected a socket failure:

<pre>
2018-09-25 00:09:14.227 7efbd8d99700 0 -- 172.21.15.31:47216/4186053723 >> 172.21.15.31:6808/12957 conn(0x2dfa850 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=1)._try_send injecting socket failure
</pre>

... but there is no corresponding log from Objecter's "ms_handle_reset" callback. In Objecter, adding a "sleep(1)" between "messenger->connect_to_osd(osdmap->get_addrs(osd));" and "s->con->set_priv(RefCountedPtr{s});" reproduces the issue if the connection is dropped during the handshake.
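One plausible reading of the reproducer, which this report does not confirm, is that a reset delivered before the session has been attached to the connection cannot be matched to a session and is therefore dropped. The following is a minimal standalone C++ model of that window, not Ceph code: Session, Connection, handle_reset and reset_is_handled are hypothetical stand-ins, and the null check in handle_reset is an assumption about how such a handler might behave.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for illustration only; these are not Ceph types.
struct Session {
    int osd = 0;
    bool needs_reconnect = false;  // set when a reset has been acted upon
};

struct Connection {
    // Models the private pointer that is attached only after connecting.
    std::shared_ptr<Session> priv;
};

// Models a reset handler that finds its session through the connection's
// private pointer. Returns true if the reset was acted upon.
// ASSUMPTION: with no session attached, the event is silently dropped.
bool handle_reset(Connection& con) {
    if (!con.priv) {
        return false;  // nothing to reconnect: the reset is lost
    }
    con.priv->needs_reconnect = true;
    return true;
}

// Runs "connect, then attach session" and delivers one reset either inside
// the window between the two steps or after the session is attached.
bool reset_is_handled(bool reset_in_window) {
    auto session = std::make_shared<Session>();
    session->osd = 3;  // arbitrary illustrative value

    Connection con;     // step 1: the connection exists
    assert(!con.priv);  // a fresh connection has no session attached

    bool handled = false;
    if (reset_in_window) {
        handled = handle_reset(con);  // reset arrives before the attach
    }

    con.priv = session;  // step 2: attach the session

    if (!reset_in_window) {
        handled = handle_reset(con);  // reset arrives after the attach
    }
    return handled && session->needs_reconnect;
}
```

In this model, reset_is_handled(true) returns false: the session is never flagged for reconnection, which would match a client that stays stuck without retrying. reset_is_handled(false) returns true. Widening the window, as the "sleep(1)" does, simply makes the first ordering more likely.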

http://qa-proxy.ceph.com/teuthology/jdillaman-2018-09-24_18:10:36-rbd-wip-rbd-mirror-distro-basic-smithi/3066290/teuthology.log
