Project

General

Profile

Actions

Bug #15443

closed

msg/async: stopped rx msgs, dup replace race with delay injection

Added by Sage Weil about 8 years ago. Updated about 5 years ago.

Status:
Rejected
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

last rx is

2016-04-09 14:26:51.357822 7f054c4fb700 20 osd.4 38 should_share_map osd.0 172.21.15.56:6801/2840 38
2016-04-09 14:26:51.358343 7f056fdf2700  1 -- 172.21.15.13:6805/10365 >> 172.21.15.56:6801/2840 conn(0x7f058b5e3800 sd=71 :6805 s=STATE_OPEN pgs=28 cs=5 l=0). == tx == 0x7f058e20e000 osd_repop_reply(client.4119.0:10916 12.5 ondisk, result = 0) v1
2016-04-09 14:26:52.302707 7f0578603700  1 -- 172.21.15.13:6805/10365 >> 172.21.15.56:6801/2840 conn(0x7f058b5e3800 sd=71 :6805 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=28 cs=5 l=0). == rx == osd.0 seq 357 0x7f058ccc9600 osd_repop(client.4119.0:11326 12.5) v1

and later we see

2016-04-09 14:42:25.942130 7f0578e04700 10 osd.4 51  new session 0x7f058ca330c0 con=0x7f058d8d1800 addr=172.21.15.56:6801/2840
2016-04-09 14:42:25.944237 7f0578e04700  0 -- 172.21.15.13:6805/10365 >> 172.21.15.56:6801/2840 conn(0x7f058d8d1800 sd=59 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 6 vs existing csq=5 existing_state=STATE_OPEN
2016-04-09 14:42:25.946455 7f0578e04700  1 -- 172.21.15.13:6805/10365 >> 172.21.15.56:6801/2840 conn(0x7f058d8d1800 sd=71 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg stop myself to swap existing
2016-04-09 14:42:25.946513 7f0578e04700  1 -- 172.21.15.13:6805/10365 >> 172.21.15.56:6801/2840 conn(0x7f058d8d1800 sd=71 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)._stop
2016-04-09 14:57:26.050343 7f0578e04700 10 osd.4 51  new session 0x7f058c3d0560 con=0x7f058c5ec000 addr=172.21.15.56:6801/2840
2016-04-09 14:57:26.052449 7f0578e04700  1 -- 172.21.15.13:6805/10365 >> 172.21.15.56:6801/2840 conn(0x7f058c5ec000 sd=143 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg existing racing replace or mark_down happened while replacing. existing_state=STATE_ACCEPTING_WAIT_CONNECT_MSG
2016-04-09 14:57:26.056813 7f0578e04700  1 -- 172.21.15.13:6805/10365 >> 172.21.15.56:6801/2840 conn(0x7f058c5ec000 sd=143 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg existing racing replace or mark_down happened while replacing. existing_state=STATE_ACCEPTING_WAIT_CONNECT_MSG
2016-04-09 14:57:26.061247 7f0578e04700  1 -- 172.21.15.13:6805/10365 >> 172.21.15.56:6801/2840 conn(0x7f058c5ec000 sd=143 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg existing racing replace or mark_down happened while replacing. existing_state=STATE_ACCEPTING_WAIT_CONNECT_MSG
2016-04-09 14:57:26.065688 7f0578e04700  1 -- 172.21.15.13:6805/10365 >> 172.21.15.56:6801/2840 conn(0x7f058c5ec000 sd=143 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg existing racing replace or mark_down happened while replacing. existing_state=STATE_ACCEPTING_WAIT_CONNECT_MSG
2016-04-09 14:57:26.070311 7f0578e04700  1 -- 172.21.15.13:6805/10365 >> 172.21.15.56:6801/2840 conn(0x7f058c5ec000 sd=143 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg existing racing replace or mark_down happened while replacing. existing_state=STATE_ACCEPTING_WAIT_CONNECT_MSG
2016-04-09 14:57:26.074918 7f0578e04700  1 -- 172.21.15.13:6805/10365 >> 172.21.15.56:6801/2840 conn(0x7f058c5ec000 sd=143 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg existing racing replace or mark_down happened while replacing. existing_state=STATE_ACCEPTING_WAIT_CONNECT_MSG
2016-04-09 14:57:26.079469 7f0578e04700  1 -- 172.21.15.13:6805/10365 >> 172.21.15.56:6801/2840 conn(0x7f058c5ec000 sd=143 :6805 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg existing racing replace or mark_down happened while replacing. existing_state=STATE_ACCEPTING_WAIT_CONNECT_MSG

/a/sage-2016-04-09_05:29:12-rados-wip-sage-testing-distro-basic-smithi/117990

this is with https://github.com/ceph/ceph/pull/8471

Actions

Also available in: Atom PDF