Bug #10022

closed

AsyncMessenger: Wrong newly_acked_seq when replacing existing connection

Added by Haomai Wang over 9 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Here is the output. (The monitor IPs are 10.11.1.27, 10.11.1.28, and 10.11.1.29.)

    # ceph -w --debug-ms=10/10
    2014-11-04 10:38:16.155461 7fdf4414b700 10 EpollDriver.add_event add event to fd=4 mask=1
    2014-11-04 10:38:16.155474 7fdf4414b700 10 Event create_file_event create event fd=4 mask=1 now mask is 1
    2014-11-04 10:38:16.155626 7fdf4414b700 10 EpollDriver.add_event add event to fd=7 mask=1
    2014-11-04 10:38:16.155630 7fdf4414b700 10 Event create_file_event create event fd=7 mask=1 now mask is 1
    2014-11-04 10:38:16.155774 7fdf4414b700 10 -- :/0 ready :/0
    2014-11-04 10:38:16.155783 7fdf4414b700 1 Processor -- start start
    2014-11-04 10:38:16.155785 7fdf4414b700 1 -- :/0 start start
    2014-11-04 10:38:16.155841 7fdf419e2700 10 --entry starting
    2014-11-04 10:38:16.155834 7fdf411e1700 10 --entry starting
    2014-11-04 10:38:16.155883 7fdf419e2700 10 Event process_events wait second 30 usec 0
    2014-11-04 10:38:16.155899 7fdf411e1700 10 Event process_events wait second 30 usec 0
    2014-11-04 10:38:16.156711 7fdf4414b700 10 -- :/1009064 create_connect 10.11.1.29:6789/0, creating connection and registering
    2014-11-04 10:38:16.156747 7fdf4414b700 10 -- :/1009064 >> 10.11.1.29:6789/0 conn(0x1f56090 sd=-1 :0 s=STATE_NONE pgs=0 cs=0 l=1)._connect 0
    2014-11-04 10:38:16.156761 7fdf4414b700 1 Event wakeup
    2014-11-04 10:38:16.156774 7fdf4414b700 10 -- :/1009064 get_connection mon.2 10.11.1.29:6789/0 new 0x1f56090
    2014-11-04 10:38:16.156793 7fdf4414b700 1 Event wakeup
    2014-11-04 10:38:16.156812 7fdf4414b700 10 -- :/1009064 >> 10.11.1.29:6789/0 conn(0x1f56090 sd=-1 :0 s=STATE_CONNECTING pgs=0 cs=0 l=1).send_message
    2014-11-04 10:38:16.157084 7fdf419e2700 10 EpollDriver.add_event add event to fd=9 mask=1
    2014-11-04 10:38:16.157106 7fdf419e2700 10 Event create_file_event create event fd=9 mask=1 now mask is 1
    2014-11-04 10:38:16.157136 7fdf419e2700 10 -- :/1009064 >> 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_WAIT_BANNER pgs=0 cs=0 l=1).handle_write started.
    2014-11-04 10:38:16.157199 7fdf419e2700 10 Event process_events wait second 30 usec 0
    2014-11-04 10:38:16.157206 7fdf4414b700 1 Event wakeup
    2014-11-04 10:38:16.157259 7fdf419e2700 10 -- :/1009064 >> 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_WAIT_BANNER pgs=0 cs=0 l=1).handle_write started.
    2014-11-04 10:38:16.157284 7fdf419e2700 10 EpollDriver.add_event add event to fd=9 mask=3
    2014-11-04 10:38:16.157286 7fdf419e2700 10 Event create_file_event create event fd=9 mask=2 now mask is 3
    2014-11-04 10:38:16.157290 7fdf419e2700 10 Event process_events wait second 30 usec 0
    2014-11-04 10:38:16.157306 7fdf419e2700 10 -- :/1009064 >> 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_WAIT_BANNER pgs=0 cs=0 l=1)._process_connection get banner, ready to send banner
    2014-11-04 10:38:16.157348 7fdf419e2700 10 -- :/1009064 >> 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_WAIT_IDENTIFY_PEER pgs=0 cs=0 l=1)._process_connection connect write banner done: 10.11.1.29:6789/0
    2014-11-04 10:38:16.157376 7fdf419e2700 1 -- 10.11.1.27:0/1009064 learned_addr learned my addr 10.11.1.27:0/1009064
    2014-11-04 10:38:16.157394 7fdf419e2700 10 -- 10.11.1.27:0/1009064 >> 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1)._process_connection connect sent my addr 10.11.1.27:0/1009064
    2014-11-04 10:38:16.157415 7fdf419e2700 10 -- 10.11.1.27:0/1009064 >> 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1)._process_connection connect sending gseq=1 cseq=0 proto=15
    2014-11-04 10:38:16.157434 7fdf419e2700 10 -- 10.11.1.27:0/1009064 >> 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_WAIT_CONNECT_REPLY pgs=0 cs=0 l=1).handle_write started.
    2014-11-04 10:38:16.157442 7fdf419e2700 10 Event process_events wait second 30 usec 0
    2014-11-04 10:38:16.157446 7fdf419e2700 10 -- 10.11.1.27:0/1009064 >> 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_WAIT_CONNECT_REPLY pgs=0 cs=0 l=1).handle_write started.
    2014-11-04 10:38:16.157451 7fdf419e2700 10 Event process_events wait second 30 usec 0
    2014-11-04 10:38:16.157560 7fdf419e2700 10 -- 10.11.1.27:0/1009064 >> 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=1).handle_connect_replygot CEPH_MSGR_TAG_SEQ, reading acked_seq and writing in_seq
    2014-11-04 10:38:16.157580 7fdf419e2700 2 -- 10.11.1.27:0/1009064 >> 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=1)._process_connection got newly_acked_seq 18446744073709551615 vs out_seq 0
    2014-11-04 10:38:16.157591 7fdf419e2700 2 -- 10.11.1.27:0/1009064 >> 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=1)._process_connection discarding previously sent 0 auth(proto 0 34 bytes epoch 0) v1
    msg/async/AsyncConnection.cc: In function 'int AsyncConnection::_process_connection()' thread 7fdf419e2700 time 2014-11-04 10:38:16.157609
    msg/async/AsyncConnection.cc: 1046: FAILED assert(m)
    ceph version 0.87-600-g25ca92d (25ca92d7ff5cc326d77c1c3f687ece0718f02db9)
    1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x72) [0x7fdf459d0b52]
    2: (AsyncConnection::_process_connection()+0x40e9) [0x7fdf45ae8ac9]
    3: (AsyncConnection::process()+0xcb) [0x7fdf45ae92ab]
    4: (EventCenter::process_events(int)+0x567) [0x7fdf45b02a27]
    5: (Worker::entry()+0x7f) [0x7fdf45af01cf]
    6: (()+0x6b50) [0x7fdf4a1f2b50]
    7: (clone()+0x6d) [0x7fdf4969c7bd]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
    terminate called after throwing an instance of 'ceph::FailedAssertion'
    Aborted
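
The value 18446744073709551615 reported for newly_acked_seq is the largest 64-bit unsigned integer, which is what -1 looks like when stored in a uint64_t. With out_seq at 0 and a single auth message queued, a loop that discards messages until out_seq catches up with newly_acked_seq runs out of messages after the first one, which matches the assert(m) failure in the log. The following is only a minimal sketch of that failure mode and of one possible guard. It is not the AsyncConnection code and not the fix that was applied; SentQueue and discard_acked are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <limits>
#include <string>

// Hypothetical stand-in for the queue of sent-but-unacknowledged messages.
using SentQueue = std::deque<std::string>;

// Discard messages that the peer reports as already received.
// Returns the number of messages discarded, or -1 if the acknowledged
// sequence is implausible (further ahead than anything we have queued).
int discard_acked(SentQueue& sent, uint64_t& out_seq, uint64_t newly_acked_seq) {
  if (newly_acked_seq <= out_seq) {
    return 0;  // nothing new was acknowledged
  }
  // Guard (an assumption of this sketch): refuse an ack that would require
  // discarding more messages than are queued. Without this check, a value
  // such as UINT64_MAX drains the queue and then trips the assert below.
  if (newly_acked_seq - out_seq > sent.size()) {
    return -1;
  }
  int discarded = 0;
  while (newly_acked_seq > out_seq) {
    assert(!sent.empty());  // plays the role of assert(m) in the log
    sent.pop_front();
    ++out_seq;
    ++discarded;
  }
  return discarded;
}
```

Called with a queue holding one message, out_seq equal to 0, and newly_acked_seq equal to std::numeric_limits<uint64_t>::max(), this sketch returns -1 and leaves the queue untouched instead of aborting. A plausible ack of 1 discards the single queued message and advances out_seq to 1.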
Actions #1

Updated by Haomai Wang over 9 years ago

  • Status changed from 12 to Resolved
Actions #2

Updated by Greg Farnum about 5 years ago

  • Project changed from Ceph to Messengers
  • Category deleted (msgr)