Bug #22231

assert(0 == "old msgs despite reconnect_seq feature")

Added by Kefu Chai over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
Start date: 11/23/2017
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

Spotted in tests with both the simple and async messengers:

- http://pulpito.ceph.com/kchai-2017-11-21_16:17:37-rados-wip-kefu-testing-2017-11-21-1844-distro-basic-mira/1874213/
- http://pulpito.ceph.com/kchai-2017-11-21_16:17:37-rados-wip-kefu-testing-2017-11-21-1844-distro-basic-mira/1874305/
- http://pulpito.ceph.com/kchai-2017-11-21_16:17:37-rados-wip-kefu-testing-2017-11-21-1844-distro-basic-mira/1874376/

Per Ilya:

Looks like an existing bug -- reproduced on your wip-kefu-testing-2017-11-21-1844 with this PR reverted (603eacea7771).

     0> 2017-11-22 17:10:30.934 7fa3d5af0700 -1 /build/ceph-13.0.0-3422-g603eace/src/msg/async/AsyncConnection.cc: In function 'void AsyncConnection::process()' thread 7fa3d5af0700 time 2017-11-22 17:10:30.936672
/build/ceph-13.0.0-3422-g603eace/src/msg/async/AsyncConnection.cc: 705: FAILED assert(0 == "old msgs despite reconnect_seq feature")

 ceph version 13.0.0-3422-g603eace (603eacea77714074209fe75bf162922b8c839890) mimic (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x7fa3dbb77f0e]
 2: (AsyncConnection::process()+0x27d4) [0x7fa3dbe044a4]
 3: (EventCenter::process_events(int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x331) [0x7fa3dbc07821]
 4: (()+0xb6733e) [0x7fa3dbc0a33e]
 5: (()+0xb1a60) [0x7fa3d8f29a60]
 6: (()+0x8184) [0x7fa3d95a1184]
 7: (clone()+0x6d) [0x7fa3d8690ffd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
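
For readers unfamiliar with this assertion, here is a rough sketch of the kind of receive-side sequence check it guards. This is a simplified model and not the code in AsyncConnection.cc: the enum, the function name, and the parameter names are all hypothetical, and the `die_on_old_message` flag only stands in for whatever debug setting makes the old-message case fatal in test runs. The idea is that once the peer supports reconnect_seq, a reconnect should never replay a message whose sequence number has already been received, so seeing one is treated as a bug rather than silently dropped.

```cpp
#include <cstdint>

// Hypothetical outcome labels for this sketch; these are not Ceph identifiers.
enum class SeqCheck { Deliver, DropOld, FatalOld, Skipped };

// in_seq:                 highest sequence number already received on this connection
// msg_seq:                sequence number carried by the incoming message
// peer_has_reconnect_seq: whether the peer negotiated the reconnect_seq feature
// die_on_old_message:     assumed debug switch that turns the old-message case fatal
inline SeqCheck check_incoming_seq(std::uint64_t in_seq,
                                   std::uint64_t msg_seq,
                                   bool peer_has_reconnect_seq,
                                   bool die_on_old_message) {
  if (msg_seq <= in_seq) {
    // The message was already seen. With reconnect_seq the sender should have
    // skipped it after reconnecting, so this is the case the assert reports.
    if (peer_has_reconnect_seq && die_on_old_message) {
      return SeqCheck::FatalOld;
    }
    return SeqCheck::DropOld;
  }
  if (msg_seq > in_seq + 1) {
    // A gap in the sequence: at least one message never arrived.
    return SeqCheck::Skipped;
  }
  // msg_seq == in_seq + 1: the next expected message.
  return SeqCheck::Deliver;
}
```

In this model the crash above corresponds to the `FatalOld` outcome: a message arrived with a sequence number at or below the last one received, on a connection whose peer advertises reconnect_seq.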

History

#1 Updated by Kefu Chai over 1 year ago

  • Status changed from New to Need More Info
  • Assignee set to Ilya Dryomov

#2 Updated by Ilya Dryomov over 1 year ago

  • Status changed from Need More Info to Verified
  • Assignee changed from Ilya Dryomov to Haomai Wang

It was a manual re-run of http://pulpito.ceph.com/kchai-2017-11-21_16:17:37-rados-wip-kefu-testing-2017-11-21-1844-distro-basic-mira/1874213 with msgr/random.yaml swapped to msgr/async.yaml to force the use of the async messenger. It took 7 or 8 tries to reproduce on the original wip-kefu-testing-2017-11-21-1844 (i.e. with https://github.com/ceph/ceph/pull/19044 included) and 3 tries without it. Might be an issue with msgr-failures/osd-delay.yaml?

#3 Updated by Kefu Chai over 1 year ago

/a/kchai-2017-12-15_11:03:39-rados-wip-kefu-testing-2017-12-15-1528-distro-basic-mira/1967917

#4 Updated by Sage Weil about 1 year ago

  • Priority changed from Normal to High

/a/sage-2017-12-18_22:56:18-rados-wip-sage-testing-2017-12-18-1406-distro-basic-smithi/1976735

#5 Updated by Sage Weil about 1 year ago

Seeing a dozen+ of these in a single run. Presumably 5216309c25522e9e4a3c3a03ceb927079de91e9b, which was just merged.

#7 Updated by Sage Weil about 1 year ago

  • Status changed from Verified to Resolved
