Bug #22231
closed
assert(0 == "old msgs despite reconnect_seq feature")
Description
Spotted in tests with both the simple and async messengers:
- http://pulpito.ceph.com/kchai-2017-11-21_16:17:37-rados-wip-kefu-testing-2017-11-21-1844-distro-basic-mira/1874213/
- http://pulpito.ceph.com/kchai-2017-11-21_16:17:37-rados-wip-kefu-testing-2017-11-21-1844-distro-basic-mira/1874305/
- http://pulpito.ceph.com/kchai-2017-11-21_16:17:37-rados-wip-kefu-testing-2017-11-21-1844-distro-basic-mira/1874376/
Per Ilya:
Looks like an existing bug -- reproduced on your wip-kefu-testing-2017-11-21-1844 with this PR reverted (603eacea7771).
0> 2017-11-22 17:10:30.934 7fa3d5af0700 -1 /build/ceph-13.0.0-3422-g603eace/src/msg/async/AsyncConnection.cc: In function 'void AsyncConnection::process()' thread 7fa3d5af0700 time 2017-11-22 17:10:30.936672
/build/ceph-13.0.0-3422-g603eace/src/msg/async/AsyncConnection.cc: 705: FAILED assert(0 == "old msgs despite reconnect_seq feature")
 ceph version 13.0.0-3422-g603eace (603eacea77714074209fe75bf162922b8c839890) mimic (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x7fa3dbb77f0e]
 2: (AsyncConnection::process()+0x27d4) [0x7fa3dbe044a4]
 3: (EventCenter::process_events(int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x331) [0x7fa3dbc07821]
 4: (()+0xb6733e) [0x7fa3dbc0a33e]
 5: (()+0xb1a60) [0x7fa3d8f29a60]
 6: (()+0x8184) [0x7fa3d95a1184]
 7: (clone()+0x6d) [0x7fa3d8690ffd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
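For readers unfamiliar with the check that fired, here is a minimal sketch of the kind of sequence-number test the assert message implies. It is not the code from AsyncConnection.cc: every name in it (PeerState, classify, old_message_is_fatal, deliver) is hypothetical. It assumes that a receiver tracks the highest delivered sequence number, and that a peer with the reconnect_seq feature is not expected to resend already-delivered messages after a reconnect.

```cpp
#include <cstdint>

// Illustrative only: hypothetical names, not the Ceph messenger implementation.
enum class SeqVerdict { Deliver, OldMessage, SkippedMessage };

struct PeerState {
  std::uint64_t in_seq = 0;        // highest sequence number delivered so far
  bool has_reconnect_seq = false;  // assumed: peer negotiated reconnect_seq
};

// Classify an incoming message by comparing its sequence number with in_seq.
inline SeqVerdict classify(const PeerState& st, std::uint64_t msg_seq) {
  if (msg_seq <= st.in_seq) {
    return SeqVerdict::OldMessage;      // already delivered, i.e. a replay
  }
  if (msg_seq > st.in_seq + 1) {
    return SeqVerdict::SkippedMessage;  // gap: at least one message is missing
  }
  return SeqVerdict::Deliver;           // exactly the next expected message
}

// An old message is treated as fatal only when the peer claimed the feature
// that should have prevented replays; otherwise it would simply be dropped.
inline bool old_message_is_fatal(const PeerState& st, std::uint64_t msg_seq) {
  return st.has_reconnect_seq &&
         classify(st, msg_seq) == SeqVerdict::OldMessage;
}

// Record that a message was handed to the upper layer.
inline void deliver(PeerState& st, std::uint64_t msg_seq) {
  st.in_seq = msg_seq;
}
```

Under these assumptions, the failure in this ticket corresponds to old_message_is_fatal returning true: a message arrived whose sequence number was not above the last delivered one, even though the connection had negotiated reconnect_seq.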
Updated by Kefu Chai over 6 years ago
- Status changed from New to Need More Info
- Assignee set to Ilya Dryomov
Updated by Ilya Dryomov over 6 years ago
- Status changed from Need More Info to 12
- Assignee changed from Ilya Dryomov to Haomai Wang
It was a manual re-run of http://pulpito.ceph.com/kchai-2017-11-21_16:17:37-rados-wip-kefu-testing-2017-11-21-1844-distro-basic-mira/1874213 with msgr/random.yaml swapped for msgr/async.yaml to force the use of the async messenger. It took 7 or 8 tries to reproduce on the original wip-kefu-testing-2017-11-21-1844 (i.e. with https://github.com/ceph/ceph/pull/19044 included) and 3 tries without it. Might be an issue with msgr-failures/osd-delay.yaml?
Updated by Kefu Chai over 6 years ago
/a/kchai-2017-12-15_11:03:39-rados-wip-kefu-testing-2017-12-15-1528-distro-basic-mira/1967917
Updated by Sage Weil over 6 years ago
- Priority changed from Normal to High
/a/sage-2017-12-18_22:56:18-rados-wip-sage-testing-2017-12-18-1406-distro-basic-smithi/1976735
Updated by Sage Weil over 6 years ago
Seeing a dozen+ of these in a single run. Presumably caused by 5216309c25522e9e4a3c3a03ceb927079de91e9b, which was just merged.