Bug #49689

closed

osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start

Added by Neha Ojha about 3 years ago. Updated 8 months ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: pacific,quincy
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):

06609aa10e3319971c8c6251223bafde6e0052349f3e9a5a03edb74f42e3fa2d
070400b67064b04e820f22c4c5c2d50166103b509fcd8578a829d884ab76d18f
0f658d607df04a7f8f53ddb61c044a0ed66469c4148b9c067310ca27088433a9
180423f0717e94fd75d87d7f5dec045b5cbb52edb27205abc5927281a515054a
328b7ead9e7684c4b8487ddfe9edb53e8eb873927dde0036a4324ac081c862dc
36c20a627e03678780b2fe3dc808af4d2b2567a472a4f61d8178c61d7184a0bb
3a50bb9444331ec2b94a68f898f8ab0692ec7500e2e335a31ba6491973c0a17b
48ab57a2c81536fcd4d9c8a7ae6b0d297421a5e916602ccf1c53aded5adf1b8a
7bdc9f715c000afbbc2b75000e4d5623bc2daf351addab915c48f83918c9f25e
80cd84b35982a53d03a84567bc86bf0bf4931d6bd74ada22d61dce2dc61703dd
8eaef1c8be5ac7dcdf10d5a4634f98473e426c2abd231e361a8165b7c5b3c43d
94e5824296c208a0e9854ed27e55e34cf4e42c747119e6bc67aad328b511242a
95d3f1ffec846b1fe432b371d1bb2f07c934d7a56d49a10e7f0d6b989d7e21c7
b8f4aedd70ff355796eba056d35e5df524c19461e54298a94f741f6f11d31d94
bea7cac302e857eee2324d2293899a2a015475abf2766cd9ad62e5d30f204468
c7613c18f4910e58190221e6cca3f422e45d6d916ca5af61b275f26ba8d86209
c9422518b50823c89955b70e6f03330014f3ef1cb421aafc5ea58c0d91d4f179
cb6a35bf8176df5e9719943cc2ecf2ecc568dcfca315f72a40690049bf04a13a
f8cd44714c95a9c477b7509be3ceffb9edc6773399b109719c87bb8679e1305e
fbce5293130f40a9e8ea5f6f5285333e5abf22d7ebfc626e364b89b9f74a130b


Description

2021-03-09T02:06:34.259 INFO:tasks.ceph.osd.1.smithi180.stderr:2021-03-09T02:06:34.258+0000 7fd53172a700 -1 log_channel(cluster) log [ERR] : 3.2s0 past_intervals [218,272) start interval does not contain the required bound [195,272) start
2021-03-09T02:06:34.260 INFO:tasks.ceph.osd.1.smithi180.stderr:2021-03-09T02:06:34.258+0000 7fd53172a700 -1 osd.1 pg_epoch: 272 pg[3.2s0( empty local-lis/les=0/0 n=0 ec=20/20 lis/c=44/44 les/c/f=45/45/0 sis=272) [1,0,3,6]p1(0) r=0 lpr=272 pi=[218,272)/1 crt=0'0 mlcod 0'0 peering mbc={}] 3.2s0 past_intervals [218,272) start interval does not contain the required bound [195,272) start
2021-03-09T02:06:34.263 INFO:tasks.ceph.osd.1.smithi180.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.1.0-627-g9e448db9/rpm/el8/BUILD/ceph-16.1.0-627-g9e448db9/src/osd/PeeringState.cc: In function 'void PeeringState::check_past_interval_bounds() const' thread 7fd53172a700 time 2021-03-09T02:06:34.259543+0000
2021-03-09T02:06:34.263 INFO:tasks.ceph.osd.1.smithi180.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.1.0-627-g9e448db9/rpm/el8/BUILD/ceph-16.1.0-627-g9e448db9/src/osd/PeeringState.cc: 981: ceph_abort_msg("past_interval start interval mismatch")
2021-03-09T02:06:34.264 INFO:tasks.ceph.osd.1.smithi180.stderr: ceph version 16.1.0-627-g9e448db9 (9e448db9e385343da144613d840349182443aeee) pacific (rc)
2021-03-09T02:06:34.264 INFO:tasks.ceph.osd.1.smithi180.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe5) [0x55aed1c5a7e0]
2021-03-09T02:06:34.264 INFO:tasks.ceph.osd.1.smithi180.stderr: 2: (PeeringState::check_past_interval_bounds() const+0x6ed) [0x55aed1fdc62d]
2021-03-09T02:06:34.264 INFO:tasks.ceph.osd.1.smithi180.stderr: 3: (PeeringState::GetInfo::GetInfo(boost::statechart::state<PeeringState::GetInfo, PeeringState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0x145) [0x55aed2009d95]
2021-03-09T02:06:34.264 INFO:tasks.ceph.osd.1.smithi180.stderr: 4: (boost::statechart::state<PeeringState::Primary, PeeringState::Started, PeeringState::Peering, (boost::statechart::history_mode)0>::deep_construct(boost::intrusive_ptr<PeeringState::Started> const&, boost::statechart::state_machine<PeeringState::PeeringMachine, PeeringState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>&)+0x146) [0x55aed202efc6]
2021-03-09T02:06:34.265 INFO:tasks.ceph.osd.1.smithi180.stderr: 5: (boost::statechart::simple_state<PeeringState::Start, PeeringState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x15b) [0x55aed203453b]
2021-03-09T02:06:34.265 INFO:tasks.ceph.osd.1.smithi180.stderr: 6: (boost::statechart::state_machine<PeeringState::PeeringMachine, PeeringState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_queued_events()+0xa7) [0x55aed201db97]
2021-03-09T02:06:34.265 INFO:tasks.ceph.osd.1.smithi180.stderr: 7: (PeeringState::activate_map(PeeringCtx&)+0x1c2) [0x55aed1fd8032]
2021-03-09T02:06:34.265 INFO:tasks.ceph.osd.1.smithi180.stderr: 8: (PG::handle_activate_map(PeeringCtx&)+0x127) [0x55aed1e12147]
2021-03-09T02:06:34.266 INFO:tasks.ceph.osd.1.smithi180.stderr: 9: (OSD::handle_pg_create_info(std::shared_ptr<OSDMap const> const&, PGCreateInfo const*)+0x711) [0x55aed1d5ae91]
2021-03-09T02:06:34.266 INFO:tasks.ceph.osd.1.smithi180.stderr: 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x33b2) [0x55aed1d80022]
2021-03-09T02:06:34.266 INFO:tasks.ceph.osd.1.smithi180.stderr: 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x55aed23e3fd4]
2021-03-09T02:06:34.266 INFO:tasks.ceph.osd.1.smithi180.stderr: 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55aed23e6c74]
2021-03-09T02:06:34.266 INFO:tasks.ceph.osd.1.smithi180.stderr: 13: /lib64/libpthread.so.0(+0x814a) [0x7fd559e3614a]

/a/yuriw-2021-03-08_21:03:18-rados-wip-yuri5-testing-2021-03-08-1049-pacific-distro-basic-smithi/5947593 - no logs
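
The [ERR] line in the log above reports two half-open epoch ranges: the recorded past_intervals [218,272) and the required bound [195,272). The sketch below is only a reading aid, not Ceph code. It assumes, from the wording of the message, that the check fails when the recorded start epoch is later than the required start epoch. The names parse_bounds and start_bound_ok are hypothetical and do not exist in Ceph.

```python
import re

# Hypothetical helper (not part of Ceph): pull the two half-open epoch
# ranges out of the "[ERR]" message quoted in the description.
BOUNDS_RE = re.compile(
    r"past_intervals \[(\d+),(\d+)\) start interval does not contain "
    r"the required bound \[(\d+),(\d+)\)"
)


def parse_bounds(line):
    """Return ((actual_start, actual_end), (required_start, required_end)),
    or None if the line does not contain the message."""
    m = BOUNDS_RE.search(line)
    if m is None:
        return None
    a_start, a_end, r_start, r_end = map(int, m.groups())
    return (a_start, a_end), (r_start, r_end)


def start_bound_ok(actual, required):
    """Assumption inferred from the message text: the recorded intervals
    must begin at or before the required start epoch."""
    return actual[0] <= required[0]


line = ("3.2s0 past_intervals [218,272) start interval does not contain "
        "the required bound [195,272) start")
actual, required = parse_bounds(line)
print(actual, required, start_bound_ok(actual, required))
# prints: (218, 272) (195, 272) False
```

For the values in this report the recorded history starts at epoch 218 while epoch 195 is required, so the modeled check fails. That is consistent with the abort at PeeringState.cc:981.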


Files

PG 8.243.xlsx (12.2 KB), Shu Yu, 02/10/2022 08:50 AM
osds_log.tar.gz (13.6 KB), Shu Yu, 02/10/2022 08:50 AM

Related issues 15 (1 open, 14 closed)

Related to crimson - Bug #55550: crimson: check_past_interval_bounds() assert failure (Resolved, Matan Breizman)
Has duplicate RADOS - Bug #52212: crash: void PeeringState::check_past_interval_bounds() const: abort (Duplicate)
Has duplicate RADOS - Bug #52160: crash: void PeeringState::check_past_interval_bounds() const: abort (Duplicate)
Has duplicate RADOS - Bug #52159: crash: void PeeringState::check_past_interval_bounds() const: abort (Duplicate)
Has duplicate RADOS - Bug #55549: OSDs crashing (Resolved)
Has duplicate RADOS - Bug #56289: crash: void PeeringState::check_past_interval_bounds() const: abort (Duplicate)
Has duplicate RADOS - Bug #54710: crash: void PeeringState::check_past_interval_bounds() const: abort (Duplicate)
Has duplicate RADOS - Bug #54709: crash: void PeeringState::check_past_interval_bounds() const: abort (Duplicate)
Has duplicate RADOS - Bug #54708: crash: void PeeringState::check_past_interval_bounds() const: abort (Duplicate)
Has duplicate RADOS - Bug #59777: crash: void PeeringState::check_past_interval_bounds() const: abort (Duplicate)
Has duplicate RADOS - Bug #59778: crash: void PeeringState::check_past_interval_bounds() const: abort (Duplicate)
Has duplicate RADOS - Bug #59779: crash: void PeeringState::check_past_interval_bounds() const: abort (Duplicate)
Precedes RADOS - Feature #64002: Allow clusters to recover from "past_interval start interval mismatch" (New, Matan Breizman)
Copied to RADOS - Backport #61149: pacific: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start (Resolved, Matan Breizman)
Copied to RADOS - Backport #61150: quincy: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start (Resolved, Matan Breizman)
