Bug #57628
osd:PeeringState.cc: FAILED ceph_assert(info.history.same_interval_since != 0)
Description
/a/yuriw-2022-09-09_14:59:25-rados-wip-yuri2-testing-2022-09-06-1007-pacific-distro-default-smithi/7022809
2022-09-09T20:41:28.514 INFO:tasks.ceph.osd.4.smithi134.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10-813-g928e03bd/rpm/el8/BUILD/ceph-16.2.10-813-g928e03bd/src/osd/PeeringState.cc: 649: FAILED ceph_assert(info.history.same_interval_since != 0)
...
2022-09-09T20:41:28.615 INFO:tasks.ceph.osd.4.smithi134.stderr: ceph version 16.2.10-813-g928e03bd (928e03bd0c8ce53d78c1f3dddd6852e2ffd05b7f) pacific (stable)
2022-09-09T20:41:28.615 INFO:tasks.ceph.osd.4.smithi134.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x689824]
2022-09-09T20:41:28.615 INFO:tasks.ceph.osd.4.smithi134.stderr: 2: ceph-osd(+0x581a3e) [0x689a3e]
2022-09-09T20:41:28.616 INFO:tasks.ceph.osd.4.smithi134.stderr: 3: (PeeringState::start_peering_interval(std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> > const&, int, std::vector<int, std::allocator<int> > const&, int, ceph::os::Transaction&)+0x1453) [0xa16fd3]
2022-09-09T20:41:28.616 INFO:tasks.ceph.osd.4.smithi134.stderr: 4: (PeeringState::Reset::react(PeeringState::AdvMap const&)+0x293) [0xa32453]
2022-09-09T20:41:28.616 INFO:tasks.ceph.osd.4.smithi134.stderr: 5: (boost::statechart::simple_state<PeeringState::Reset, PeeringState::PeeringMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xf5) [0xa6eeb5]
2022-09-09T20:41:28.617 INFO:tasks.ceph.osd.4.smithi134.stderr: 6: (boost::statechart::state_machine<PeeringState::PeeringMachine, PeeringState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_queued_events()+0xa7) [0xa58be7]
2022-09-09T20:41:28.617 INFO:tasks.ceph.osd.4.smithi134.stderr: 7: (PeeringState::advance_map(std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PeeringCtx&)+0x269) [0xa12ce9]
2022-09-09T20:41:28.617 INFO:tasks.ceph.osd.4.smithi134.stderr: 8: (PG::handle_advance_map(std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PeeringCtx&)+0x1e6) [0x8491c6]
2022-09-09T20:41:28.617 INFO:tasks.ceph.osd.4.smithi134.stderr: 9: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PeeringCtx&)+0x303) [0x7bc813]
2022-09-09T20:41:28.618 INFO:tasks.ceph.osd.4.smithi134.stderr: 10: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xa4) [0x7be964]
2022-09-09T20:41:28.618 INFO:tasks.ceph.osd.4.smithi134.stderr: 11: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x56) [0x9f5586]
2022-09-09T20:41:28.618 INFO:tasks.ceph.osd.4.smithi134.stderr: 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xc28) [0x7b0908]
2022-09-09T20:41:28.618 INFO:tasks.ceph.osd.4.smithi134.stderr: 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0xe33374]
2022-09-09T20:41:28.619 INFO:tasks.ceph.osd.4.smithi134.stderr: 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0xe36254]
2022-09-09T20:41:28.619 INFO:tasks.ceph.osd.4.smithi134.stderr: 15: /lib64/libpthread.so.0(+0x81ca) [0xc8471ca]
2022-09-09T20:41:28.619 INFO:tasks.ceph.osd.4.smithi134.stderr: 16: clone()
Related issues
History
#1 Updated by Laura Flores 4 months ago
- Related to Bug #39659: FAILED ceph_assert(info.history.same_interval_since != 0) added
#2 Updated by Laura Flores 4 months ago
This might be Tracker #39659, but the logs are no longer available, so there is no way to be sure.
#3 Updated by Laura Flores 4 months ago
Caught by Telemetry, happened twice on one 16.2.7 cluster:
#4 Updated by Laura Flores 4 months ago
- Affected Versions v16.2.10 added
#5 Updated by Laura Flores 4 months ago
Telemetry also caught this on v14.1.1. Copying that link here to provide the full picture:
#6 Updated by Yaarit Hatuka 4 months ago
- Affected Versions v14.0.0, v15.0.0 added
Telemetry also reported the same issue on version 15.0.0:
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?var-sig_v2=43429c06cd8a3e57052f2bcc913b85f2571f3f66774aa9c8a9027be5b8f0f22a&orgId=1
Three different signatures were created because the sanitized backtraces and the assert functions differ between versions.
The 14.1.1 report differs visibly from the 15.0.0 and 16.2.7 reports in both its sanitized backtrace and its assert function.
The sanitized backtraces of 15.0.0 and 16.2.7 differ in only a single frame, and only in the number of mpl_::na placeholders inside boost::mpl::list:
15.0.0:
boost::statechart::simple_state<PeeringState::Reset, PeeringState::PeeringMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)>::react_impl(boost::statechart::event_base const&, void const*)
16.2.7:
boost::statechart::simple_state<PeeringState::Reset, PeeringState::PeeringMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)>::react_impl(boost::statechart::event_base const&, void const*)
Updated the affected versions - picked v14.0.0 since v14.1.1 does not exist.
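To illustrate why the 15.0.0 and 16.2.7 frames above produce separate signatures, and how they could be made to match, here is a minimal sketch. It is not the telemetry module's actual sanitization or signature code: normalize_frame, signature, and react_impl_frame are hypothetical helpers, SHA-256 over the joined frames is an assumed stand-in for the real signature scheme, and the placeholder counts (20 and 30) are illustrative values meant to mirror the two frames quoted above.

```python
import hashlib
import re

# Matches two or more comma-separated "mpl_::na" placeholders in a row.
_NA_RUN = re.compile(r"mpl_::na(?:,\s*mpl_::na)+")


def normalize_frame(frame: str) -> str:
    """Collapse a run of mpl_::na placeholders into a single placeholder."""
    return _NA_RUN.sub("mpl_::na", frame)


def signature(frames) -> str:
    """Hypothetical signature: SHA-256 over the normalized frames."""
    joined = "\n".join(normalize_frame(f) for f in frames)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def react_impl_frame(na_count: int) -> str:
    """Build a react_impl frame like the ones above with na_count placeholders."""
    na_list = ", ".join(["mpl_::na"] * na_count)
    return (
        "boost::statechart::simple_state<PeeringState::Reset, "
        "PeeringState::PeeringMachine, boost::mpl::list<" + na_list + ">, "
        "(boost::statechart::history_mode)>::react_impl("
        "boost::statechart::event_base const&, void const*)"
    )


# Illustrative counts: a shorter list (as in 15.0.0) and a longer one (as in 16.2.7).
frame_15 = react_impl_frame(20)
frame_16 = react_impl_frame(30)

# The raw frames differ, but they are identical once the placeholder run is collapsed.
print(frame_15 == frame_16)
print(normalize_frame(frame_15) == normalize_frame(frame_16))
print(signature([frame_15]) == signature([frame_16]))
```

With a normalization step of this kind, the 15.0.0 and 16.2.7 reports would share one signature; the 14.1.1 report would still differ, since its backtrace and assert function differ in other ways.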
#8 Updated by Matan Breizman 4 months ago
- Related to Bug #45991: PG merge: FAILED ceph_assert(info.history.same_interval_since != 0) added
#9 Updated by Matan Breizman 4 months ago
- Related to Bug #37654: FAILED ceph_assert(info.history.same_interval_since != 0) in PG::start_peering_interval() added