Project

General

Profile

Bug #13382

osd/PG.cc: 758: FAILED assert(info.history.same_interval_since == same_interval_since)

Added by Sage Weil about 7 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
hammer
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

 -1655> 2015-10-05 22:36:47.385270 7fa2abfa8700  1 -- 10.214.130.57:6801/20854 <== osd.12 10.214.130.79:6828/13416 2 ==== pg_notify(1.9(2) epoch 109) v5 ==== 857+0+0 (2203112667 0 0) 0x7fa2c59d5c20 con 0x7fa2c61dde40
 -1654> 2015-10-05 22:36:47.385277 7fa2abfa8700 10 osd.1 109 do_waiters -- start
 -1653> 2015-10-05 22:36:47.385279 7fa2abfa8700 10 osd.1 109 do_waiters -- finish
 -1652> 2015-10-05 22:36:47.385280 7fa2abfa8700 20 osd.1 109 _dispatch 0x7fa2c59d5c20 pg_notify(1.9(2) epoch 109) v5
 -1651> 2015-10-05 22:36:47.385286 7fa2abfa8700  5 -- op tracker -- seq: 11, time: 2015-10-05 22:36:47.381874, event: header_read, op: pg_notify(1.9(2) epoch 109)
 -1650> 2015-10-05 22:36:47.385290 7fa2abfa8700  5 -- op tracker -- seq: 11, time: 2015-10-05 22:36:47.381874, event: throttled, op: pg_notify(1.9(2) epoch 109)
 -1649> 2015-10-05 22:36:47.385293 7fa2abfa8700  5 -- op tracker -- seq: 11, time: 2015-10-05 22:36:47.381896, event: all_read, op: pg_notify(1.9(2) epoch 109)
 -1648> 2015-10-05 22:36:47.385296 7fa2abfa8700  5 -- op tracker -- seq: 11, time: 2015-10-05 22:36:47.385276, event: dispatched, op: pg_notify(1.9(2) epoch 109)
 -1647> 2015-10-05 22:36:47.385300 7fa2abfa8700  7 osd.1 109 handle_pg_notify from osd.12
 -1646> 2015-10-05 22:36:47.385302 7fa2abfa8700 15 osd.1 109 require_same_or_newer_map 109 (i am 109) 0x7fa2c59d5c20
 -1645> 2015-10-05 22:36:47.385306 7fa2abfa8700  5 -- op tracker -- seq: 11, time: 2015-10-05 22:36:47.385305, event: started, op: pg_notify(1.9(2) epoch 109)
 -1644> 2015-10-05 22:36:47.385324 7fa2abfa8700 15 osd.1 109 project_pg_history 1.9 from 109 to 109, start ec=4 les/c/f 21/21/0 109/109/109
 -1643> 2015-10-05 22:36:47.385329 7fa2abfa8700 15 osd.1 109 project_pg_history end ec=4 les/c/f 21/21/0 109/109/109
 -1642> 2015-10-05 22:36:47.385377 7fa2abfa8700 20 osd.1 109 _create_lock_pg pgid 1.9
 -1641> 2015-10-05 22:36:47.385381 7fa2abfa8700 10 osd.1 109 _open_lock_pg 1.9
 -1640> 2015-10-05 22:36:47.385385 7fa2abfa8700 10 osd.1 109 _get_pool 1
 -1639> 2015-10-05 22:36:47.385429 7fa2abfa8700  5 osd.1 pg_epoch: 109 pg[1.9(unlocked)] enter Initial
 -1638> 2015-10-05 22:36:47.385442 7fa2abfa8700 20 osd.1 pg_epoch: 109 pg[1.9(unlocked)] enter NotTrimming
 -1637> 2015-10-05 22:36:47.385451 7fa2abfa8700 10 osd.1 pg_epoch: 109 pg[1.9( DNE empty local-les=0 n=0 ec=0 les/c/f 0/0/0 0/0/0) [] r=0 lpr=0 crt=0'0 inactive NIBBLEWISE] init role 0 up [1,4] acting [1,4] history ec=4 les/c/f 21/21/0 109/109/109 2 past_intervals
 -1636> 2015-10-05 22:36:47.385464 7fa2abfa8700 20 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=0 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] _on_new_interval resorting object_contexts
 -1635> 2015-10-05 22:36:47.385498 7fa2abfa8700  5 write_log with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, dirty_divergent_priors: false, divergent_priors: 0, writeout_from: 4294967295'18446744073709551615, trimmed: 
 -1634> 2015-10-05 22:36:47.385512 7fa2abfa8700  7 osd.1 109 _create_lock_pg pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=0 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE]
 -1633> 2015-10-05 22:36:47.385518 7fa2abfa8700 10 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=0 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] handle_create
 -1632> 2015-10-05 22:36:47.385528 7fa2abfa8700  5 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=0 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] exit Initial 0.000099 0 0.000000
 -1631> 2015-10-05 22:36:47.385538 7fa2abfa8700  5 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=0 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] enter Reset
 -1630> 2015-10-05 22:36:47.385545 7fa2abfa8700 20 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=0 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] set_last_peering_reset 109
 -1629> 2015-10-05 22:36:47.385551 7fa2abfa8700 10 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=109 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] Clearing blocked outgoing recovery messages
 -1628> 2015-10-05 22:36:47.385558 7fa2abfa8700 10 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=109 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] Not blocking outgoing recovery messages
 -1627> 2015-10-05 22:36:47.385567 7fa2abfa8700 10 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=109 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] update_heartbeat_peers  -> 1,4
 -1626> 2015-10-05 22:36:47.385575 7fa2abfa8700 20 osd.1 109 need_heartbeat_peer_update
 -1625> 2015-10-05 22:36:47.385576 7fa2abfa8700 10 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=109 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] take_waiters
 -1624> 2015-10-05 22:36:47.385584 7fa2abfa8700  5 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=109 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] exit Reset 0.000045 1 0.000039
 -1623> 2015-10-05 22:36:47.385593 7fa2abfa8700  5 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=109 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] enter Started
 -1622> 2015-10-05 22:36:47.385601 7fa2abfa8700  5 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=109 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] enter Start
 -1621> 2015-10-05 22:36:47.385607 7fa2abfa8700  1 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=109 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] state<Start>: transitioning to Primary
 -1620> 2015-10-05 22:36:47.385616 7fa2abfa8700  5 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=109 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] exit Start 0.000014 0 0.000000
 -1619> 2015-10-05 22:36:47.385625 7fa2abfa8700  5 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=109 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] enter Started/Primary
 -1618> 2015-10-05 22:36:47.385633 7fa2abfa8700  5 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=109 pi=99-108/2 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] enter Started/Primary/Peering
 -1617> 2015-10-05 22:36:47.385640 7fa2abfa8700  5 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=109 pi=99-108/2 crt=0'0 mlcod 0'0 peering NIBBLEWISE] enter Started/Primary/Peering/GetInfo
 -1616> 2015-10-05 22:36:47.385661 7fa2abfa8700 10 osd.1 pg_epoch: 109 pg[1.9( empty local-les=0 n=0 ec=4 les/c/f 21/21/0 109/109/109) [1,4] r=0 lpr=109 pi=99-108/2 crt=0'0 mlcod 0'0 peering NIBBLEWISE] generate_past_intervals over epochs 21-99
  -224> 2015-10-05 22:36:47.463626 7fa2abfa8700 -1 osd/PG.cc: In function 'void PG::generate_past_intervals()' thread 7fa2abfa8700 time 2015-10-05 22:36:47.386050
osd/PG.cc: 758: FAILED assert(info.history.same_interval_since == same_interval_since)

 ceph version 9.0.3-2031-g42a2c34 (42a2c340e1a97fffe4ac0d975948005ea5ebded9)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7fa2c16a8c7b]
 2: (PG::generate_past_intervals()+0xbcc) [0x7fa2c12345dc]
 3: (PG::RecoveryState::GetInfo::GetInfo(boost::statechart::state<PG::RecoveryState::GetInfo, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0x11d) [0x7fa2c12585cd]
 4: (boost::statechart::state<PG::RecoveryState::Primary, PG::RecoveryState::Started, PG::RecoveryState::Peering, (boost::statechart::history_mode)0>::deep_construct(boost::intrusive_ptr<PG::RecoveryState::Started> const&, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>&)+0xfa) [0x7fa2c129488a]
 5: (boost::statechart::detail::safe_reaction_result boost::statechart::simple_state<PG::RecoveryState::Start, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::transit_impl<PG::RecoveryState::Primary, PG::RecoveryState::RecoveryMachine, boost::statechart::detail::no_transition_function>(boost::statechart::detail::no_transition_function const&)+0x70) [0x7fa2c129cee0]
 6: (boost::statechart::simple_state<PG::RecoveryState::Start, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xa2) [0x7fa2c129e502]
 7: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x7fa2c1287adb]
 8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0xd8) [0x7fa2c1287c58]
 9: (PG::handle_create(PG::RecoveryCtx*)+0x174) [0x7fa2c1236784]
 10: (OSD::handle_pg_peering_evt(spg_t, pg_info_t const&, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > >&, unsigned int, pg_shard_t, bool, std::shared_ptr<PG::CephPeeringEvt>)+0x119c) [0x7fa2c112ae3c]
 11: (OSD::handle_pg_notify(std::shared_ptr<OpRequest>)+0x612) [0x7fa2c112bf72]
 12: (OSD::dispatch_op(std::shared_ptr<OpRequest>)+0xa8) [0x7fa2c112d068]
 13: (OSD::_dispatch(Message*)+0x20f) [0x7fa2c112e10f]
 14: (OSD::ms_dispatch(Message*)+0x20f) [0x7fa2c112e79f]
 15: (DispatchQueue::entry()+0x63a) [0x7fa2c175efda]
 16: (DispatchQueue::DispatchThread::entry()+0xd) [0x7fa2c168990d]
 17: (()+0x8182) [0x7fa2bfa5f182]
 18: (clone()+0x6d) [0x7fa2bdda647d]

/a/sage-2015-10-05_14:30:41-upgrade:hammer-x-infernalis---basic-vps/1089801

Related issues

Related to Ceph - Bug #10794: ceph_test_rados: incorrect buffer at pos ... Resolved 02/09/2015
Copied to Ceph - Backport #15207: hammer: osd/PG.cc: 758: FAILED assert(info.history.same_interval_since == same_interval_since) Resolved

History

#2 Updated by Sage Weil about 7 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to hammer

#3 Updated by Loïc Dachary about 7 years ago

  • Status changed from Pending Backport to Resolved
  • Backport deleted (hammer)

this fixes a bug introduced after hammer by https://github.com/ceph/ceph/commit/70316541bbb115d9a35954bfba373cf1dc084b7e ( see http://tracker.ceph.com/issues/10794 ) which has not been backported, therefore there is no need for a backport

#4 Updated by David Zafman over 6 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to hammer

#5 Updated by David Zafman over 6 years ago

  • Related to Bug #10794: ceph_test_rados: incorrect buffer at pos ... added

#6 Updated by David Zafman over 6 years ago

  • Related to Backport #14979: hammer: ceph_test_rados: incorrect buffer at pos ... added

#8 Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #15207: hammer: osd/PG.cc: 758: FAILED assert(info.history.same_interval_since == same_interval_since) added

#9 Updated by Nathan Cutler over 6 years ago

@David I created a separate hammer backport issue #15207 which shares a PR with #14979 - this is internal to the backporting team; no action required on your part.

#10 Updated by Nathan Cutler over 6 years ago

  • Related to deleted (Backport #14979: hammer: ceph_test_rados: incorrect buffer at pos ...)

#11 Updated by Loïc Dachary over 6 years ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF