Bug #1679

closed

assertion failure is_replica()

Added by Sam Lang over 12 years ago. Updated over 12 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%


Description

Setup: 3 boxes, 12 OSDs per box. Four OSDs (9, 11, 20, 24) crashed on the following assertion. The crash was triggered by first setting the CRUSH map (ceph mon setcrushmap) and then setting the replication factor (ceph osd pool set [meta]data size 3):

osd/PG.cc: In function 'void PG::proc_primary_info(ObjectStore::Transaction&, const PG::Info&)', in thread '0x7f2bf0614700'
osd/PG.cc: 3728: FAILED assert(is_replica())
ceph version 0.37 (a6f3bbb744a6faea95ae48317f0b838edb16a896)
1: (PG::proc_primary_info(ObjectStore::Transaction&, PG::Info const&)+0x678) [0x64db48]
2: (PG::RecoveryState::ReplicaActive::react(PG::RecoveryState::MInfoRec const&)+0x81) [0x64dc11]
3: (boost::statechart::detail::reaction_result boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list2<boost::statechart::custom_reaction<PG::RecoveryState::MQuery>, boost::statechart::custom_reaction<PG::RecoveryState::MInfoRec> >, boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0x106) [0x668f36]
4: (boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x5a) [0x668fea]
5: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5f) [0x6561af]
6: (PG::RecoveryState::handle_info(int, PG::Info&, PG::RecoveryCtx*)+0x143) [0x625243]
7: (OSD::handle_pg_info(MOSDPGInfo*)+0x380) [0x55e470]
8: (OSD::_dispatch(Message*)+0x4ab) [0x56d25b]
9: (OSD::ms_dispatch(Message*)+0xf6) [0x56e1c6]
10: (SimpleMessenger::dispatch_entry()+0x88b) [0x5e461b]
11: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4bd55c]
12: (()+0x6d8c) [0x7f2bfc8e7d8c]
13: (clone()+0x6d) [0x7f2bfb12904d]
    *** Caught signal (Aborted) **
    in thread 0x7f2bf0614700
    ceph version 0.37 (a6f3bbb744a6faea95ae48317f0b838edb16a896)
    1: /usr/bin/ceph-osd() [0x59f6d2]
    2: (()+0xfc60) [0x7f2bfc8f0c60]
    3: (gsignal()+0x35) [0x7f2bfb076d05]
    4: (abort()+0x186) [0x7f2bfb07aab6]
    5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f2bfb92d6dd]
    6: (()+0xb9926) [0x7f2bfb92b926]
    7: (()+0xb9953) [0x7f2bfb92b953]
    8: (()+0xb9a5e) [0x7f2bfb92ba5e]
    9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x396) [0x5a8e16]
    10: (PG::proc_primary_info(ObjectStore::Transaction&, PG::Info const&)+0x678) [0x64db48]
    11: (PG::RecoveryState::ReplicaActive::react(PG::RecoveryState::MInfoRec const&)+0x81) [0x64dc11]
    12: (boost::statechart::detail::reaction_result boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list2<boost::statechart::custom_reaction<PG::RecoveryState::MQuery>, boost::statechart::custom_reaction<PG::RecoveryState::MInfoRec> >, boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0x106) [0x668f36]
    13: (boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x5a) [0x668fea]
    14: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5f) [0x6561af]
    15: (PG::RecoveryState::handle_info(int, PG::Info&, PG::RecoveryCtx*)+0x143) [0x625243]
    16: (OSD::handle_pg_info(MOSDPGInfo*)+0x380) [0x55e470]
    17: (OSD::_dispatch(Message*)+0x4ab) [0x56d25b]
    18: (OSD::ms_dispatch(Message*)+0xf6) [0x56e1c6]
    19: (SimpleMessenger::dispatch_entry()+0x88b) [0x5e461b]
    20: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4bd55c]
    21: (()+0x6d8c) [0x7f2bfc8e7d8c]
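
For anyone trying to reproduce this, the trigger sequence from the description can be written down as a small dry-run helper. This is only a sketch under assumptions: the function name repro_commands and the file name crushmap.bin are hypothetical, and the report's "ceph mon setcrushmap" is assumed here to correspond to "ceph osd setcrushmap -i <file>". The helper only builds and prints the commands; it does not run them.

```python
import shlex


def repro_commands(crushmap_path, pools=("data", "metadata"), size=3):
    """Return the command sequence from the report, in order, without running it."""
    # Assumption: the report's "ceph mon setcrushmap" is spelled here as
    # "ceph osd setcrushmap -i <file>"; adjust to what your release accepts.
    cmds = [["ceph", "osd", "setcrushmap", "-i", crushmap_path]]
    # Then set the replication factor on each pool ("[meta]data" in the report).
    for pool in pools:
        cmds.append(["ceph", "osd", "pool", "set", pool, "size", str(size)])
    return cmds


if __name__ == "__main__":
    # crushmap.bin is a placeholder for a compiled CRUSH map file.
    for cmd in repro_commands("crushmap.bin"):
        print(shlex.join(cmd))
```

The order matters in the report: the CRUSH map was changed first and the pool size second, so the helper keeps the commands in that order.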
Actions #1

Updated by Sam Lang over 12 years ago

When I tried to restart the failed OSDs, other OSDs (7) failed:

    *** Caught signal (Aborted) **
    in thread 0x7fcceb3ab700
    ceph version 0.37 (a6f3bbb744a6faea95ae48317f0b838edb16a896)
    1: /usr/bin/ceph-osd() [0x59f6d2]
    2: (()+0xfc60) [0x7fccf7687c60]
    3: (gsignal()+0x35) [0x7fccf5e0dd05]
    4: (abort()+0x186) [0x7fccf5e11ab6]
    5: (__assert_fail()+0xf5) [0x7fccf5e067c5]
    6: (OSD::get_or_create_pg(PG::Info const&, unsigned int, int, int&, bool, ObjectStore::Transaction**, C_Contexts**)+0xb92) [0x54f3f2]
    7: (OSD::handle_pg_notify(MOSDPGNotify*)+0x2f1) [0x55fc81]
    8: (OSD::_dispatch(Message*)+0x46b) [0x56d21b]
    9: (OSD::ms_dispatch(Message*)+0xf6) [0x56e1c6]
    10: (SimpleMessenger::dispatch_entry()+0x88b) [0x5e461b]
    11: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4bd55c]
    12: (()+0x6d8c) [0x7fccf767ed8c]
    13: (clone()+0x6d) [0x7fccf5ec004d]
Actions #2

Updated by Sage Weil over 12 years ago

  • Description updated (diff)
  • Status changed from New to Can't reproduce

and old code; pending new code.
