Bug #3650

closed

osd: crash in Reset state -> start_peering_interval -> on_change -> process_event Reset

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Core was generated by `/tmp/cephtest/binary/usr/local/bin/ceph-osd -f -i 2 -c /tmp/cephtest/ceph.conf'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f4ef268cb7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f4ef268cb7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000010cd102 in reraise_fatal (signum=11) at global/signal_handler.cc:58
#2  0x00000000010cd3d5 in handle_fatal_signal (signum=11) at global/signal_handler.cc:104
#3  <signal handler called>
#4  0x0000000000e72512 in boost::statechart::detail::send_function<boost::statechart::detail::state_base<std::allocator<void>, boost::statechart::detail::rtti_policy>, boost::statechart::event_base, void const*>::operator() (this=0x7f4ee14e3ae0)
    at /usr/include/boost/statechart/state_machine.hpp:87
#5  0x0000000000e64805 in boost::statechart::null_exception_translator::operator()<boost::statechart::detail::send_function<boost::statechart::detail::state_base<std::allocator<void>, boost::statechart::detail::rtti_policy>, boost::statechart::event_base, void const*>, boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::exception_event_handler> (this=0x2dd4620, action=...)
    at /usr/include/boost/statechart/null_exception_translator.hpp:33
#6  0x0000000000e508cd in boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::send_event (this=0x2dd45c8, evt=...)
    at /usr/include/boost/statechart/state_machine.hpp:885
#7  0x0000000000e402de in boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event (this=0x2dd45c8, evt=...)
    at /usr/include/boost/statechart/state_machine.hpp:275
#8  0x0000000000e17a38 in ReplicatedPG::on_change (this=0x2dd3000) at osd/ReplicatedPG.cc:6256
#9  0x0000000000fb767e in PG::start_peering_interval (this=0x2dd3000, lastmap=..., newup=..., newacting=...) at osd/PG.cc:4631
#10 0x0000000000fbc3fa in PG::RecoveryState::Reset::react (this=0x2cfd7d0, advmap=...) at osd/PG.cc:5212
#11 0x0000000001013896 in boost::statechart::custom_reaction<PG::AdvMap>::react<PG::RecoveryState::Reset, boost::statechart::event_base, void const*> (stt=..., evt=..., eventType=@0x7f4ee14e42d8: 0x17545e0) at /usr/include/boost/statechart/custom_reaction.hpp:42
#12 0x0000000001012724 in boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list5<boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed, boost::statechart::detail::no_context<boost::statechart::event_base>, &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function> >, boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (stt=..., evt=..., eventType=0x17545e0)
    at /usr/include/boost/statechart/simple_state.hpp:816
#13 0x0000000001011271 in boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react<boost::mpl::list5<boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed, boost::statechart::detail::no_context<boost::statechart::event_base>, &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function> > > (this=0x2cfd7d0, evt=..., eventType=0x17545e0) at /usr/include/boost/statechart/simple_state.hpp:851
#14 0x000000000100ead0 in boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list<boost::statechart::custom_reaction<PG::QueryState>, boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed, boost::statechart::detail::no_context<boost::statechart::event_base>, &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (stt=..., evt=..., eventType=0x17545e0) at /usr/include/boost/statechart/simple_state.hpp:820
#15 0x000000000100bbb9 in boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react<boost::mpl::list<boost::statechart::custom_reaction<PG::QueryState>, boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::custom_reaction<PG::ActMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed, boost::statechart::detail::no_context<boost::statechart::event_base>, &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na> > (
    this=0x2cfd7d0, evt=..., eventType=0x17545e0) at /usr/include/boost/statechart/simple_state.hpp:851
#16 0x00000000010069bb in boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl (this=0x2cfd7d0, evt=..., eventType=0x17545e0) at /usr/include/boost/statechart/simple_state.hpp:489
#17 0x0000000000e72515 in boost::statechart::detail::send_function<boost::statechart::detail::state_base<std::allocator<void>, boost::statechart::detail::rtti_policy>, boost::statechart::event_base, void const*>::operator() (this=0x7f4ee14e4440)
    at /usr/include/boost/statechart/state_machine.hpp:87
#18 0x0000000000fe9035 in boost::statechart::null_exception_translator::operator()<boost::statechart::detail::send_function<boost::statechart::detail::state_base<std::allocator<void>, boost::statechart::detail::rtti_policy>, boost::statechart::event_base, void const*>, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::exception_event_handler> (this=0x2dd4298, action=...)
    at /usr/include/boost/statechart/null_exception_translator.hpp:33
#19 0x0000000000fdd19b in boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event (this=0x2dd4240, evt=...)
    at /usr/include/boost/statechart/state_machine.hpp:885
#20 0x0000000000fdcff7 in boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events (this=0x2dd4240)
    at /usr/include/boost/statechart/state_machine.hpp:910
#21 0x0000000000fd3fe6 in boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event (this=0x2dd4240, evt=...)
    at /usr/include/boost/statechart/state_machine.hpp:280
#22 0x0000000000fcfd50 in PG::RecoveryState::handle_event (this=0x2dd4240, evt=..., rctx=0x7f4ee14e4930) at osd/PG.h:1665
#23 0x0000000000fbabce in PG::handle_advance_map (this=0x2dd3000, osdmap=..., lastmap=..., newup=..., newacting=..., rctx=0x7f4ee14e4930) at osd/PG.cc:5021
#24 0x0000000000ebd9ec in OSD::advance_pg (this=0x2b42000, osd_epoch=8, pg=0x2dd3000, rctx=0x7f4ee14e4930, new_pgs=0x7f4ee14e4970) at osd/OSD.cc:4011
#25 0x0000000000ed3f41 in OSD::process_peering_events (this=0x2b42000, pgs=...) at osd/OSD.cc:6116
#26 0x0000000000edd2ad in OSD::PeeringWQ::_process (this=0x2b42f48, pgs=...) at osd/OSD.h:726
#27 0x0000000000f3d13a in ThreadPool::BatchWorkQueue<PG>::_void_process (this=0x2b42f48, p=0x2afb120) at ./common/WorkQueue.h:83
#28 0x00000000011a03c0 in ThreadPool::worker (this=0x2b42448, wt=0x2b3b4a0) at common/WorkQueue.cc:113
#29 0x00000000011a200b in ThreadPool::WorkThread::entry (this=0x2b3b4a0) at common/WorkQueue.h:288
#30 0x00000000011998e1 in Thread::_entry_func (arg=0x2b3b4a0) at common/Thread.cc:41
#31 0x00007f4ef2684e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#32 0x00007f4ef080f4bd in klogctl () from /lib/x86_64-linux-gnu/libc.so.6
#33 0x0000000000000000 in ?? ()

The job was:
ubuntu@teuthology:/a/teuthology-2012-12-18_19:00:04-regression-next-testing-basic/18308$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: ec18aeecd4de479601363849d489668d8f12410c
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 500
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 212f6b56d1269c04621e36b7900032b8a27ef386
  s3tests:
    branch: next
  workunit:
    sha1: 212f6b56d1269c04621e36b7900032b8a27ef386
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- ceph: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - rados/load-gen-big.sh
Actions #1

Updated by Sage Weil over 11 years ago

That line of code (osd/ReplicatedPG.cc:6256, frame #8 in the backtrace) is

  snap_trimmer_machine.process_event(Reset());

I'm not sure why it's having trouble with the Reset event from NotTrimming... :/
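
For readers unfamiliar with the pattern, here is a minimal hand-rolled sketch of what sending Reset to an idle NotTrimming state is expected to do: nothing, with the machine staying where it is. It is not the Ceph code and not Boost.Statechart; TrimMachine, State, Trimming, SnapTrim, initiate() and state_name() are hypothetical names made up for illustration, and only NotTrimming, Reset and process_event come from the backtrace. The backtrace shows the fault inside the library's event dispatch (frame #4), with no snap trimmer reaction handler above it. The null check below shows only one way a dispatch can go wrong before any handler runs. It is an assumption for the sake of the example, not a diagnosis of this crash.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration; not Ceph or Boost types.
struct Event { virtual ~Event() = default; };
struct Reset : Event {};     // event name taken from the ticket
struct SnapTrim : Event {};  // assumed "start trimming" event

struct State {
  virtual ~State() = default;
  virtual std::string name() const = 0;
  // Returns the next state, or nullptr to stay in the current state.
  virtual std::unique_ptr<State> react(const Event& evt) = 0;
};

struct NotTrimming : State {
  std::string name() const override { return "NotTrimming"; }
  std::unique_ptr<State> react(const Event& evt) override;
};

struct Trimming : State {
  std::string name() const override { return "Trimming"; }
  std::unique_ptr<State> react(const Event& evt) override;
};

std::unique_ptr<State> NotTrimming::react(const Event& evt) {
  if (dynamic_cast<const SnapTrim*>(&evt)) return std::make_unique<Trimming>();
  return nullptr;  // Reset while idle: nothing to undo, stay put
}

std::unique_ptr<State> Trimming::react(const Event& evt) {
  if (dynamic_cast<const Reset*>(&evt)) return std::make_unique<NotTrimming>();
  return nullptr;  // ignore everything else while trimming
}

class TrimMachine {
 public:
  // Enter the initial state; events are refused until this is called.
  void initiate() { current_ = std::make_unique<NotTrimming>(); }

  // Dispatch an event to the current state. Returns false when there is
  // no current state to dispatch to (assumed guard, see lead-in).
  bool process_event(const Event& evt) {
    if (!current_) return false;
    if (auto next = current_->react(evt)) current_ = std::move(next);
    return true;
  }

  std::string state_name() const {
    return current_ ? current_->name() : "<none>";
  }

 private:
  std::unique_ptr<State> current_;
};
```

In this sketch, calling process_event(Reset()) on an initiated machine in NotTrimming returns true and leaves state_name() unchanged, which is the harmless outcome one would expect from the call at ReplicatedPG.cc:6256.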

Actions #2

Updated by Sage Weil over 11 years ago

  • Status changed from 12 to 7
Actions #3

Updated by Sage Weil over 11 years ago

  • Status changed from 7 to In Progress
Actions #4

Updated by Sage Weil over 11 years ago

  • Assignee set to Samuel Just
Actions #5

Updated by Sage Weil over 11 years ago

  • Priority changed from Urgent to High
Actions #6

Updated by Samuel Just over 11 years ago

  • Status changed from In Progress to Can't reproduce

I looked into the core dump but can't see how this happened.
