Project

General

Profile

Actions

Bug #8642

closed

After Upgrade from Emperor to Firefly osd start (seemingly randomly) crashing

Added by Matthias Babisch almost 10 years ago. Updated almost 10 years ago.

Status:
Duplicate
Priority:
High
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I upgraded our ceph cluster from Emperor to Firefly today. After the upgrade I changed "tunables to optimal" and changed the number of pgs on several pools. Then I noticed the cluster having problems. I noticed osds going down. I cannot be certain if all fail with the same error but i saw
several error messages like below. I am running Firefly on Debian and Ubuntu from your repository!

I am restarting the osds every minute now but the cluster is not in a good state. This problem is quite urgent to me. Please help!

fa4f7c93/rb.0.30b1.2ae8944a.00000000434f/head//3)]) v2
6> 2014-06-21 12:17:09.770468 7f90ba03d700 5 - op tracker -- , seq: 4755, time: 2014-06-21 12:17:09.770162, event: throttled, request: MOSDPGPushReply(3.93 82189 [PushReplyOp(fa
4f7c93/rb.0.30b1.2ae8944a.00000000434f/head//3)]) v2
5> 2014-06-21 12:17:09.770479 7f90ba03d700 5 - op tracker -- , seq: 4755, time: 2014-06-21 12:17:09.770287, event: all_read, request: MOSDPGPushReply(3.93 82189 [PushReplyOp(fa4
f7c93/rb.0.30b1.2ae8944a.00000000434f/head//3)]) v2
4> 2014-06-21 12:17:09.770488 7f90ba03d700 5 - op tracker -- , seq: 4755, time: 2014-06-21 12:17:09.770440, event: dispatched, request: MOSDPGPushReply(3.93 82189 [PushReplyOp(f
a4f7c93/rb.0.30b1.2ae8944a.00000000434f/head//3)]) v2
3> 2014-06-21 12:17:09.770499 7f90ba03d700 5 - op tracker -- , seq: 4755, time: 2014-06-21 12:17:09.770498, event: waiting_for_osdmap, request: MOSDPGPushReply(3.93 82189 [PushR
eplyOp(fa4f7c93/rb.0.30b1.2ae8944a.00000000434f/head//3)]) v2
2> 2014-06-21 12:17:09.770603 7f90b2314700 5 - op tracker -- , seq: 4755, time: 2014-06-21 12:17:09.770603, event: reached_pg, request: MOSDPGPushReply(3.93 82189 [PushReplyOp(f
a4f7c93/rb.0.30b1.2ae8944a.00000000434f/head//3)]) v2
1> 2014-06-21 12:17:09.770708 7f90b2314700 5 - op tracker -- , seq: 4755, time: 2014-06-21 12:17:09.770707, event: done, request: MOSDPGPushReply(3.93 82189 [PushReplyOp(fa4f7c9
3/rb.0.30b1.2ae8944a.00000000434f/head//3)]) v2
0> 2014-06-21 12:17:09.780622 7f90b2b15700 -1 ** Caught signal (Aborted) *
in thread 7f90b2b15700

ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: /usr/bin/ceph-osd() [0xaa0e82]
2: (()+0xf030) [0x7f90cdbc7030]
3: (gsignal()+0x35) [0x7f90cc4d2475]
4: (abort()+0x180) [0x7f90cc4d56f0]
5: (_gnu_cxx::_verbose_terminate_handler()+0x11d) [0x7f90ccd3589d]
6: (()+0x63996) [0x7f90ccd33996]
7: (()+0x639c3) [0x7f90ccd339c3]
8: (()+0x63bee) [0x7f90ccd33bee]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x40a) [0xb7948a]
10: (PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0x12f) [0x8675cf]
11: /usr/bin/ceph-osd() [0x8985c9]
12: (boost::statechart::detail::reaction_result boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list2<boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed, boost::statechart::detail::no_context<boost::statechart::event_base>, &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function> >, boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0xbf) [0x8c3daf]
13: (boost::statechart::detail::reaction_result boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list4<boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed, boost::statechart::detail::no_context<boost::statechart::event_base>, &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function> >, boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0x57) [0x8c3f37]
14: (boost::statechart::detail::reaction_result boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list<boost::statechart::custom_reaction<PG::QueryState>, boost::statechart::custom_reaction<PG::AdvMap>, boost::statechart::custom_reaction<PG::NullEvt>, boost::statechart::custom_reaction<PG::FlushedEvt>, boost::statechart::transition<boost::statechart::event_base, PG::RecoveryState::Crashed, boost::statechart::detail::no_context<boost::statechart::event_base>, &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0x57) [0x8c3fe7]
15: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x8a67bb]
16: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x19) [0x8a6849]
17: (PG::RecoveryState::handle_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x31) [0x8a68e1]
18: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x338) [0x85d8f8]
19: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x404) [0x76e8f4]
20: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x14) [0x7cb334]
21: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0xb6b24a]
22: (ThreadPool::WorkThread::entry()+0x10) [0xb6c4a0]
23: (()+0x6b50) [0x7f90cdbbeb50]
24: (clone()+0x6d) [0x7f90cc57c0ed]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions

Also available in: Atom PDF