Bug #8736
thrash and scrub combination leads to error
Status:
Duplicate
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
In http://pulpito.ceph.com/loic-2014-07-02_23:05:05-upgrade:firefly-x:stress-split-firefly-testing-basic-vps/338904/, OSD 1 is killed by the thrasher
2014-07-02T21:58:51.862 INFO:teuthology.task.thrashosds.thrasher:Killing osd 1, live_osds are [5, 4, 3, 1, 2, 0]
but the kill fails
2014-07-02T22:27:58.598 ERROR:teuthology.run_tasks:Manager failed: thrashosds ... CommandFailedError: Command failed on vpm070 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 1'
Immediately after that, scrub tries to run on osd.1 (although it probably should not, because it is not in) and fails
2014-07-02T22:28:05.339 INFO:teuthology.orchestra.run.vpm075:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.1'
2014-07-02T22:28:05.614 INFO:teuthology.orchestra.run.vpm075.stderr:Error EAGAIN: osd.1 is not up
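One way the scrub step could avoid this error is to skip OSDs that are not currently up. The sketch below is not the teuthology code. It assumes the dict parsed from `ceph osd dump --format=json`, whose `osds` entries carry `osd` and `up` fields; the helper name `scrubbable_osds` and the sample data are hypothetical.

```python
def scrubbable_osds(osd_dump):
    """Return the ids of OSDs that are marked up, sorted ascending.

    osd_dump is assumed to be the dict parsed from
    `ceph osd dump --format=json` (hypothetical input for this sketch).
    """
    return sorted(
        entry["osd"]
        for entry in osd_dump.get("osds", [])
        if entry.get("up") == 1
    )


# Hand-written sample mirroring this report: osd.1 was killed by the thrasher.
sample_dump = {
    "osds": [
        {"osd": 0, "up": 1, "in": 1},
        {"osd": 1, "up": 0, "in": 1},
        {"osd": 2, "up": 1, "in": 1},
    ]
}

# Only OSDs that are up would be handed to `ceph osd scrub`.
for osd_id in scrubbable_osds(sample_dump):
    print("ceph osd scrub osd.%d" % osd_id)
```

With this filter, osd.1 would be skipped instead of returning EAGAIN. Whether a scrub task should silently skip a down OSD or wait for it to come back is a design choice this sketch does not settle.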
Updated by Loïc Dachary over 9 years ago
2014-08-04T12:37:47.478 INFO:teuthology.orchestra.run.plana89.stderr:Error EAGAIN: osd.5 is not up
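To check whether other runs hit the same failure, the teuthology log can be scanned for this error. This is a minimal sketch that assumes log lines shaped like the ones quoted in this report (timestamp, a space, then the logger prefix and message); the function name `find_not_up_errors` is hypothetical.

```python
import re

# Matches e.g. "<timestamp> INFO:...stderr:Error EAGAIN: osd.5 is not up"
NOT_UP = re.compile(r"^(?P<ts>\S+) .*Error EAGAIN: osd\.(?P<osd>\d+) is not up")


def find_not_up_errors(lines):
    """Return (timestamp, osd_id) pairs for every 'is not up' error line."""
    hits = []
    for line in lines:
        match = NOT_UP.match(line)
        if match:
            hits.append((match.group("ts"), int(match.group("osd"))))
    return hits


# The two error lines quoted in this report, plus one unrelated line.
sample_log = [
    "2014-07-02T22:28:05.614 INFO:teuthology.orchestra.run.vpm075.stderr:Error EAGAIN: osd.1 is not up",
    "2014-07-02T21:58:51.862 INFO:teuthology.task.thrashosds.thrasher:Killing osd 1, live_osds are [5, 4, 3, 1, 2, 0]",
    "2014-08-04T12:37:47.478 INFO:teuthology.orchestra.run.plana89.stderr:Error EAGAIN: osd.5 is not up",
]

for timestamp, osd_id in find_not_up_errors(sample_log):
    print(timestamp, "osd.%d" % osd_id)
```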
Updated by Yuri Weinstein over 9 years ago
- Project changed from teuthology to Ceph
- Assignee changed from Yuri Weinstein to Ian Colle
This needs to be prioritized.
Confirmed, logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-21_11:40:02-upgrade:dumpling-x:stress-split-master-distro-basic-vps/439533/
It's an osd.5 crash, coredump in ceph-osd.5.log.gz
2014-08-21 19:47:59.612165 7f59303eb700 -1 *** Caught signal (Aborted) **
 in thread 7f59303eb700
 ceph version 0.84-372-gb0aa846 (b0aa846b3f81225a779de00100e15334fb8156b3)
 1: ceph-osd() [0x9a8a0a]
 2: (()+0xfcb0) [0x7f5949bd8cb0]
 3: (gsignal()+0x35) [0x7f59484c34f5]
 4: (abort()+0x17b) [0x7f59484c6c5b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f5948e1669d]
 6: (()+0xb5846) [0x7f5948e14846]
 7: (()+0xb5873) [0x7f5948e14873]
 8: (()+0xb596e) [0x7f5948e1496e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0xa8cf7f]
 10: (PG::RecoveryState::Stray::react(PG::MLogRec const&)+0x8c8) [0x7881d8]
 11: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x182) [0x7b6d32]
 12: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x79a7bb]
 13: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x11) [0x79ab11]
 14: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x303) [0x752753]
 15: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x2ce) [0x65720e]
 16: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x6a9982]
 17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0xa7c6b6]
 18: (ThreadPool::WorkThread::entry()+0x10) [0xa7f760]
 19: (()+0x7e9a) [0x7f5949bd0e9a]
 20: (clone()+0x6d) [0x7f594858173d]
Updated by Ian Colle over 9 years ago
- Assignee changed from Ian Colle to Yuri Weinstein
Updated by Sage Weil over 9 years ago
- Priority changed from Normal to Urgent
- Source changed from other to Q/A
Updated by Sage Weil over 9 years ago
- Status changed from New to Duplicate
- Assignee deleted (Yuri Weinstein)
ha, it's the riter bug. #8777