Support #15551
OSD crash during recovery, osd/PGLog.cc: 503: FAILED assert(newhead >= log.tail)
Description
The cluster went down after repeated power failures. Recovery started as expected, but over time OSDs went down and kept crashing. The resulting state is approximately 25% of OSDs down, and recovery will not continue.
The version is 0.94.6-1trusty. Output of ceph -s in the resulting state:
    cluster 0881d13e-aecf-4701-970f-bd629b914c86
     health HEALTH_WARN
            2686 pgs backfill
            3289 pgs degraded
            346 pgs down
            5 pgs incomplete
            342 pgs peering
            8 pgs recovering
            2198 pgs recovery_wait
            2 pgs stale
            3289 pgs stuck degraded
            351 pgs stuck inactive
            2 pgs stuck stale
            5243 pgs stuck unclean
            1084 pgs stuck undersized
            1084 pgs undersized
            145 requests are blocked > 32 sec
            recovery 598926/5490181 objects degraded (10.909%)
            recovery 2127266/5490181 objects misplaced (38.747%)
            recovery 342/2356721 unfound (0.015%)
            mds ceph3 is laggy
            1 mons down, quorum 1,2 ceph2,ceph3
            mon.ceph2 low disk space
     monmap e3: 3 mons at {ceph1=172.27.1.20:6789/0,ceph2=172.27.1.21:6789/0,ceph3=172.27.1.22:6789/0}
            election epoch 14642, quorum 1,2 ceph2,ceph3
     mdsmap e1225: 1/1/1 up {0=ceph3=up:active(laggy or crashed)}
     osdmap e71699: 73 osds: 57 up, 57 in; 2727 remapped pgs
      pgmap v15610460: 40960 pgs, 10 pools, 8969 GB data, 2301 kobjects
            16862 GB used, 190 TB / 207 TB avail
            598926/5490181 objects degraded (10.909%)
            2127266/5490181 objects misplaced (38.747%)
            342/2356721 unfound (0.015%)
               35717 active+clean
                2197 active+recovery_wait+degraded
                1603 active+remapped+wait_backfill
                1083 active+undersized+degraded+remapped+wait_backfill
                 301 down+peering
                  39 down+remapped+peering
                   7 active+recovering+degraded
                   4 down
                   3 incomplete
                   2 down+incomplete
                   2 stale+peering
                   1 active+recovering+undersized+degraded+remapped
                   1 active+recovery_wait+degraded+remapped
Dump of the OSD crash log (the same assert and abort backtraces repeat several times in the full output; one copy of each is shown):
root@ceph1:/home/deu# /usr/bin/ceph-osd --cluster=ceph -i 38 -f --debug-osd 40
starting osd.38 at :/0 osd_data /var/lib/ceph/osd/ceph-38 /var/lib/ceph/osd/ceph-38/journal
2016-04-20 22:17:15.415477 7f6dc173b8c0 -1 osd.38 70058 log_to_monitors {default=true}
osd/PGLog.cc: In function 'void PGLog::rewind_divergent_log(ObjectStore::Transaction&, eversion_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7f6d9e204700 time 2016-04-20 22:19:58.297651
osd/PGLog.cc: 503: FAILED assert(newhead >= log.tail)
 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xba8b8b]
 2: (PGLog::rewind_divergent_log(ObjectStore::Transaction&, eversion_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x60b) [0x76b19b]
 3: (PG::rewind_divergent_log(ObjectStore::Transaction&, eversion_t)+0x97) [0x7bc947]
 4: (PG::RecoveryState::Stray::react(PG::MInfoRec const&)+0x22f) [0x7df43f]
 5: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x216) [0x81afb6]
 6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x80668b]
 7: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x1ce) [0x7b3e2e]
 8: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x2c0) [0x6a6b80]
 9: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x18) [0x6ff778]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xb995ee]
 11: (ThreadPool::WorkThread::entry()+0x10) [0xb9a690]
 12: (()+0x8182) [0x7f6dc0633182]
 13: (clone()+0x6d) [0x7f6dbeb9e47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) **
 in thread 7f6d9e204700
 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
 1: /usr/bin/ceph-osd() [0xaaff6a]
 2: (()+0x10340) [0x7f6dc063b340]
 3: (gsignal()+0x39) [0x7f6dbeadacc9]
 4: (abort()+0x148) [0x7f6dbeade0d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f6dbf3e5535]
 6: (()+0x5e6d6) [0x7f6dbf3e36d6]
 7: (()+0x5e703) [0x7f6dbf3e3703]
 8: (()+0x5e922) [0x7f6dbf3e3922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xba8d78]
 10: (PGLog::rewind_divergent_log(ObjectStore::Transaction&, eversion_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x60b) [0x76b19b]
 11: (PG::rewind_divergent_log(ObjectStore::Transaction&, eversion_t)+0x97) [0x7bc947]
 12: (PG::RecoveryState::Stray::react(PG::MInfoRec const&)+0x22f) [0x7df43f]
 13: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x216) [0x81afb6]
 14: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x80668b]
 15: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x1ce) [0x7b3e2e]
 16: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x2c0) [0x6a6b80]
 17: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x18) [0x6ff778]
 18: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xb995ee]
 19: (ThreadPool::WorkThread::entry()+0x10) [0xb9a690]
 20: (()+0x8182) [0x7f6dc0633182]
 21: (clone()+0x6d) [0x7f6dbeb9e47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Aborted (core dumped)
History
#1 Updated by Zdenek Janda almost 8 years ago
This is most similar to my issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1289313
Please suggest what I should do to bring the OSDs back online. I tried setting osd_find_best_info_ignore_history_les to true, but it didn't help.
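For reference, this is how the option was applied (a sketch; the same flag can also be injected at runtime on a running OSD via ceph tell osd.N injectargs):

```ini
[osd]
# Ignore the last_epoch_started history when choosing the authoritative
# PG info during peering. Tried as a workaround; it did not help here.
osd find best info ignore history les = true
```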
#2 Updated by Loïc Dachary almost 8 years ago
- Assignee deleted (Loïc Dachary)
- Target version deleted (v0.94.7)
#3 Updated by Zdenek Janda almost 8 years ago
Can I supply more information? I still have the damaged cluster online, so any investigation can be done; I can give access with no issue.
#4 Updated by Greg Farnum almost 7 years ago
- Status changed from New to Closed
This sounds like data loss and Hammer is done.
#5 Updated by Greg Farnum almost 7 years ago
- Tracker changed from Bug to Support
#6 Updated by mingyue zhao almost 7 years ago
@Greg Farnum, has this problem already been fixed in Hammer? Which pull request?
#7 Updated by mingyue zhao almost 7 years ago
mingyue zhao wrote:
@Greg Farnum, has this problem already been fixed in Hammer? Which pull request?
#8 Updated by Greg Farnum almost 7 years ago
No, these issues appear to be data loss in the local FS (which we can't do much about), and we aren't doing maintenance on the Hammer release any more.
#9 Updated by Zdenek Janda almost 7 years ago
Although we never resolved this issue and the data was unavoidably lost, consider this solved: we upgraded to newer releases and the problem never occurred again. Thanks for looking into this.
#10 Updated by mingyue zhao over 6 years ago
I have hit this problem on ceph version 0.94.5.
Hosts A, B, C, each with 1 mon + 5 OSDs:
1. Host C was down for a month.
2. The cluster ran with only hosts A+B, and "ceph -s" showed the cluster HEALTH_OK.
3. Suddenly the RAID on host A (all 5 HDDs use that RAID) failed; host C was then started, and host A restarted (but A took a long time to come up, more than 5 minutes).
4. When the OSDs on host A were marked out, assert(newhead >= log.tail) fired.
I found that in this abnormal condition, peering chooses an OSD on host C (which has very old epoch data) as the authority, and the OSD that has the newest and most complete data hits the assert and finally goes down.
I've proposed a fix: https://github.com/ceph/ceph/pull/16170
@Greg Farnum, @Zdenek Janda, can you take a look?