Bug #14933

closed

osd/PGLog.cc: 569: FAILED assert(log.head >= olog.tail && olog.head >= log.tail) on revert unfound

Added by Sage Weil about 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2016-03-01T01:23:28.170 INFO:tasks.lost_unfound:reverting unfound in 1.8 on osd.2
2016-03-01T01:23:28.170 INFO:teuthology.orchestra.run.smithi038:Running: u'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg 1.8 mark_unfound_lost revert'
2016-03-01T01:23:28.284 INFO:teuthology.orchestra.run.smithi038.stderr:pg has 1 objects unfound and apparently lost, marking
2016-03-01T01:23:28.288 INFO:tasks.ceph.osd.0.smithi038.stderr:osd/PGLog.cc: In function 'void PGLog::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7f73d7bdd700 time 2016-03-01 09:23:28.281758
2016-03-01T01:23:28.288 INFO:tasks.ceph.osd.0.smithi038.stderr:osd/PGLog.cc: 569: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)
2016-03-01T01:23:28.289 INFO:tasks.ceph.osd.0.smithi038.stderr: ceph version 10.0.3-2454-g16cdc7d (16cdc7dcbb6174d65981a688c36c1b0008b3792c)
2016-03-01T01:23:28.289 INFO:tasks.ceph.osd.0.smithi038.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f73f5df37fb]
2016-03-01T01:23:28.289 INFO:tasks.ceph.osd.0.smithi038.stderr: 2: (PGLog::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x18c3) [0x7f73f5a0e833]
2016-03-01T01:23:28.290 INFO:tasks.ceph.osd.0.smithi038.stderr: 3: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x9c) [0x7f73f58520dc]
2016-03-01T01:23:28.290 INFO:tasks.ceph.osd.0.smithi038.stderr: 4: (PG::RecoveryState::ReplicaActive::react(PG::MLogRec const&)+0x1cb) [0x7f73f585230b]
2016-03-01T01:23:28.290 INFO:tasks.ceph.osd.0.smithi038.stderr: 5: (boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, PG::RecoveryState::RepNotRecovering, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x1d5) [0x7f73f58b5df5]
2016-03-01T01:23:28.290 INFO:tasks.ceph.osd.0.smithi038.stderr: 6: (boost::statechart::simple_state<PG::RecoveryState::RepNotRecovering, PG::RecoveryState::ReplicaActive, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xbc) [0x7f73f58b96bc]
2016-03-01T01:23:28.291 INFO:tasks.ceph.osd.0.smithi038.stderr: 7: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x7f73f589f83b]
2016-03-01T01:23:28.291 INFO:tasks.ceph.osd.0.smithi038.stderr: 8: (PG::handle_peering_event(std::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x1d5) [0x7f73f58693f5]
2016-03-01T01:23:28.291 INFO:tasks.ceph.osd.0.smithi038.stderr: 9: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x249) [0x7f73f57cf7f9]
2016-03-01T01:23:28.292 INFO:tasks.ceph.osd.0.smithi038.stderr: 10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x7f73f5812262]

/a/sage-2016-02-28_10:09:12-rados-wip-sage-testing---basic-smithi/32185
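The failed check in PGLog::merge_log requires the local log and the incoming log `olog` to touch or overlap: neither may lie entirely after the other. Below is a minimal C++ sketch of that invariant only. `Version`, `LogBounds`, and `logs_can_merge` are hypothetical names. The epoch-then-version ordering is an assumption modeled on Ceph's `eversion_t`, and treating a log as covering entries in (tail, head] is also an assumption. None of this is code taken from `PGLog.cc`.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for Ceph's eversion_t (assumption): ordered by
// epoch first, then by version within the same epoch.
struct Version {
  uint32_t epoch;
  uint64_t version;
};

inline bool operator<(const Version& a, const Version& b) {
  return std::tie(a.epoch, a.version) < std::tie(b.epoch, b.version);
}
inline bool operator>=(const Version& a, const Version& b) { return !(a < b); }

// Hypothetical summary of a PG log, assumed to cover entries in (tail, head].
struct LogBounds {
  Version tail;
  Version head;
};

// Mirrors the condition in the failed assert:
//   log.head >= olog.tail && olog.head >= log.tail
// True when the two ranges touch or overlap, i.e. neither log lies
// entirely after the other.
inline bool logs_can_merge(const LogBounds& log, const LogBounds& olog) {
  assert(log.head >= log.tail);    // each log must be well-formed
  assert(olog.head >= olog.tail);
  return log.head >= olog.tail && olog.head >= log.tail;
}
```

For example, a local log ending at 5'20 can merge an incoming log whose tail is 5'15, but not one whose tail is 6'1. In the second case the condition is false, which is the situation the OSD reported above as a FAILED assert.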
#1

Updated by Samuel Just about 8 years ago

  • Status changed from New to Resolved

The log shows a rados bench running in parallel with mark_all_unfound_lost calls. The latter eventually call share_pg_log(), which sends an MOSDPGLog message. This suggests that the new version of ceph-qa-suite is running against a build of master that lacks the wip-lost fixes. The Ceph build under test is sha1 16cdc7dcbb6174d65981a688c36c1b0008b3792c, but that commit is no longer in the upstream GitHub repository. Marking resolved based on the above.
