Bug #42019
Status: Closed
osd/PrimaryLogPG: PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update)
Description
/packages-rpms/BUILD/ceph-12.2.7-1457-g2aeea6c/src/osd/PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55de190e48b0]
 2: (PrimaryLogPG::recover_got(hobject_t, eversion_t)+0x5f8) [0x55de18c7d248]
 3: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0x2fe) [0x55de18c7d61e]
 4: (ReplicatedBackend::handle_pull_response(pg_shard_t, PushOp const&, PullOp*, std::list<ReplicatedBackend::pull_complete_info, std::allocator<ReplicatedBackend::pull_complete_info> >*, ObjectStore::Transaction*)+0x779) [0x55de18e189c9]
 5: (ReplicatedBackend::_do_pull_response(boost::intrusive_ptr<OpRequest>)+0x658) [0x55de18e1ae28]
 6: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x214) [0x55de18e21e44]
 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x55de18d3b420]
 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x6dc) [0x55de18cde05c]
 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x26d) [0x55de18afa3ad]
 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x55de18db3417]
 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x19b4) [0x55de18b2d314]
 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x55de190ea2f9]
 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55de190ec270]
 14: (()+0x7e25) [0x7fbb83204e25]
 15: (clone()+0x6d) [0x7fbb822f834d]
Updated by tao ning over 4 years ago
1. A read fault is injected; recovery of the object fails and the object is added to the missing set, leaving log.complete_to == log.end().
2. Before the first object is recovered, a new write arrives and advances info.last_update. Because num_missing() != 0, info.last_complete is not advanced along with it.
3. After the first object is recovered, PeeringState::recover_got hits the error: ceph_assert(info.last_complete == info.last_update);
Updated by Neha Ojha over 4 years ago
This sounds pretty much like https://tracker.ceph.com/issues/41816, maybe the other fix needs to be backported all the way to luminous.
Updated by xie xingguo over 4 years ago
- Status changed from Resolved to Pending Backport
Updated by Nathan Cutler over 4 years ago
- Copied to Backport #42131: nautilus: osd/PrimaryLogPG: PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update) added
Updated by Nathan Cutler over 4 years ago
- Copied to Backport #42132: luminous: osd/PrimaryLogPG: PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update) added
Updated by Nathan Cutler over 4 years ago
- Copied to Backport #42133: mimic: osd/PrimaryLogPG: PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update) added
Updated by Nathan Cutler over 4 years ago
The fix touches code that exists only in master:
void PeeringState::force_object_missing(
  const set<pg_shard_t> &peers,
  const hobject_t &soid,
  eversion_t version)
{
  for (auto &&peer : peers) {
    if (peer != primary) {
      peer_missing[peer].add(soid, version, eversion_t(), false);
    } else {
      pg_log.missing_add(soid, version, eversion_t());
      pg_log.reset_complete_to(&info);
      pg_log.set_last_requested(0);
    }
  }
}
so this does not look like a candidate for backport.
Updated by Nathan Cutler over 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".