Bug #42019

osd/PrimaryLogPG: PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update)

Added by tao ning 6 months ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous,mimic,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Pull request ID:
Crash signature:

Description

/packages-rpms/BUILD/ceph-12.2.7-1457-g2aeea6c/src/osd/PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55de190e48b0]
 2: (PrimaryLogPG::recover_got(hobject_t, eversion_t)+0x5f8) [0x55de18c7d248]
 3: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0x2fe) [0x55de18c7d61e]
 4: (ReplicatedBackend::handle_pull_response(pg_shard_t, PushOp const&, PullOp*, std::list<ReplicatedBackend::pull_complete_info, std::allocator<ReplicatedBackend::pull_complete_info> >*, ObjectStore::Transaction*)+0x779) [0x55de18e189c9]
 5: (ReplicatedBackend::_do_pull_response(boost::intrusive_ptr<OpRequest>)+0x658) [0x55de18e1ae28]
 6: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x214) [0x55de18e21e44]
 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x55de18d3b420]
 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x6dc) [0x55de18cde05c]
 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x26d) [0x55de18afa3ad]
 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x55de18db3417]
 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x19b4) [0x55de18b2d314]
 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x55de190ea2f9]
 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55de190ec270]
 14: (()+0x7e25) [0x7fbb83204e25]
 15: (clone()+0x6d) [0x7fbb822f834d]

Related issues

Copied to Ceph - Backport #42131: nautilus: osd/PrimaryLogPG: PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update) Rejected
Copied to Ceph - Backport #42132: luminous: osd/PrimaryLogPG: PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update) Rejected
Copied to Ceph - Backport #42133: mimic: osd/PrimaryLogPG: PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update) Rejected

History

#1 Updated by tao ning 6 months ago

1. A read fault is injected; the object whose recovery fails is added to the missing set, leaving log.complete_to == log.end().
2. Before the first object is recovered, a new write arrives and advances info.last_update; because num_missing() != 0, info.last_complete is not advanced along with it.
3. After the first object is recovered, PeeringState::recover_got trips the assert: ceph_assert(info.last_complete == info.last_update).
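The ordering above can be modelled with a deliberately simplified sketch. All names here (ToyPG, ToyPGInfo, mark_missing, apply_write, recover) are hypothetical stand-ins for pg_info_t, the missing set, and the PGLog::complete_to bookkeeping; the real code uses eversion_t rather than int, and recover_got does considerably more work. The sketch only shows why the invariant check succeeds or fails depending on whether the write lands before or after recovery:

```cpp
#include <cassert>
#include <set>
#include <string>

// Simplified stand-in for the relevant pg_info_t fields.
struct ToyPGInfo {
  int last_update = 0;    // version of the newest log entry
  int last_complete = 0;  // newest version below which nothing is missing
};

struct ToyPG {
  ToyPGInfo info;
  std::set<std::string> missing;  // objects still awaiting recovery

  // A read fault puts the failed object into the missing set
  // (step 1; in the real code complete_to is also left at log.end()).
  void mark_missing(const std::string& oid) { missing.insert(oid); }

  // A client write bumps last_update; last_complete may only follow
  // while nothing is missing (step 2: num_missing() != 0 blocks it).
  void apply_write() {
    ++info.last_update;
    if (missing.empty())
      info.last_complete = info.last_update;
  }

  // Recovery of one object; returns whether the recover_got-style
  // invariant holds once the missing set drains. Because complete_to
  // was left at log.end(), recovery itself cannot advance
  // last_complete past the pre-write value (step 3).
  bool recover(const std::string& oid) {
    missing.erase(oid);
    if (missing.empty())
      return info.last_complete == info.last_update;
    return true;
  }
};
```

With this model, mark_missing → apply_write → recover reproduces the failed assert (recover returns false, since last_update moved while the object was missing), whereas mark_missing → recover → apply_write keeps the invariant intact.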

#2 Updated by Neha Ojha 6 months ago

This sounds pretty much like https://tracker.ceph.com/issues/41816, maybe the other fix needs to be backported all the way to luminous.

#3 Updated by Kefu Chai 6 months ago

  • Status changed from New to Resolved

#4 Updated by xie xingguo 6 months ago

  • Backport set to luminous,mimic,nautilus

#5 Updated by xie xingguo 6 months ago

  • Status changed from Resolved to Pending Backport

#6 Updated by Nathan Cutler 6 months ago

  • Copied to Backport #42131: nautilus: osd/PrimaryLogPG: PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update) added

#7 Updated by Nathan Cutler 6 months ago

  • Copied to Backport #42132: luminous: osd/PrimaryLogPG: PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update) added

#8 Updated by Nathan Cutler 6 months ago

  • Copied to Backport #42133: mimic: osd/PrimaryLogPG: PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update) added

#9 Updated by Nathan Cutler 5 months ago

The fix touches code that exists only in master:

void PeeringState::force_object_missing(
  const set<pg_shard_t> &peers,
  const hobject_t &soid,
  eversion_t version)
{
  for (auto &&peer : peers) {
    if (peer != primary) {
      peer_missing[peer].add(soid, version, eversion_t(), false);
    } else {
      pg_log.missing_add(soid, version, eversion_t());
      pg_log.reset_complete_to(&info);
      pg_log.set_last_requested(0);
    }
  }
}

so this doesn't look like a candidate for backport.

#10 Updated by Nathan Cutler 5 months ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
