Bug #42019
Status: Closed
osd/PrimaryLogPG: PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update)
Description
/packages-rpms/BUILD/ceph-12.2.7-1457-g2aeea6c/src/osd/PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55de190e48b0]
 2: (PrimaryLogPG::recover_got(hobject_t, eversion_t)+0x5f8) [0x55de18c7d248]
 3: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0x2fe) [0x55de18c7d61e]
 4: (ReplicatedBackend::handle_pull_response(pg_shard_t, PushOp const&, PullOp*, std::list<ReplicatedBackend::pull_complete_info, std::allocator<ReplicatedBackend::pull_complete_info> >*, ObjectStore::Transaction*)+0x779) [0x55de18e189c9]
 5: (ReplicatedBackend::_do_pull_response(boost::intrusive_ptr<OpRequest>)+0x658) [0x55de18e1ae28]
 6: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x214) [0x55de18e21e44]
 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x55de18d3b420]
 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x6dc) [0x55de18cde05c]
 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x26d) [0x55de18afa3ad]
 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x55de18db3417]
 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x19b4) [0x55de18b2d314]
 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x55de190ea2f9]
 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55de190ec270]
 14: (()+0x7e25) [0x7fbb83204e25]
 15: (clone()+0x6d) [0x7fbb822f834d]
Updated by tao ning over 4 years ago
1. A read fault is injected; recovery of the object fails and the object is added to the missing set, leaving log.complete_to == log.end().
2. Before the first object is recovered, a new write arrives and advances info.last_update. Because num_missing() != 0, info.last_complete is not advanced along with it.
3. After the first object is recovered, PeeringState::recover_got hits the error: ceph_assert(info.last_complete == info.last_update);
Updated by Neha Ojha over 4 years ago
This sounds pretty much like https://tracker.ceph.com/issues/41816, maybe the other fix needs to be backported all the way to luminous.
Updated by xie xingguo over 4 years ago
- Status changed from Resolved to Pending Backport
Updated by Nathan Cutler over 4 years ago
- Copied to Backport #42131: nautilus: osd/PrimaryLogPG: PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update) added
Updated by Nathan Cutler over 4 years ago
- Copied to Backport #42132: luminous: osd/PrimaryLogPG: PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update) added
Updated by Nathan Cutler over 4 years ago
- Copied to Backport #42133: mimic: osd/PrimaryLogPG: PrimaryLogPG.cc: 12662: FAILED assert(info.last_complete == info.last_update) added
Updated by Nathan Cutler over 4 years ago
The fix touches code that exists only in master:
void PeeringState::force_object_missing(
  const set<pg_shard_t> &peers,
  const hobject_t &soid,
  eversion_t version)
{
  for (auto &&peer : peers) {
    if (peer != primary) {
      peer_missing[peer].add(soid, version, eversion_t(), false);
    } else {
      pg_log.missing_add(soid, version, eversion_t());
      pg_log.reset_complete_to(&info);
      pg_log.set_last_requested(0);
    }
  }
}
so this does not look like a candidate for backport.
Updated by Nathan Cutler over 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".