Bug #51627

FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soid) || (it_objects != recovery_state.get_pg_log().get_log().objects.end() && it_objects->second->op == pg_log_entry_t::LOST_REVERT))

Added by Kefu Chai over 2 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Spotted again:

2021-07-11T02:43:55.694+0000 7ffa80f0e700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-5874-g16eb42a1/rpm/el8/BUILD/ceph-17.0.0-5874-g16eb42a1/src/osd/PrimaryLogPG.cc: In function 'ObjectContextRef PrimaryLogPG::get_object_context(const hobject_t&, bool, const std::map<std::__cxx11::basic_string<char>, ceph::buffer::v15_2_0::list, std::less<void> >*)' thread 7ffa80f0e700 time 2021-07-11T02:43:55.690510+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-5874-g16eb42a1/rpm/el8/BUILD/ceph-17.0.0-5874-g16eb42a1/src/osd/PrimaryLogPG.cc: 11784: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soid) || (it_objects != recovery_state.get_pg_log().get_log().objects.end() && it_objects->second->op == pg_log_entry_t::LOST_REVERT))

 ceph version 17.0.0-5874-g16eb42a1 (16eb42a1d8cef5cf008b04b27d51e13dbd6ec495) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55f1f3750606]
 2: ceph-osd(+0x5bf827) [0x55f1f3750827]
 3: (PrimaryLogPG::get_object_context(hobject_t const&, bool, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > > const*)+0x22f) [0x55f1f39670df]
 4: (PrimaryLogPG::get_adjacent_clones(std::shared_ptr<ObjectContext>, std::shared_ptr<ObjectContext>&, std::shared_ptr<ObjectContext>&)+0xc5) [0x55f1f3968845]
 5: (PrimaryLogPG::inc_refcount_by_set(PrimaryLogPG::OpContext*, object_manifest_t&, OSDOp&)+0xd3) [0x55f1f396c4d3]
 6: (PrimaryLogPG::do_osd_ops(PrimaryLogPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0xe634) [0x55f1f39b4234]
 7: (PrimaryLogPG::prepare_transaction(PrimaryLogPG::OpContext*)+0x177) [0x55f1f39babd7]
 8: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x31d) [0x55f1f39bccbd]
 9: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x2dbb) [0x55f1f39c674b]
 10: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xd1c) [0x55f1f39cd93c]
 11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x309) [0x55f1f3856c99]
 12: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x68) [0x55f1f3ab9a18]
 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xc28) [0x55f1f3873788]
 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x55f1f3f105c4]
 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55f1f3f11964]
 16: (Thread::_entry_func(void*)+0xd) [0x55f1f3ef768d]
 17: /lib64/libpthread.so.0(+0x814a) [0x7ffaa6d8a14a]
 18: clone()

/a/ksirivad-2021-07-11_01:45:00-rados-wip-pg-autoscaler-overlap-distro-basic-smithi

The branch being tested was based on 0509deb6a895a98e3e582cbb849606bc559b963c and included a fix in the mgr module. See https://github.com/ceph/ceph/pull/42036
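
For anyone reading the trace, the asserted condition has three alternatives: the caller supplied attrs, or the object is not in the missing set, or the object has an entry in the log's objects index whose op is LOST_REVERT. The sketch below only models that boolean shape so the failing combination is easy to see. `PgLogModel`, `LogOp`, and `context_lookup_allowed` are hypothetical stand-ins invented for illustration, not Ceph types or functions, and the real check in PrimaryLogPG.cc works on `hobject_t` and the PG log rather than strings.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins for illustration only; these are not Ceph types.
enum class LogOp { MODIFY, LOST_REVERT };

struct PgLogModel {
    std::set<std::string> missing;              // objects still marked missing
    std::map<std::string, LogOp> latest_entry;  // log entry op indexed by object
};

// Models the shape of the asserted condition:
//   attrs || !is_missing(soid) || (entry found && entry op == LOST_REVERT)
// Returns false exactly in the combination that would trip the assert.
bool context_lookup_allowed(const PgLogModel& log,
                            const std::string& soid,
                            bool attrs_supplied) {
    if (attrs_supplied) {
        return true;  // first alternative: caller passed attrs
    }
    if (log.missing.count(soid) == 0) {
        return true;  // second alternative: object is not missing
    }
    // Third alternative: a log entry exists for the object and it is LOST_REVERT.
    auto it = log.latest_entry.find(soid);
    return it != log.latest_entry.end() && it->second == LogOp::LOST_REVERT;
}
```

Under this reading, the assert fails only when the lookup is made without attrs, for an object that is still missing, and that object has no LOST_REVERT entry in the log. In the trace above the lookup comes from get_adjacent_clones via inc_refcount_by_set.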


Related issues

Related to RADOS - Bug #62167: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soid) || (it_objects != recovery_state.get_pg_log().get_log().objects.end() && it_objects->second->op == pg_log_entry_t::LOST_REVERT)) Fix Under Review
Copied to RADOS - Backport #51952: pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soid) || (it_objects != recovery_state.get_pg_log().get_log().objects.end() && it_objects->second->op == pg_log_entry_t::LOST_REVERT)) Resolved

History

#2 Updated by Kefu Chai over 2 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Myoungwon Oh
  • Pull request ID set to 42279

#3 Updated by Kamoltat (Junior) Sirivadhna over 2 years ago

Spotted again at ksirivad-2021-07-11_01:45:00-rados-wip-pg-autoscaler-overlap-distro-basic-smithi/6262857/

#4 Updated by Neha Ojha over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to pacific

#5 Updated by Backport Bot over 2 years ago

  • Copied to Backport #51952: pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soid) || (it_objects != recovery_state.get_pg_log().get_log().objects.end() && it_objects->second->op == pg_log_entry_t::LOST_REVERT)) added

#6 Updated by Loïc Dachary over 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

#7 Updated by Laura Flores almost 2 years ago

Happened again. Could this be a new occurrence?
/a/yuriw-2022-02-21_15:40:41-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6698453

#8 Updated by Myoungwon Oh almost 2 years ago

The error message looks similar to the earlier one, but the cause is different from the prior case.
Anyway, I posted a fix.

https://github.com/ceph/ceph/pull/45137

#9 Updated by Neha Ojha almost 2 years ago

Myoungwon Oh wrote:

The error message looks similar to the earlier one, but the cause is different from the prior case.
Anyway, I posted a fix.

https://github.com/ceph/ceph/pull/45137

Thanks for looking into it! I think we should open a different tracker issue for this new fix.

#11 Updated by Aishwarya Mathuria almost 2 years ago

Saw the same assert failure here: /a/yuriw-2022-03-31_21:45:19-rados-wip-yuri5-testing-2022-03-31-1158-quincy-distro-default-smithi/6770156

#12 Updated by Laura Flores 7 months ago

  • Related to Bug #62167: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soid) || (it_objects != recovery_state.get_pg_log().get_log().objects.end() && it_objects->second->op == pg_log_entry_t::LOST_REVERT)) added
