Bug #45702

open

PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter->second.need == i->version && miter->second.have == eversion_t()))

Added by Brad Hubbard almost 4 years ago. Updated 4 months ago.

Status: Fix Under Review
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2020-05-22_19:55:53-rados-wip-yuri-master_5.22.20-distro-basic-smithi/5083350

2020-05-23T00:02:13.848 INFO:tasks.ceph.osd.6.smithi096.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-1826-gc3321b7686f/rpm/el8/BUILD/ceph-16.0.0-1826-gc3321b7686f/src/osd/PGLog.h: In function 'static void PGLog::read_log_and_missing(ObjectStore*, ObjectStore::CollectionHandle&, ghobject_t, const pg_info_t&, PGLog::IndexedLog&, missing_type&, std::ostringstream&, bool, bool*, const DoutPrefixProvider*, std::set<std::__cxx11::basic_string<char> >*, bool) [with missing_type = pg_missing_set<true>; ObjectStore::CollectionHandle = boost::intrusive_ptr<ObjectStore::CollectionImpl>; std::ostringstream = std::__cxx11::basic_ostringstream<char>]' thread 7fd298b2ff00 time 2020-05-23T00:02:13.846636+0000
2020-05-23T00:02:13.848 INFO:tasks.ceph.osd.6.smithi096.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-1826-gc3321b7686f/rpm/el8/BUILD/ceph-16.0.0-1826-gc3321b7686f/src/osd/PGLog.h: 1481: FAILED ceph_assert(miter == missing.get_items().end() || (miter->second.need == i->version && miter->second.have == eversion_t()))
2020-05-23T00:02:13.850 INFO:tasks.ceph.osd.6.smithi096.stderr: ceph version 16.0.0-1826-gc3321b7686f (c3321b7686f181e1bcb805a1fb24baced390ae4c) pacific (dev)
2020-05-23T00:02:13.850 INFO:tasks.ceph.osd.6.smithi096.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x556707bc9e20]
2020-05-23T00:02:13.850 INFO:tasks.ceph.osd.6.smithi096.stderr: 2: (()+0x52103a) [0x556707bca03a]
2020-05-23T00:02:13.850 INFO:tasks.ceph.osd.6.smithi096.stderr: 3: (void PGLog::read_log_and_missing<pg_missing_set<true> >(ObjectStore*, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t, pg_info_t const&, PGLog::IndexedLog&, pg_missing_set<true>&, std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, bool, bool*, DoutPrefixProvider const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, bool)+0x2656) [0x556707d970a6]
2020-05-23T00:02:13.851 INFO:tasks.ceph.osd.6.smithi096.stderr: 4: (PG::read_state(ObjectStore*)+0x1306) [0x556707d7e476]
2020-05-23T00:02:13.851 INFO:tasks.ceph.osd.6.smithi096.stderr: 5: (OSD::load_pgs()+0xa47) [0x556707cc59e7]
2020-05-23T00:02:13.851 INFO:tasks.ceph.osd.6.smithi096.stderr: 6: (OSD::init()+0x270f) [0x556707cf182f]
2020-05-23T00:02:13.851 INFO:tasks.ceph.osd.6.smithi096.stderr: 7: (main()+0x4798) [0x556707c2ef08]
2020-05-23T00:02:13.851 INFO:tasks.ceph.osd.6.smithi096.stderr: 8: (__libc_start_main()+0xf3) [0x7fd29556a873]
2020-05-23T00:02:13.851 INFO:tasks.ceph.osd.6.smithi096.stderr: 9: (_start()+0x2e) [0x556707c6b0ce]
2020-05-23T00:02:13.851 INFO:tasks.ceph.osd.6.smithi096.stderr:*** Caught signal (Aborted) **

Related issues: 2 (0 open, 2 closed)

Related to RADOS - Bug #20874: osd/PGLog.h: 1386: FAILED assert(miter == missing.get_items().end() || (miter->second.need == i->version && miter->second.have == eversion_t())) (Can't reproduce, Josh Durgin, 08/02/2017)

Has duplicate RADOS - Bug #49136: osd/PGLog.h: FAILED ceph_assert(miter == missing.get_items().end() || (miter->second.need == i->version && miter->second.have == eversion_t())) (Duplicate)

#1

Updated by Brad Hubbard almost 4 years ago

  • Description updated
#2

Updated by Sage Weil about 3 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (OSD)
#3

Updated by Neha Ojha about 3 years ago

  • Related to Bug #49136: osd/PGLog.h: FAILED ceph_assert(miter == missing.get_items().end() || (miter->second.need == i->version && miter->second.have == eversion_t())) added
#4

Updated by Neha Ojha about 3 years ago

https://tracker.ceph.com/issues/49136 may have a different root cause, so I'm marking it as related rather than duplicate for now.

#5

Updated by Neha Ojha about 3 years ago

  • Related to Bug #20874: osd/PGLog.h: 1386: FAILED assert(miter == missing.get_items().end() || (miter->second.need == i->version && miter->second.have == eversion_t())) added
#6

Updated by Neha Ojha about 3 years ago

  • Related to deleted (Bug #49136: osd/PGLog.h: FAILED ceph_assert(miter == missing.get_items().end() || (miter->second.need == i->version && miter->second.have == eversion_t())))
#7

Updated by Neha Ojha about 3 years ago

  • Has duplicate Bug #49136: osd/PGLog.h: FAILED ceph_assert(miter == missing.get_items().end() || (miter->second.need == i->version && miter->second.have == eversion_t())) added
#8

Updated by Neha Ojha about 3 years ago

  • Priority changed from Immediate to Urgent
#9

Updated by Neha Ojha about 3 years ago

  • Assignee set to Neha Ojha
  • Priority changed from Urgent to High
#10

Updated by Neha Ojha about 3 years ago

  • Backport set to pacific

/a/yuriw-2021-03-25_20:03:40-rados-wip-yuri8-testing-2021-03-25-1042-pacific-distro-basic-smithi/5999051

#11

Updated by Sridhar Seshasayee almost 3 years ago

Observed the assert on master:
/a/sseshasa-2021-07-14_10:37:09-rados-wip-sseshasa-testing-2021-07-14-1320-distro-basic-smithi/6270005

#12

Updated by Neha Ojha over 2 years ago

/a/yuriw-2021-08-06_16:31:19-rados-wip-yuri-master-8.6.21-distro-basic-smithi/6324561 - no logs

#13

Updated by Neha Ojha over 2 years ago

  • Priority changed from High to Normal
#14

Updated by Laura Flores about 2 years ago

/a/yuriw-2022-02-09_22:52:18-rados-wip-yuri5-testing-2022-02-09-1322-pacific-distro-default-smithi/6672070

#15

Updated by Laura Flores almost 2 years ago

/a/yuriw-2022-06-23_21:29:45-rados-wip-yuri4-testing-2022-06-22-1415-pacific-distro-default-smithi/6895353

#17

Updated by Radoslaw Zarzynski over 1 year ago

Even if we don't want to deep-dive into it right now, we should refactor the assertion:

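if (debug_verify_stored_missing) {
  auto miter = missing.get_items().find(i->soid);
  if (i->is_delete()) {
    // A delete may be absent from the missing set; if present, its need
    // must match this log entry and its have must be the zero eversion_t.
    // This combined check (the one in this bug) doesn't say which clause failed.
    ceph_assert(miter == missing.get_items().end() ||
                (miter->second.need == i->version &&
                 miter->second.have == eversion_t()));
  } else {
    // Non-delete entries must be in the missing set; each condition
    // is asserted separately.
    ceph_assert(miter != missing.get_items().end());
    ceph_assert(miter->second.need == i->version);
    ceph_assert(miter->second.have == eversion_t());
  }
  checked.insert(i->soid);
}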
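The combined check in the is_delete() branch is the one in this bug's title: when it fires, the crash doesn't say whether the entry was present with the wrong need or with a non-zero have. Below is a minimal, self-contained sketch of splitting it, assuming simplified stand-ins for eversion_t, the missing set (a std::map keyed by object name) and log entries. These types and the functions combined_delete_check() and verify_entry() are hypothetical illustrations, not Ceph's real API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for Ceph's eversion_t.
// eversion_t{} is the zero version.
struct eversion_t {
  std::uint32_t epoch;
  std::uint64_t version;
  bool operator==(const eversion_t& o) const {
    return epoch == o.epoch && version == o.version;
  }
};

// Hypothetical stand-in for a missing-set item.
struct missing_item {
  eversion_t need;  // version that must be recovered
  eversion_t have;  // version present locally; zero means none
};

// Hypothetical stand-in for a PG log entry.
struct log_entry {
  std::string soid;
  eversion_t version;
  bool is_delete;
};

using missing_map = std::map<std::string, missing_item>;

// The single expression from the is_delete() branch: true means the
// assert passes, but false does not say which clause failed.
bool combined_delete_check(const log_entry& e, const missing_map& missing) {
  auto miter = missing.find(e.soid);
  return miter == missing.end() ||
         (miter->second.need == e.version &&
          miter->second.have == eversion_t{});
}

enum class check_result { ok, not_in_missing, wrong_need, nonzero_have };

// Split version: each condition is tested on its own, so a failure names
// the clause that broke. Deletes may be absent; other entries may not.
check_result verify_entry(const log_entry& e, const missing_map& missing) {
  auto miter = missing.find(e.soid);
  if (miter == missing.end()) {
    return e.is_delete ? check_result::ok : check_result::not_in_missing;
  }
  if (!(miter->second.need == e.version)) {
    return check_result::wrong_need;
  }
  if (!(miter->second.have == eversion_t{})) {
    return check_result::nonzero_have;
  }
  return check_result::ok;
}
```

Returning a reason code only keeps the sketch testable. In the OSD itself the equivalent change would be, for deletes found in the missing set, separate ceph_assert() calls on need and have, matching what the non-delete branch already does.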
#18

Updated by Radoslaw Zarzynski over 1 year ago

Ronen: @debug_verify_stored_missing is an input parameter with a default value. Unfortunately, it doesn't look like a dead one:

void PG::read_state(ObjectStore *store)
{
  // ...
  recovery_state.init_from_disk_state(
    std::move(info_from_disk),
    std::move(past_intervals_from_disk),
    [this, store] (PGLog &pglog) {
      ostringstream oss;
      pglog.read_log_and_missing(
        store,
        ch,
        pgmeta_oid,
        info,
        oss,
        cct->_conf->osd_ignore_stale_divergent_priors,
        cct->_conf->osd_debug_verify_missing_on_start);  // debug_verify_stored_missing
      // ...
    });
  // ...
}
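Why the default value doesn't make the check dead can be shown with a reduced sketch: a defaulted parameter only stays at its default if no caller overrides it, and PG::read_state() passes osd_debug_verify_missing_on_start explicitly. The osd_config struct and both functions below are hypothetical stand-ins, not Ceph code; only the option names come from the snippet above.

```cpp
#include <cassert>

// Hypothetical stand-in for the OSD's configuration; only the option
// names mirror the real ones.
struct osd_config {
  bool osd_ignore_stale_divergent_priors = false;
  bool osd_debug_verify_missing_on_start = false;
};

// Hypothetical reduced signature. As in read_log_and_missing(), the last
// parameter defaults to false. Returns whether the stored-missing
// verification (and its asserts) would run.
bool read_log_and_missing_sketch(bool ignore_stale_divergent_priors,
                                 bool debug_verify_stored_missing = false) {
  (void)ignore_stale_divergent_priors;  // not modelled in this sketch
  return debug_verify_stored_missing;
}

// Mirrors the call in PG::read_state(): both options are passed explicitly,
// so the default of debug_verify_stored_missing never applies on this path.
bool load_pg_sketch(const osd_config& conf) {
  return read_log_and_missing_sketch(conf.osd_ignore_stale_divergent_priors,
                                     conf.osd_debug_verify_missing_on_start);
}
```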
#19

Updated by Radoslaw Zarzynski over 1 year ago

  • Assignee changed from Neha Ojha to Nitzan Mordechai

We've been poking at read_log_and_missing() pretty recently (the dups issue). Does it ring a bell?

#20

Updated by Nitzan Mordechai over 1 year ago

  • Pull request ID set to 48088
#21

Updated by Nitzan Mordechai over 1 year ago

  • Status changed from New to Fix Under Review
#22

Updated by Laura Flores 4 months ago

/a/lflores-2024-01-10_23:43:40-rados-wip-yuri11-testing-2024-01-10-1124-pacific-distro-default-smithi/7512901

#23

Updated by Laura Flores 4 months ago

  • Tags set to test-failure