Bug #18533

closed

two instances of omap_digest mismatch

Added by Dan Mick over 7 years ago. Updated almost 7 years ago.

Status:
Resolved
Priority:
High
Assignee:
David Zafman
Category:
OSD
Target version:
-
% Done:
0%
Source:
Development
Tags:
Backport:
jewel, kraken
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Twice, I've noticed this sort of error pop up on the long-running cluster: one of the metadata objects reports that one replica has a different omap_digest:

list-inconsistent-obj 1.3c metadata
{
  "epoch": 771290,
  "inconsistents": [
    {
      "object": {
        "name": "607.00000000",
        "nspace": "",
        "locator": "",
        "snap": "head",
        "version": 8962591
      },
      "errors": [
        "omap_digest_mismatch" 
      ],
      "union_shard_errors": [],
      "selected_object_info": "1:3ed09add:::607.00000000:head(769238'8962591 mds.0.95185:3872723 dirty|omap|data_digest s 0 uv 8962591 dd ffffffff alloc_hint [0 0 0])",
      "shards": [
        {
          "osd": 31,
          "errors": [],
          "size": 0,
          "omap_digest": "0xa99faf1c",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 48,
          "errors": [],
          "size": 0,
          "omap_digest": "0x0b59d114",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 61,
          "errors": [],
          "size": 0,
          "omap_digest": "0xa99faf1c",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 69,
          "errors": [],
          "size": 0,
          "omap_digest": "0xa99faf1c",
          "data_digest": "0xffffffff" 
        }
      ]
    }
  ]
}
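
For reference, a minimal stand-alone sketch (hypothetical, not part of any Ceph tooling) of how the odd shard can be picked out of a report like the one above: count the per-shard omap_digest values and flag the minority. Here osd.48's 0x0b59d114 is the outlier.

// Hypothetical helper, not part of any Ceph tooling: flag the shard(s) whose
// omap_digest is in the minority for this object.
#include <cstdint>
#include <iostream>
#include <map>
#include <utility>
#include <vector>

int main() {
    // osd id -> omap_digest, copied from the pg 1.3c report above.
    std::vector<std::pair<int, uint32_t>> shards = {
        {31, 0xa99faf1c}, {48, 0x0b59d114}, {61, 0xa99faf1c}, {69, 0xa99faf1c},
    };

    std::map<uint32_t, int> counts;
    for (const auto& s : shards)
        ++counts[s.second];

    for (const auto& s : shards)
        if (2 * counts[s.second] < static_cast<int>(shards.size()))
            std::cout << "osd." << s.first << " disagrees (omap_digest 0x"
                      << std::hex << s.second << ")\n";
    return 0;
}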

Edit: removed note about xattr; not the point here, I was confused


Related issues 2 (0 open, 2 closed)

Copied to Ceph - Backport #19391: kraken: two instances of omap_digest mismatch (Resolved, David Zafman)
Copied to Ceph - Backport #19404: jewel: core: two instances of omap_digest mismatch (Resolved, David Zafman)
Actions #1

Updated by Dan Mick over 7 years ago

  • Description updated (diff)
Actions #2

Updated by Brad Hubbard over 7 years ago

Dan, could this be a duplicate of http://tracker.ceph.com/issues/17177? What does the deep scrub output look like?

Actions #3

Updated by Dan Mick over 7 years ago

It is claimed that the version we are running (46f4285) should have the fix (73a1b45) for 17177.

Actions #4

Updated by Dan Mick over 7 years ago

There's another instance, pg 1.15, object 604.00000000:

rados -c /home/dmick/lrc/ceph.conf list-inconsistent-obj 1.15
{
  "epoch": 769827,
  "inconsistents": [
    {
      "object": {
        "name": "604.00000000",
        "nspace": "",
        "locator": "",
        "snap": "head",
        "version": 9207591
      },
      "errors": [
        "omap_digest_mismatch" 
      ],
      "union_shard_errors": [],
      "selected_object_info": "1:a93a17c2:::604.00000000:head(772361'9207591 mds.0.95185:12666245 dirty|omap|data_digest s 0 uv 9207591 dd ffffffff alloc_hint [0 0 0])",
      "shards": [
        {
          "osd": 33,
          "errors": [],
          "size": 0,
          "omap_digest": "0x93abd7d2",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 62,
          "errors": [],
          "size": 0,
          "omap_digest": "0x93abd7d2",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 77,
          "errors": [],
          "size": 0,
          "omap_digest": "0x53fcb579",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 120,
          "errors": [],
          "size": 0,
          "omap_digest": "0x53fcb579",
          "data_digest": "0xffffffff" 
        }
      ]
    }
  ]
}

Actions #5

Updated by David Zafman over 7 years ago

We should look and see if a case of 17177 was missed in that fix.

The checksum is based on this set of keys:

$ sudo rados -p metadata listomapkeys 607.00000000
100130cf96b_head
100130cf971_head
100130cf9e0_head
100130cf9ec_head
100130cfcd3_head
100130cfcdd_head
100130cff88_head
100130cffab_head
100130cffb6_head
100130d0ac3_head
10014d73f2e_head
10014d744fb_head
1001ee36f99_head
1001ee36f9a_head
1001ee36f9b_head
1001f5289aa_head
1001f530b2f_head
1001f530b32_head
1001f530d8a_head
1001f530d9b_head
1001f5401da_head
1001f54ecbd_head
1001f5532c8_head
1001f5532c9_head
1001f553353_head
1001f5533aa_head
1001f5533cd_head
1001f5612d3_head
1001f5612d9_head
1001fb0d36f_head
1001fb0d3e1_head
1001fb0d3e3_head
10020346a54_head
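
A simplified, self-contained sketch of why this key set determines the digest: the omap_digest behaves like a CRC accumulated over the omap header and each key/value pair in iteration order, so a replica with missing or extra keys ends up with a different digest. This assumes CRC32C with an all-ones seed; the OSD's exact buffer encoding differs, so the value printed here is illustrative only, not the real digest.

// Simplified stand-alone sketch (not the OSD implementation) of the idea
// behind omap_digest: a CRC accumulated over the omap header and every
// key/value pair in iteration order, so any divergence in the key set
// changes the digest.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Minimal CRC32C (Castagnoli), bitwise implementation for clarity.
uint32_t crc32c(uint32_t crc, const std::string& data) {
    for (unsigned char c : data) {
        crc ^= c;
        for (int i = 0; i < 8; ++i)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1)));
    }
    return crc;
}

int main() {
    // Stand-in for the object's omap: key -> encoded dentry value.
    std::map<std::string, std::string> omap = {
        {"100130cf96b_head", "<dentry v1>"},
        {"100130cf971_head", "<dentry v2>"},
    };
    std::string header = "<omap header>";

    uint32_t digest = ~0u;             // all-ones seed (assumption)
    digest = crc32c(digest, header);   // header first
    for (const auto& kv : omap) {      // then each key/value pair in order
        digest = crc32c(digest, kv.first);
        digest = crc32c(digest, kv.second);
    }
    std::cout << "omap_digest-like value: 0x" << std::hex << digest << "\n";
    return 0;
}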

Actions #6

Updated by Dan Mick over 7 years ago

pg 1.15, object 604.00000000 has 9251 omap entries (!)

Actions #7

Updated by Dan Mick over 7 years ago

The 1.3c object has disappeared.

Actions #8

Updated by Dan Mick over 7 years ago

The last scrub data disappeared. The state today on 1.15 604.00000000 is:

{
  "epoch": 772952,
  "inconsistents": [
    {
      "object": {
        "name": "604.00000000",
        "nspace": "",
        "locator": "",
        "snap": "head",
        "version": 9313347
      },
      "errors": [
        "omap_digest_mismatch" 
      ],
      "union_shard_errors": [],
      "selected_object_info": "1:a93a17c2:::604.00000000:head(772953'9313347 mds.0.95440:17792072 dirty|omap|data_digest s 0 uv 9313347 dd ffffffff alloc_hint [0 0 0])",
      "shards": [
        {
          "osd": 33,
          "errors": [],
          "size": 0,
          "omap_digest": "0x627e933d",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 62,
          "errors": [],
          "size": 0,
          "omap_digest": "0x627e933d",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 77,
          "errors": [],
          "size": 0,
          "omap_digest": "0x627e933d",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 120,
          "errors": [],
          "size": 0,
          "omap_digest": "0xd9223271",
          "data_digest": "0xffffffff" 
        }
      ]
    }
  ]
}

Primary reports (through rados listomapkeys) 7379 keys

Replica (osd.77) reports 11 keys, which are a strict subset of the 7379:

100130cda86_head
100130d4598_head
1001f05f4fe_head
1001f05f763_head
1001f05f764_head
1001f05f765_head
1001fb02621_head
1001fb02622_head
1001fb02623_head
1001fb069b7_head
1002065dcfd_head

Actions #9

Updated by Dan Mick over 7 years ago

  • Assignee set to David Zafman
Actions #10

Updated by Dan Mick over 7 years ago

I think c66e466d4ed76cd7a063b9b982ba455150ef1f14 was brought up as a possibly-related issue.

Actions #11

Updated by Dan Mick over 7 years ago

Possibly of interest, possibly garbage:

The omap header for 604.00000000, obtained via rados getomaphdr and dumped with ceph-dencoder, has corruption in it a lot like the corrupted rstats in http://tracker.ceph.com/issues/18532:

{
    "version": 294525375,
    "snap_purged_thru": 0,
    "fragstat": {
        "version": 120,
        "mtime": "2017-01-23 09:59:12.999262",
        "num_files": 18446744073709542630,
        "num_subdirs": 18446744073709551378
    },
    "accounted_fragstat": {
        "version": 120,
        "mtime": "2017-01-23 09:59:12.999262",
        "num_files": 18446744073709543400,
        "num_subdirs": 18446744073709551381
    },
    "rstat": {
        "version": 7399,
        "rbytes": 18446744067936750485,
        "rfiles": 18446744073709542630,
        "rsubdirs": 18446744073709551378,
        "rsnaprealms": 0,
        "rctime": "2017-01-23 09:59:12.999262" 
    },
    "accounted_rstat": {
        "version": 7399,
        "rbytes": 18446744067983294228,
        "rfiles": 18446744073709542853,
        "rsubdirs": 18446744073709551381,
        "rsnaprealms": 0,
        "rctime": "2017-01-23 09:59:12.999262" 
    }
}

603.00000000's omaphdr, using the same methodology, looks saner:

{
    "version": 294624821,
    "snap_purged_thru": 0,
    "fragstat": {
        "version": 119,
        "mtime": "2017-01-23 13:43:49.738195",
        "num_files": 5,
        "num_subdirs": 17
    },
    "accounted_fragstat": {
        "version": 119,
        "mtime": "2017-01-23 13:43:49.738195",
        "num_files": 6,
        "num_subdirs": 17
    },
    "rstat": {
        "version": 7267,
        "rbytes": 13210238,
        "rfiles": 5,
        "rsubdirs": 17,
        "rsnaprealms": 0,
        "rctime": "2017-01-23 13:43:49.738195" 
    },
    "accounted_rstat": {
        "version": 7267,
        "rbytes": 13222526,
        "rfiles": 6,
        "rsubdirs": 17,
        "rsnaprealms": 0,
        "rctime": "2017-01-23 13:43:49.738195" 
    }
}

Actions #12

Updated by Brad Hubbard over 7 years ago

Given std::numeric_limits<int64_t>::max() = 9223372036854775807, 18446744067936750485 seems too large a value to me?

Actions #13

Updated by Brad Hubbard over 7 years ago

So these are signed values dumped out as unsigned values.

void nest_info_t::dump(Formatter *f) const {
  f->dump_unsigned("version", version);
  f->dump_unsigned("rbytes", rbytes);
  f->dump_unsigned("rfiles", rfiles);
  f->dump_unsigned("rsubdirs", rsubdirs);
  f->dump_unsigned("rsnaprealms", rsnaprealms);
  f->dump_stream("rctime") << rctime;
}

void JSONFormatter::dump_unsigned(const char *name, uint64_t u) {
  print_name(name);
  m_ss << u;
}

badone | geordi: { uint64_t sixfour = -238; cout << sixfour << endl;}
geordi | 18446744073709551378

So these look like negative values, some considerably negative as well.

$ echo 18446744067983294228-18446744073709551616|bc -iq
18446744067983294228-18446744073709551616
-5726257388
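
Equivalently, a small C++ check (hypothetical, not from the tracker) that reinterprets the dumped unsigned values as int64_t and recovers the negative numbers, matching the bc result above:

// Reinterpreting the dumped unsigned values as int64_t recovers the
// (negative) numbers the MDS actually stored.
#include <cstdint>
#include <iostream>

int main() {
    uint64_t dumped[] = {
        18446744073709551378ULL,  // rsubdirs/num_subdirs -> -238
        18446744073709542630ULL,  // rfiles/num_files     -> -8986
        18446744067983294228ULL,  // accounted rbytes     -> -5726257388
    };
    for (uint64_t u : dumped)
        std::cout << u << " -> " << static_cast<int64_t>(u) << "\n";
    return 0;
}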

Actions #14

Updated by Dan Mick about 7 years ago

Yes, they're negative, but the point is the bad rstats are also hugely negative. (and given that they're counts, they ought to be unsigned anyway)

Actions #15

Updated by Dan Mick about 7 years ago

  • Description updated (diff)
Actions #16

Updated by Dan Mick about 7 years ago

OK, I don't know how to repair this damage. Gonna need advice.

Actions #17

Updated by Dan Mick about 7 years ago

  • Priority changed from Normal to High
Actions #18

Updated by Dan Mick about 7 years ago

  • Removed the object from the primary with ceph-objectstore-tool
  • got all the omap keys/vals and the omap header with c-o-t from a replica
  • got all the xattrs on the filestore file by hand with getfattr
  • Touched a file on the primary in the current/ dir to create the object (no such op in c-o-t? or is import that operation?)
  • restored xattrs on the file on the primary with setfattr, which allowed c-o-t to recognize it again
  • restored omap header and key/vals with c-o-t on the primary
  • re-deep-scrubbed the pg, error is gone!

I look forward to the tool for automating this, which I hear is coming shortly (https://github.com/ceph/ceph/pull/9203).

Actions #19

Updated by Dan Mick about 7 years ago

Another instance today:

{
  "epoch": 772925,
  "inconsistents": [
    {
      "object": {
        "name": "100011cf577.00000000",
        "nspace": "",
        "locator": "",
        "snap": "head",
        "version": 9314695
      },
      "errors": [
        "omap_digest_mismatch" 
      ],
      "union_shard_errors": [],
      "selected_object_info": "1:a7f0f16e:::100011cf577.00000000:head(773004'9314695 mds.0.95616:2509263 dirty|omap|data_digest s 0 uv 9314695 dd ffffffff alloc_hint [0 0 0])",
      "shards": [
        {
          "osd": 7,
          "errors": [],
          "size": 0,
          "omap_digest": "0x618ae52f",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 47,
          "errors": [],
          "size": 0,
          "omap_digest": "0x618ae52f",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 60,
          "errors": [],
          "size": 0,
          "omap_digest": "0x618ae52f",
          "data_digest": "0xffffffff" 
        },
        {
          "osd": 72,
          "errors": [],
          "size": 0,
          "omap_digest": "0x6ba4c015",
          "data_digest": "0xffffffff" 
        }
      ]
    }
  ]
}

That object is the /teuthology-archive directory, which had 16274 omap keys, and the omap header said it had 16268 dirs + 6 files, which matched. All three replicas were out of sync with the primary. It's not known which version of the omap data is correct yet.

Simultaneously, mds damage was noted:


[{"damage_type":"dir_frag","id":650990821,"ino":1099619790351,"frag":"*"}]

1099619790351 is 0x10006726e0f, and that inode does not appear to be listed in the omap keys for the object. If it is present on the replicas, that might suggest that the primary was updated for a deletion but the replicas were not.
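
A quick check of that conversion (illustrative only):

#include <cstdint>
#include <iostream>

int main() {
    uint64_t ino = 1099619790351ULL;       // damaged inode from the mds report
    std::cout << std::hex << ino << "\n";  // prints 10006726e0f
    return 0;
}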

Actions #20

Updated by David Zafman about 7 years ago

Using the keys from osd.7 and comparing with the primary of the running cluster, these 22 omap keys are missing from the replicas that have matching omap_digests.

teuthology-2014-12-10_17:15:01-upgrade:dumpling-firefly-x:stress-split-next-distro-basic-multi_head
teuthology-2014-12-10_17:15:01-upgrade:giant-giant-distro-basic-vps_head
teuthology-2014-12-10_17:18:01-upgrade:firefly-x-next-distro-basic-vps_head
teuthology-2014-12-10_17:25:02-upgrade:dumpling-firefly-x:stress-split-next-distro-basic-vps_head
teuthology-2014-12-10_18:10:03-upgrade:dumpling-firefly-x:parallel-giant-distro-basic-multi_head
teuthology-2014-12-10_18:13:01-upgrade:firefly-x-giant-distro-basic-multi_head
teuthology-2014-12-10_18:15:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi_head
teuthology-2014-12-10_19:13:03-upgrade:dumpling-x-firefly-distro-basic-vps_head
teuthology-2014-12-10_23:00:03-rbd-master-testing-basic-multi_head
teuthology-2014-12-10_23:02:01-rgw-master-testing-basic-multi_head
teuthology-2014-12-10_23:04:05-fs-master-testing-basic-multi_head
teuthology-2014-12-10_23:06:01-krbd-master-testing-basic-multi_head
teuthology-2014-12-10_23:08:01-kcephfs-master-testing-basic-multi_head
teuthology-2014-12-10_23:10:02-knfs-master-testing-basic-multi_head
teuthology-2014-12-10_23:12:01-hadoop-master-testing-basic-multi_head
teuthology-2014-12-10_23:14:01-samba-master-testing-basic-multi_head
teuthology-2014-12-10_23:16:01-rest-master-testing-basic-multi_head
teuthology-2014-12-10_23:18:01-multimds-master-testing-basic-multi_head
teuthology-2014-12-10_23:20:02-multi-version-giant-distro-basic-multi_head
teuthology-2014-12-10_23:20:02-multi-version-master-distro-basic-multi_head
teuthology-2014-12-11_01:10:03-ceph-deploy-firefly-distro-basic-multi_head
teuthology-2014-12-11_02:35:03-smoke-master-distro-basic-multi_head
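
A generic sketch of that comparison (hypothetical tooling, not from the tracker): dump the key listing from each shard and take the set difference. Only a few of the keys above are included for brevity.

// Hypothetical illustration of the comparison: keys present on one shard but
// missing from another. Real listings would come from rados listomapkeys or
// ceph-objectstore-tool.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <set>
#include <string>

int main() {
    std::set<std::string> with_keys = {
        "teuthology-2014-12-10_23:00:03-rbd-master-testing-basic-multi_head",
        "teuthology-2014-12-10_23:02:01-rgw-master-testing-basic-multi_head",
        "teuthology-2014-12-11_02:35:03-smoke-master-distro-basic-multi_head",
    };
    std::set<std::string> without_keys = {
        "teuthology-2014-12-10_23:00:03-rbd-master-testing-basic-multi_head",
    };

    std::set<std::string> missing;
    std::set_difference(with_keys.begin(), with_keys.end(),
                        without_keys.begin(), without_keys.end(),
                        std::inserter(missing, missing.begin()));
    for (const auto& k : missing)
        std::cout << "missing: " << k << "\n";
    return 0;
}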

Actions #21

Updated by Samuel Just about 7 years ago

Whatever happened, happened in the last few days.

samuelj@mira049:~$ ( for i in {7..1}; do sudo zcat /var/log/ceph/ceph.log.$i.gz; done; sudo cat /var/log/ceph/ceph.log ) | grep ' 1.25 deep-scrub '
2017-01-28 06:19:18.117347 osd.72 172.21.4.140:6812/1162 1582 : cluster [INF] 1.25 deep-scrub starts
2017-01-28 06:25:03.841868 osd.72 172.21.4.140:6812/1162 1583 : cluster [INF] 1.25 deep-scrub ok
2017-02-01 10:17:43.639140 osd.72 172.21.4.140:6812/1162 2466 : cluster [INF] 1.25 deep-scrub starts
2017-02-01 10:22:56.239724 osd.72 172.21.4.140:6812/1162 2467 : cluster [ERR] 1.25 deep-scrub 1 errors

Actions #22

Updated by Samuel Just about 7 years ago

samuelj@mira049:~$ ( for i in {7..1}; do sudo zcat /var/log/ceph/ceph.log.$i.gz; done; sudo cat /var/log/ceph/ceph.log ) | grep ' osd\.72 \| osd\.7 \| osd\.60 \| osd\.47 ' | grep -v scrub | grep -v 'slow request'
2017-02-01 22:35:22.762750 mon.0 172.21.4.136:6789/0 1840408 : cluster [INF] osd.7 marked itself down
2017-02-01 22:36:18.409617 mon.0 172.21.4.136:6789/0 1840471 : cluster [INF] osd.7 172.21.5.114:6820/29826 boot
2017-02-01 22:37:28.508304 mon.0 172.21.4.136:6789/0 1840550 : cluster [INF] osd.7 marked itself down
2017-02-01 22:57:08.729956 mon.0 172.21.4.136:6789/0 1841715 : cluster [INF] osd.7 172.21.5.114:6820/30583 boot

Except for today (presumably for the C-O-T checks), none of those osds went down.

Actions #23

Updated by Samuel Just about 7 years ago

I suggest grabbing a copy of the leveldb instances from primary and a replica and examining the actual keys in the store, perhaps that will yield some kind of smoking gun.

Actions #24

Updated by Samuel Just about 7 years ago

ubuntu@mira049:~$ ( for i in {7..1}; do sudo zcat /var/log/ceph/ceph.log.$i.gz; done; sudo cat /var/log/ceph/ceph.log ) | grep ' 1\.25 '
2017-01-26 08:33:59.919704 osd.72 172.21.4.140:6812/1162 1311 : cluster [INF] 1.25 scrub starts
2017-01-26 08:37:09.218916 osd.72 172.21.4.140:6812/1162 1312 : cluster [INF] 1.25 scrub ok
2017-01-28 06:19:18.117347 osd.72 172.21.4.140:6812/1162 1582 : cluster [INF] 1.25 deep-scrub starts
2017-01-28 06:25:03.841868 osd.72 172.21.4.140:6812/1162 1583 : cluster [INF] 1.25 deep-scrub ok
2017-01-29 10:54:29.025508 osd.72 172.21.4.140:6812/1162 1850 : cluster [INF] 1.25 scrub starts
2017-01-29 10:58:47.110057 osd.72 172.21.4.140:6812/1162 1851 : cluster [INF] 1.25 scrub ok
2017-01-31 00:13:49.329845 osd.72 172.21.4.140:6812/1162 2139 : cluster [INF] 1.25 scrub starts
2017-01-31 00:17:16.258921 osd.72 172.21.4.140:6812/1162 2140 : cluster [INF] 1.25 scrub ok
2017-02-01 10:17:43.639140 osd.72 172.21.4.140:6812/1162 2466 : cluster [INF] 1.25 deep-scrub starts
2017-02-01 10:22:56.239724 osd.72 172.21.4.140:6812/1162 2467 : cluster [ERR] 1.25 deep-scrub 1 errors

1.25 doesn't seem to have backfilled either.

Actions #25

Updated by Samuel Just about 7 years ago

I have copied the omap dirs for osds 72 (mira019:~samuelj/omap-osd-72), 7 (mira049:~samuelj/omap-osd-7), and 60 (mira120:~samuelj/omap-osd-60).

Actions #26

Updated by Josh Durgin about 7 years ago

Here's the output from a deep-scrub on 2/7:

{
    "epoch": 780872,
    "inconsistents": [
        {
            "object": {
                "name": "100011cf577.00000000",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "version": 9408654
            },
            "errors": [
                "omap_digest_mismatch" 
            ],
            "union_shard_errors": [],
            "selected_object_info": "1:a7f0f16e:::100011cf577.00000000:head(780877'9408654 mds.0.96709:7255604 dirty|omap|data_digest s 0 uv 9408654 dd ffffffff alloc_hint [0 0 0])",
            "shards": [
                {
                    "osd": 7,
                    "errors": [],
                    "size": 0,
                    "omap_digest": "0xe1c0a6ac",
                    "data_digest": "0xffffffff" 
                },
                {
                    "osd": 47,
                    "errors": [],
                    "size": 0,
                    "omap_digest": "0xf463e691",
                    "data_digest": "0xffffffff" 
                },
                {
                    "osd": 60,
                    "errors": [],
                    "size": 0,
                    "omap_digest": "0xf463e691",
                    "data_digest": "0xffffffff" 
                },
                {
                    "osd": 72,
                    "errors": [],
                    "size": 0,
                    "omap_digest": "0xe1c0a6ac",
                    "data_digest": "0xffffffff" 
                }
            ]
        }
    ]
}

Inspecting the leveldbs further, we've found an invariant violation: the complete_region on the nodes with extra entries has overlapping ranges (the ranges are stored as [start, end) key/value pairs).
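
A toy model of that invariant (simplified; not the DBObjectMap code): with the complete regions held as a map from start key to end key, sorted by start, each region must begin at or after the previous region's end. The overlapping pair discussed below, [a, e) with [c, f), fails the check.

// Toy model (not DBObjectMap code) of the invariant described above: the
// "complete" regions are half-open [start, end) key ranges and must not
// overlap.
#include <iostream>
#include <map>
#include <string>

// start -> end, one key/value pair per complete region
bool complete_regions_ok(const std::map<std::string, std::string>& complete) {
    std::string prev_end;
    for (const auto& r : complete) {
        if (!prev_end.empty() && r.first < prev_end)
            return false;          // this region starts inside the previous one
        if (r.second <= r.first)
            return false;          // empty or inverted range
        prev_end = r.second;
    }
    return true;
}

int main() {
    std::map<std::string, std::string> bad  = {{"a", "e"}, {"c", "f"}};
    std::map<std::string, std::string> good = {{"a", "c"}, {"d", "f"}};
    std::cout << "bad:  " << (complete_regions_ok(bad)  ? "ok" : "OVERLAP") << "\n";
    std::cout << "good: " << (complete_regions_ok(good) ? "ok" : "OVERLAP") << "\n";
    return 0;
}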

Actions #27

Updated by David Zafman about 7 years ago

Corrupt complete mapping found on pg 1.25 primary osd.72 for oid 100011cf577.00000000:

http://pastebin.com/19W78B6U

Actions #28

Updated by Samuel Just about 7 years ago

<davidzlap> sjust: 100011cf577.00000000
<davidzlap> sjust: I meant http://pastebin.com/19W78B6U
<sjust> davidzlap: can you characterize the overlaps?
<sjust> how many are partial overlaps -- like [1, 5) with [4, 10) -- vs contains -- [1, 10) with [5, 7) -- ?
<sjust> do our problem keys fall exclusively into one of these complete regions?
<sjust> or different ones
<sjust> if so, what do they have in common?
<sjust> joshd davidzlap: good news, I just pushed a unit test which causes an incorrect result with a point query
<sjust> annoyingly, doesn't work for an iterator
<sjust> that is, the iterator returns the right value in this case
<sjust> full contains trip up a point query, but not an iterator; trying to find a case which trips up the iterator logic
<davidzlap> sjust: I don't think it matters if the complete mapping is corrupt
<sjust> davidzlap: just found one case which does let us turn a corrupt complete mapping into an incorrect point query result
<sjust> still trying to find one which would trip up an iterator scan
<sjust> davidz: and I just pushed a comment to my wip-18533 explaining how a partial overlap will trip up an iterator scan
<sjust> now we just need a way to engineer a partial overlap
<sjust> davidz: my mechanism requires that if there is a pair of complete regions like [a, e) and [c, f), e could be erroneously returned
<sjust> davidz: so do our phantom keys show up as the end of any complete regions?
<sjust> particularly as the end of a complete region with a subsequent overlapping one?
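
A greatly simplified toy (not the DBObjectMap implementation) of the failure mode Sam describes in the log above: when the scan skips a key covered by a complete region, it jumps to the region's end and trusts that position, relying on the regions never overlapping. With the overlapping pair [a, e) and [c, f), the deleted key "e" resurfaces.

#include <iostream>
#include <map>
#include <set>
#include <string>

int main() {
    // Parent key set: "a", "b" and "e" were logically deleted, but the
    // deletion was recorded as two overlapping complete regions.
    std::set<std::string> parent = {"a", "b", "e", "g"};
    std::map<std::string, std::string> complete = {{"a", "e"}, {"c", "f"}};

    auto it = parent.begin();
    while (it != parent.end()) {
        // Find the complete region with the greatest start <= current key.
        auto r = complete.upper_bound(*it);
        bool covered = false;
        std::string region_end;
        if (r != complete.begin()) {
            --r;
            if (*it < r->second) {
                covered = true;
                region_end = r->second;
            }
        }
        if (covered) {
            // Skip to the end of the region and trust that position blindly:
            // with non-overlapping regions it would be safe, but here "e" is
            // still inside [c, f) and was deleted.
            it = parent.lower_bound(region_end);
            if (it == parent.end())
                break;
        }
        std::cout << "iterator yields: " << *it << "\n";
        ++it;
    }
    // Prints "e" (a deleted key) and "g"; a correct scan would yield only "g".
    return 0;
}
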
Actions #30

Updated by Samuel Just about 7 years ago

wip-18533 above now has a unit test which causes the iterator to return a deleted value.

Actions #31

Updated by Samuel Just about 7 years ago

I'm pretty comfortable pinning the cluster trouble on that one, assuming the extra keys and the overlapping complete values we have match.

David, can you confirm that for each of the extra keys, there is a pair of complete entries [a, <key>) and [c, f) where <key> is the erroneously present key and c < <key> and f > <key> ?

Actions #32

Updated by Samuel Just about 7 years ago

David: Can you add the list of keys which are present on that node but shouldn't be?

Actions #33

Updated by Samuel Just about 7 years ago

If the entries David added a few days ago are the right ones, then the above bug doesn't explain what's happening in the cluster.

Actions #34

Updated by Samuel Just about 7 years ago

Nevermind, the bug can produce a more general set of errors than I had realized. See the more recent updates to the TestIterateBug18533 unit test in my branch. Still need to come up with a modification for the fuzzer that would let it produce this kind of error.

Actions #35

Updated by David Zafman about 7 years ago

This is the result of one of Sam's now-failing tests, run with my complete-mapping checking code, which also outputs the complete mapping when the error is found.

Bad complete for #-1:a8ba2560:::foo2:head#
Complete mapping:
0000000013 -> 0000000098
0000000015 -> 0000000056
ceph_test_object_map: /home/dzafman/ceph/src/test/ObjectMap/test_object_map.cc:630: virtual void ObjectMapTest::TearDown(): Assertion `db->check(std::cerr) == 0' failed.

-------
The answer to the question of whether any of the extra keys are the end of a complete mapping entry is NO. None of the keys below appear in the complete mapping (http://pastebin.com/19W78B6U).

teuthology-2014-12-10_17:15:01-upgrade:dumpling-firefly-x:stress-split-next-distro-basic-multi_head
teuthology-2014-12-10_17:15:01-upgrade:giant-giant-distro-basic-vps_head
teuthology-2014-12-10_17:18:01-upgrade:firefly-x-next-distro-basic-vps_head
teuthology-2014-12-10_17:25:02-upgrade:dumpling-firefly-x:stress-split-next-distro-basic-vps_head
teuthology-2014-12-10_18:10:03-upgrade:dumpling-firefly-x:parallel-giant-distro-basic-multi_head
teuthology-2014-12-10_18:13:01-upgrade:firefly-x-giant-distro-basic-multi_head
teuthology-2014-12-10_18:15:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi_head
teuthology-2014-12-10_19:13:03-upgrade:dumpling-x-firefly-distro-basic-vps_head
teuthology-2014-12-10_23:00:03-rbd-master-testing-basic-multi_head
teuthology-2014-12-10_23:02:01-rgw-master-testing-basic-multi_head
teuthology-2014-12-10_23:04:05-fs-master-testing-basic-multi_head
teuthology-2014-12-10_23:06:01-krbd-master-testing-basic-multi_head
teuthology-2014-12-10_23:08:01-kcephfs-master-testing-basic-multi_head
teuthology-2014-12-10_23:10:02-knfs-master-testing-basic-multi_head
teuthology-2014-12-10_23:12:01-hadoop-master-testing-basic-multi_head
teuthology-2014-12-10_23:14:01-samba-master-testing-basic-multi_head
teuthology-2014-12-10_23:16:01-rest-master-testing-basic-multi_head
teuthology-2014-12-10_23:18:01-multimds-master-testing-basic-multi_head
teuthology-2014-12-10_23:20:02-multi-version-giant-distro-basic-multi_head
teuthology-2014-12-10_23:20:02-multi-version-master-distro-basic-multi_head
teuthology-2014-12-11_01:10:03-ceph-deploy-firefly-distro-basic-multi_head
teuthology-2014-12-11_02:35:03-smoke-master-distro-basic-multi_head

Actions #36

Updated by Samuel Just about 7 years ago

wip-18533 is now cleaned up and has two specific unit tests and a fuzzer which reproduce invalid iterator results.

Actions #37

Updated by David Zafman about 7 years ago

  • Status changed from New to 17
Actions #38

Updated by David Zafman about 7 years ago

rm_keys() is now simplified by copying to clone() and no longer using the complete mapping.

Master pull request:

https://github.com/ceph/ceph/pull/13423

Kraken pull request (being used for testing on the Large Rados Cluster):

https://github.com/ceph/ceph/pull/14024
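
For intuition, a conceptual sketch of the direction described above (not the actual code in either pull request): instead of recording removed ranges in a complete mapping layered over a shared parent, the surviving keys are materialized in a copy, so iteration never has to reason about complete regions at all.

// Conceptual sketch only; rm_keys_by_copy is a hypothetical stand-in for the
// idea of "copy to clone instead of tracking complete regions".
#include <iostream>
#include <map>
#include <set>
#include <string>

using OmapKV = std::map<std::string, std::string>;

// Remove `keys` by building a self-contained copy of what survives.
OmapKV rm_keys_by_copy(const OmapKV& parent, const std::set<std::string>& keys) {
    OmapKV result;
    for (const auto& kv : parent)
        if (!keys.count(kv.first))
            result.insert(kv);     // survivors are copied; no range bookkeeping
    return result;
}

int main() {
    OmapKV parent = {{"a", "1"}, {"b", "2"}, {"e", "5"}, {"g", "7"}};
    OmapKV after = rm_keys_by_copy(parent, {"a", "b", "e"});
    for (const auto& kv : after)
        std::cout << kv.first << " -> " << kv.second << "\n";  // only "g"
    return 0;
}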

Actions #39

Updated by David Zafman about 7 years ago

  • Status changed from 17 to Pending Backport
  • Backport set to kraken
Actions #40

Updated by David Zafman about 7 years ago

  • Copied to Backport #19391: kraken: two instances of omap_digest mismatch added
Actions #41

Updated by David Zafman about 7 years ago

  • Backport changed from kraken to jewel, kraken
Actions #42

Updated by David Zafman about 7 years ago

  • Copied to Backport #19404: jewel: core: two instances of omap_digest mismatch added
Actions #43

Updated by Nathan Cutler almost 7 years ago

  • Status changed from Pending Backport to Resolved