Bug #44286

open

Cache tiering shows unfound objects after OSD reboots

Added by Paul Emmerich about 4 years ago. Updated over 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We've got a cluster with a 3/2 size/min_size replicated cache pool in front of an erasure coded pool used for RBD.

Restarting OSDs sometimes results in unfound objects, for example:

2/543658058 objects unfound (0.000%)
pg 19.12 has 1 unfound objects
pg 19.2d has 1 unfound objects

Possible data damage: 2 pgs recovery_unfound
pg 19.12 is active+recovery_unfound+undersized+degraded+remapped, acting [299,310], 1 unfound
pg 19.2d is active+recovery_unfound+undersized+degraded+remapped, acting [290,309], 1 unfound
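When this recurs across several restarts, it can help to pull the affected PG IDs and acting sets out of the health output automatically. The following is a minimal sketch, assuming the per-PG lines keep exactly the format quoted above; the `parse_unfound_pgs` helper and the regular expression are hypothetical names for illustration and are not part of Ceph.

```python
import re

# Sample lines copied from the health output quoted above.
HEALTH_DETAIL = """\
pg 19.12 is active+recovery_unfound+undersized+degraded+remapped, acting [299,310], 1 unfound
pg 19.2d is active+recovery_unfound+undersized+degraded+remapped, acting [290,309], 1 unfound
"""

# Assumption: each per-PG line looks like
#   pg <pgid> is <state+state+...>, acting [<osd>,<osd>], <n> unfound
PG_LINE = re.compile(
    r"^pg (?P<pgid>\d+\.[0-9a-f]+) is (?P<state>[a-z_+]+), "
    r"acting \[(?P<acting>[\d,]*)\], (?P<unfound>\d+) unfound$"
)


def parse_unfound_pgs(text):
    """Return {pgid: {"states": [...], "acting": [...], "unfound": n}}."""
    result = {}
    for line in text.splitlines():
        match = PG_LINE.match(line.strip())
        if match is None:
            continue  # skip summary lines and anything in another format
        result[match.group("pgid")] = {
            "states": match.group("state").split("+"),
            "acting": [int(o) for o in match.group("acting").split(",") if o],
            "unfound": int(match.group("unfound")),
        }
    return result


pgs = parse_unfound_pgs(HEALTH_DETAIL)
for pgid, info in pgs.items():
    print(pgid, info["acting"], info["unfound"])
```

Lines that do not match the assumed format are skipped, so summary lines such as "pg 19.12 has 1 unfound objects" are ignored rather than raising an error.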

# ceph pg 19.12 list_unfound
{
    "num_missing": 1,
    "num_unfound": 1,
    "objects": [
        {
            "oid": {
                "oid": "hit_set_19.12_archive_2020-02-25 13:43:50.256316Z_2020-02-25 13:43:50.325825Z",
                "key": "",
                "snapid": -2,
                "hash": 18,
                "max": 0,
                "pool": 19,
                "namespace": ".ceph-internal" 
            },
            "need": "3312398'55868341",
            "have": "0'0",
            "flags": "none",
            "locations": []
        }
    ],
    "more": false
}
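The unfound object in the output above is a hit set archive object in the `.ceph-internal` namespace, not RBD data. A minimal sketch of how one might check this for every unfound object in a `list_unfound` result follows. It relies only on the naming visible in the output above (an `oid` starting with `hit_set_` and the `.ceph-internal` namespace), which is an assumption about how such objects can be recognised; the helper names are hypothetical.

```python
import json

# JSON copied from the `ceph pg 19.12 list_unfound` output above.
LIST_UNFOUND = """
{
    "num_missing": 1,
    "num_unfound": 1,
    "objects": [
        {
            "oid": {
                "oid": "hit_set_19.12_archive_2020-02-25 13:43:50.256316Z_2020-02-25 13:43:50.325825Z",
                "key": "",
                "snapid": -2,
                "hash": 18,
                "max": 0,
                "pool": 19,
                "namespace": ".ceph-internal"
            },
            "need": "3312398'55868341",
            "have": "0'0",
            "flags": "none",
            "locations": []
        }
    ],
    "more": false
}
"""


def is_hit_set_archive(entry):
    """Assumption: hit set archives are identified by namespace and oid prefix."""
    oid = entry["oid"]
    return oid["namespace"] == ".ceph-internal" and oid["oid"].startswith("hit_set_")


def only_hit_set_archives(report):
    """True if the report lists unfound objects and all are hit set archives."""
    objects = report["objects"]
    return bool(objects) and all(is_hit_set_archive(e) for e in objects)


report = json.loads(LIST_UNFOUND)
print(only_hit_set_archives(report))
```

If `"more"` is true, the listing is truncated and the check covers only the objects returned so far.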

Both affected PGs share an OSD (the one that is offline).
The cache tiering agent is flushing at around 300-500 MB/s while this happens.

The unfound objects stay unfound even after all OSDs are back online. The affected PGs never drop below 2 online OSDs.
Restarting the OSDs does not change the state, so it's not an instance of https://tracker.ceph.com/issues/37439

Ceph version 14.2.6 (the OSDs were being restarted for the upgrade to 14.2.7). Also seen on 14.2.4 a few months ago.

Attached is a pg query from a PG in that state (from an earlier instance of this issue, also on 14.2.6).


Files

pg-query.json (20.1 KB), Paul Emmerich, 02/25/2020 02:22 PM