Bug #62578


mon: osd pg-upmap-items command causes PG_DEGRADED warnings

Added by Patrick Donnelly 8 months ago. Updated 8 months ago.

Status: New
Priority: Normal
Category: Administration/Usability
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: reef,quincy,pacific
Regression: No
Severity: 4 - irritation
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2023-08-22T15:33:52.003+0000 7ffbe2517700  7 mon.a@0(leader).osd e214 prepare_update mon_command({"prefix": "osd pg-upmap-items", "format": "json", "pgid": "31.23", "id": [0, 7]} v 0) v1 from mgr.4100 172.21.15.70:0/33077
...
2023-08-22T15:33:53.187+0000 7ffbe2517700 20 mon.a@0(leader).mgrstat health checks:
{
    "PG_DEGRADED": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "Degraded data redundancy: 1 pg degraded",
            "count": 1
        },
        "detail": [
            {
                "message": "pg 31.23 is active+recovering+degraded, acting [7,3]" 
            }
        ]
    }
}

From: /teuthology/yuriw-2023-08-22_14:48:56-fs-pacific-release-distro-default-smithi/7376330/remote/smithi070/log/ceph-mon.a.log.gz
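For reference, the logged mon_command corresponds to a CLI invocation along the lines of the following, assuming the usual pg-upmap-items syntax where the id pair reads as "remap from osd.0 to osd.7":

    ceph osd pg-upmap-items 31.23 0 7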

I am loath to silence PG_DEGRADED across the board. Is there a way to avoid this warning otherwise?
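One narrower option, if muting is acceptable in the test harness, might be a time-limited mute of just this check rather than silencing it globally, e.g.:

    ceph health mute PG_DEGRADED 5m

Since ceph health mute takes an optional TTL, the warning would resurface if the PG stays degraded past the window.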

#1

Updated by Radoslaw Zarzynski 8 months ago

From the bug scrub: the message mentions acting [7,3], but the osd pg-upmap-items command requested [0, 7].
Doesn't this look like a race condition in, e.g., the balancer?
Do we have a compare-and-swap based on, e.g., the epoch used for the calculation of [0, 7]?

There is a time window between calculating the requested set and processing the command in which an OSD crash can happen. That would explain the recovery state.
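A minimal sketch of that compare-and-swap idea (all names here are hypothetical stand-ins, not real Ceph APIs): the balancer records the osdmap epoch its plan was computed against, and the apply path rejects the plan if the map has since advanced, e.g. because an OSD crashed in the interim.

    # Hypothetical sketch; OsdMapStub and apply_pg_upmap_items are
    # illustrative stand-ins, not actual Ceph code.
    class OsdMapStub:
        """Stand-in for the monitor's view of the osdmap."""
        def __init__(self, epoch):
            self.epoch = epoch
            self.pg_upmap_items = {}

    def apply_pg_upmap_items(osdmap, pgid, mapping, planned_epoch):
        # CAS guard: a plan is only valid for the epoch it was computed
        # against; an OSD crash between planning and application bumps
        # the epoch and should invalidate the plan.
        if planned_epoch != osdmap.epoch:
            return False  # caller recomputes against the new map
        osdmap.pg_upmap_items[pgid] = mapping
        return True

    # Plan computed at epoch 214 (as in the log above), but an OSD
    # failure has since moved the map to epoch 215: the stale plan is
    # refused instead of producing a degraded acting set.
    osdmap = OsdMapStub(epoch=215)
    assert apply_pg_upmap_items(osdmap, "31.23", [0, 7], 214) is False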
