Bug #51282

pybind/mgr/mgr_util: .mgr pool may be created too early causing spurious PG_DEGRADED warnings

Added by Patrick Donnelly almost 3 years ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Category: Testing
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 4 - irritation
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-06-16T22:22:43.040+0000 7f6e8e779700 20 mon.a@0(leader).mgrstat health checks:
{   
    "PG_DEGRADED": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "Degraded data redundancy: 2/4 objects degraded (50.000%), 1 pg degraded",
            "count": 1
        },
        "detail": [
            {   
                "message": "pg 1.0 is active+undersized+degraded, acting [7]" 
            }
        ]
    }
}

From: /ceph/teuthology-archive/pdonnell-2021-06-16_21:26:55-fs-wip-pdonnell-testing-20210616.191804-distro-basic-smithi/6175605/remote/smithi120/log/ceph-mon.a.log.gz

and a few other tests:

Failure: "2021-06-16T22:22:43.881363+0000 mon.a (mon.0) 143 : cluster [WRN] Health check failed: Degraded data redundancy: 2/4 objects degraded (50.000%), 1 pg degraded (PG_DEGRADED)" in cluster log
7 jobs: ['6175605', '6175619', '6175556', '6175591', '6175671', '6175600', '6175639']
suites intersection: ['conf/{client', 'mds', 'mon', 'osd}', 'overrides/{frag_enable', 'whitelist_health', 'whitelist_wrongly_marked_down}']
suites union: ['clusters/1a11s-mds-1c-client-3node', 'clusters/1a3s-mds-1c-client', 'conf/{client', 'distro/{centos_8}', 'distro/{rhel_8}', 'distro/{ubuntu_latest}', 'fs/snaps/{begin', 'fs/workload/{begin', 'k-testing}', 'mds', 'mon', 'mount/fuse', 'mount/kclient/{mount', 'ms-die-on-skipped}}', 'objectstore-ec/bluestore-comp', 'objectstore-ec/bluestore-comp-ec-root', 'omap_limit/10', 'omap_limit/10000', 'osd-asserts', 'osd}', 'overrides/{distro/testing/{flavor/centos_latest', 'overrides/{distro/testing/{flavor/ubuntu_latest', 'overrides/{frag_enable', 'ranks/1', 'ranks/3', 'ranks/5', 'scrub/no', 'scrub/yes', 'session_timeout', 'standby-replay', 'tasks/workunit/snaps}', 'tasks/{0-check-counter', 'whitelist_health', 'whitelist_wrongly_marked_down}', 'workunit/fs/misc}', 'workunit/fs/test_o_trunc}', 'workunit/suites/ffsb}', 'workunit/suites/fsstress}', 'workunit/suites/iogen}', 'workunit/suites/iozone}', 'wsync/{no}}', 'wsync/{yes}}']

I'm thinking this check is not quite right: https://github.com/ceph/ceph/blob/05d7f883a04d230cf17b40a1c7e8044d402c6a30/src/pybind/mgr/mgr_module.py#L972-L981

(I lifted that code from the devicehealth module.)
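The underlying race is that the .mgr pool is created before enough OSDs are up and in to satisfy the pool's replication size, so its PG comes up undersized and trips PG_DEGRADED. A minimal sketch of the idea behind delaying pool creation until enough OSDs are available — all names here (`wait_for_osds`, `get_osd_count`) are hypothetical illustrations, not Ceph APIs, and this is not the actual fix from the linked code:

```python
# Hypothetical sketch: before creating the .mgr pool, poll until the
# number of up+in OSDs can satisfy the pool's replication size, so new
# PGs can go active+clean immediately instead of active+undersized+degraded.
import time


def wait_for_osds(get_osd_count, required, timeout=60.0, interval=1.0):
    """Poll get_osd_count() until it reports at least `required` up+in
    OSDs, or raise TimeoutError after `timeout` seconds.

    get_osd_count: zero-argument callable returning the current count
                   (in a real mgr module this would query the osdmap).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_osd_count() >= required:
            return
        time.sleep(interval)
    raise TimeoutError(
        f"only {get_osd_count()} OSDs up/in, need {required}")
```

The design point is simply to gate pool creation on cluster readiness rather than creating the pool unconditionally at module startup, which is where the spurious warning originates.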


Related issues (2): 0 open, 2 closed

Has duplicate: rgw - Bug #51727: cluster [WRN] Health check failed: Degraded data redundancy: 2/4 objects degraded (50.000%), 1 pg degraded (PG_DEGRADED)" in cluster log (Closed)

Has duplicate: CephFS - Bug #55825: cluster [WRN] Health check failed: Degraded data redundancy: 1 pg degraded (PG_DEGRADED)" in cluster (Duplicate)
