Bug #62578
mon: osd pg-upmap-items command causes PG_DEGRADED warnings
Status:
New
Priority:
Normal
Assignee:
Category:
Administration/Usability
Target version:
% Done:
0%
Source:
Q/A
Tags:
Backport:
reef,quincy,pacific
Regression:
No
Severity:
4 - irritation
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2023-08-22T15:33:52.003+0000 7ffbe2517700  7 mon.a@0(leader).osd e214 prepare_update mon_command({"prefix": "osd pg-upmap-items", "format": "json", "pgid": "31.23", "id": [0, 7]} v 0) v1 from mgr.4100 172.21.15.70:0/33077
...
2023-08-22T15:33:53.187+0000 7ffbe2517700 20 mon.a@0(leader).mgrstat health checks:
{
    "PG_DEGRADED": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "Degraded data redundancy: 1 pg degraded",
            "count": 1
        },
        "detail": [
            {
                "message": "pg 31.23 is active+recovering+degraded, acting [7,3]"
            }
        ]
    }
}
From: /teuthology/yuriw-2023-08-22_14:48:56-fs-pacific-release-distro-default-smithi/7376330/remote/smithi070/log/ceph-mon.a.log.gz
I am loath to silence PG_DEGRADED across the board. Is there a way to avoid this warning otherwise?
Updated by Radoslaw Zarzynski 8 months ago
From the bug scrub: the message mentions acting [7,3], but the osd pg-upmap-items command requested [0, 7].
Doesn't this look like a race condition in e.g. the balancer?
Do we have a compare-and-swap based on e.g. the epoch used for the calculation of [0, 7]?
There is a time window between calculating the requested set and processing the command in which an OSD crash can happen. This would explain the recovery state.
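The compare-and-swap idea above can be sketched as follows. This is a minimal toy model, not Ceph code: all names (FakeMon, apply_upmap) are hypothetical illustrations of a monitor that rejects an upmap proposal when the osdmap epoch has advanced since the balancer computed it, which would close the race window described in this comment.

```python
# Toy sketch (NOT real Ceph code): a compare-and-swap guard keyed on
# the osdmap epoch. All class/method names here are hypothetical.

class FakeMon:
    """Stands in for the monitor: holds the current osdmap epoch."""
    def __init__(self):
        self.epoch = 100
        self.upmaps = {}

    def apply_upmap(self, pgid, mappings, computed_at_epoch):
        # Compare-and-swap: refuse the change if the map has moved on
        # since the balancer computed the requested set.
        if computed_at_epoch != self.epoch:
            return False  # stale proposal; caller must recompute
        self.upmaps[pgid] = mappings
        return True

mon = FakeMon()

# Balancer computes [0, 7] for pg 31.23 against epoch 100 ...
proposal = ("31.23", [(0, 7)], mon.epoch)

# ... but an OSD crash bumps the epoch before the command is processed.
mon.epoch += 1

# The stale proposal is rejected instead of mapping the PG onto a set
# that no longer matches reality (which is what surfaces as PG_DEGRADED).
accepted = mon.apply_upmap(*proposal)
assert accepted is False
```

With such a guard, a proposal computed against a stale epoch would simply be retried by the balancer rather than applied, avoiding the transient degraded state.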