Bug #61180
Ceph version 15.2.17 (octopus stable) - HEALTH_ERR 4 mgr modules have failed
Status:
New
Priority:
Normal
Assignee:
-
Category:
ceph-mgr
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Hi Team,
We have a problem with a Ceph health check error.
HEALTH_ERR 4 mgr modules have failed
[ERR] MGR_MODULE_ERROR: 4 mgr modules have failed
Module 'devicehealth' has failed: Not found or unloadable
Module 'pg_autoscaler' has failed: Not found or unloadable
Module 'telemetry' has failed: 'NoneType' object has no attribute 'items'
Module 'volumes' has failed: Not found or unloadable
We tried disabling and re-enabling the diskprediction_local module; after that, the alert appeared in ceph -s.
How can we clear this alert?
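Not a definitive fix, but a typical triage sequence for this health error. The commands below are standard Ceph CLI calls; the mgr name cephnode-121.jsgurc is taken from the status output in this report, and the package-level cause (a broken or partial ceph-mgr install on the mgr host) is an assumption to verify, not a confirmed diagnosis:

```shell
# See the failure reasons and which modules the mgr knows about
ceph health detail
ceph mgr module ls

# devicehealth, pg_autoscaler and volumes are always-on modules in Octopus;
# "Not found or unloadable" usually means the active mgr could not import
# their Python code (e.g. missing/partial ceph-mgr packages on that host).
# After fixing the packages, fail over the active mgr so it reloads all
# modules (active mgr name taken from the cluster status above):
ceph mgr fail cephnode-121.jsgurc

# The telemetry traceback can often be cleared by toggling the module:
ceph mgr module disable telemetry
ceph mgr module enable telemetry
```

If the modules still fail after a mgr failover, the mgr log on the active node should show the underlying Python import error.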
My kernel = 5.4.0-124-generic
My production cluster environment:
  cluster:
    id:     cf0e8a4a-9c0a-11eb-966b-6fb1f36da8cd
    health: HEALTH_ERR
            4 mgr modules have failed
  services:
    mon: 5 daemons, quorum cephnode-120,cephnode-121,cephnode-124,cephnode-123,cephnode-122 (age 7M)
    mgr: cephnode-121.jsgurc(active, since 22h), standbys: cephnode-120.kxyhfa
    mds: fplay:2 {0=fplay.cephnode-127.baegiz=up:active,1=fplay.cephnode-128.qfebeb=up:active} 2 up:standby
    osd: 342 osds: 342 up (since 2h), 342 in (since 5w); 17 remapped pgs
    rgw: 15 daemons active (btsx.hcm.cephnode-120.pwxelj, btsx.hcm.cephnode-121.pvfttc, btsx.hcm.cephnode-122.grlyac, btsx.hcm.cephnode-123.aqfbgm, btsx.hcm.cephnode-124.ajeqqn, btsx.hcm.cephnode-125.osyamm, btsx.hcm.cephnode-126.wjlrlu, btsx.hcm.cephnode-127.qjsnme, btsx.hcm.cephnode-128.ndxmbx, btsx.hcm.cephnode-129.xvuhjm, btsx.hcm.cephnode-131.rsegwd, btsx.hcm.cephnode-132.fdeygv, btsx.hcm.cephnode-133.hhxvoi, btsx.hcm.cephnode-134.rrnbwv, btsx.hcm.cephnode-135.kobjbm)

  task status:

  data:
    pools:   11 pools, 1321 pgs
    objects: 406.11M objects, 1.8 PiB
    usage:   2.5 PiB used, 1.5 PiB / 4.0 PiB avail
    pgs:     2745648/6496535596 objects misplaced (0.042%)
             1302 active+clean
             15   active+remapped+backfilling
             2    active+clean+scrubbing+deep
             2    active+remapped+backfill_wait

  io:
    client:   44 MiB/s rd, 69 MiB/s wr, 56 op/s rd, 33 op/s wr
    recovery: 1.1 GiB/s, 230 objects/s