Bug #61147

open

Ceph version 15.2.17 (octopus stable) - HEALTH_ERR 4 mgr modules have failed

Added by Duy Nguyen Hong 12 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Monitoring/Alerting
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi Team,

We have a problem with a Ceph health check error.

HEALTH_ERR 4 mgr modules have failed
[ERR] MGR_MODULE_ERROR: 4 mgr modules have failed
Module 'devicehealth' has failed: Not found or unloadable
Module 'pg_autoscaler' has failed: Not found or unloadable
Module 'telemetry' has failed: 'NoneType' object has no attribute 'items'
Module 'volumes' has failed: Not found or unloadable
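
For triage, the failed modules in the output above can be grouped by failure reason with a short script. This is only a sketch: the function names `parse_failed_modules` and `group_by_reason` are hypothetical, and the sample text is copied from the health output above rather than read from a live cluster.

```python
import re

# Sample text copied verbatim from the health output in this report.
HEALTH_DETAIL = """\
HEALTH_ERR 4 mgr modules have failed
[ERR] MGR_MODULE_ERROR: 4 mgr modules have failed
Module 'devicehealth' has failed: Not found or unloadable
Module 'pg_autoscaler' has failed: Not found or unloadable
Module 'telemetry' has failed: 'NoneType' object has no attribute 'items'
Module 'volumes' has failed: Not found or unloadable
"""

# Matches lines of the form: Module '<name>' has failed: <reason>
MODULE_LINE = re.compile(r"^\s*Module '(?P<name>[^']+)' has failed: (?P<reason>.*)$")


def parse_failed_modules(text):
    """Return {module_name: failure_reason} for each 'Module ... has failed' line."""
    failed = {}
    for line in text.splitlines():
        match = MODULE_LINE.match(line)
        if match:
            failed[match.group("name")] = match.group("reason").strip()
    return failed


def group_by_reason(failed):
    """Invert the mapping: {failure_reason: sorted list of module names}."""
    groups = {}
    for name, reason in failed.items():
        groups.setdefault(reason, []).append(name)
    return {reason: sorted(names) for reason, names in groups.items()}


if __name__ == "__main__":
    for reason, names in group_by_reason(parse_failed_modules(HEALTH_DETAIL)).items():
        print(f"{reason}: {', '.join(names)}")
```

Grouping this way makes it visible that three modules share the same "Not found or unloadable" reason, while telemetry fails with a different Python error.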

We tried disabling and re-enabling the diskprediction_local module. After that, the alert appeared in the output of ceph -s.
How can we turn off that alert?

Kernel: 5.4.0-124-generic

Production cluster status:
  cluster:
    id:     cf0e8a4a-9c0a-11eb-966b-6fb1f36da8cd
    health: HEALTH_ERR
            4 mgr modules have failed

  services:
    mon: 5 daemons, quorum cephnode-120,cephnode-121,cephnode-124,cephnode-123,cephnode-122 (age 7M)
    mgr: cephnode-121.jsgurc(active, since 10h), standbys: cephnode-120.kxyhfa
    mds: fplay:2 {0=fplay.cephnode-127.baegiz=up:active,1=fplay.cephnode-128.qfebeb=up:active} 2 up:standby
    osd: 342 osds: 342 up (since 29h), 342 in (since 5w); 3 remapped pgs
    rgw: 15 daemons active (btsx.hcm.cephnode-120.pwxelj, btsx.hcm.cephnode-121.pvfttc, btsx.hcm.cephnode-122.grlyac, btsx.hcm.cephnode-123.aqfbgm, btsx.hcm.cephnode-124.ajeqqn, btsx.hcm.cephnode-125.osyamm, btsx.hcm.cephnode-126.wjlrlu, btsx.hcm.cephnode-127.qjsnme, btsx.hcm.cephnode-128.ndxmbx, btsx.hcm.cephnode-129.xvuhjm, btsx.hcm.cephnode-131.rsegwd, btsx.hcm.cephnode-132.fdeygv, btsx.hcm.cephnode-133.hhxvoi, btsx.hcm.cephnode-134.rrnbwv, btsx.hcm.cephnode-135.kobjbm)

  task status:

  data:
    pools:   11 pools, 1321 pgs
    objects: 404.38M objects, 1.8 PiB
    usage:   2.5 PiB used, 1.5 PiB / 4.0 PiB avail
    pgs:     813120/6468902726 objects misplaced (0.013%)
             1316 active+clean
             3 active+remapped+backfilling
             2 active+clean+scrubbing+deep

  io:
    client:   318 MiB/s rd, 16 KiB/s wr, 115 op/s rd, 16 op/s wr
    recovery: 158 MiB/s, 33 objects/s
