Bug #48230
nautilus: cluster [ERR] mgr modules have failed (MGR_MODULE_ERROR)
Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor
Pull request ID: 38069
Description
2020-11-12T22:33:15.916 INFO:tasks.ceph.mon.a.smithi038.stderr:2020-11-12 22:33:15.917 7fc763b5c700 -1 log_channel(cluster) log [ERR] : Health check failed: 3 mgr modules have failed (MGR_MODULE_ERROR)
Looking at the mon log:
2020-11-12 22:33:15.917 7fc763b5c700 20 mon.a@0(leader).mgrstat health checks: {
    "MGR_MODULE_ERROR": {
        "severity": "HEALTH_ERR",
        "summary": { "message": "3 mgr modules have failed" },
        "detail": [
            { "message": "Module 'rbd_support' has failed: Not found or unloadable" },
            { "message": "Module 'status' has failed: Not found or unloadable" },
            { "message": "Module 'volumes' has failed: Not found or unloadable" }
        ]
    },
    "PG_AVAILABILITY": {
        "severity": "HEALTH_WARN",
        "summary": { "message": "Reduced data availability: 6 pgs peering" },
        "detail": [
            { "message": "pg 1.0 is stuck peering for 123.626751, current state peering, last acting [1,0]" },
            { "message": "pg 1.2 is stuck peering for 123.624747, current state peering, last acting [0]" },
            { "message": "pg 1.3 is stuck peering for 123.626252, current state peering, last acting [1]" },
            { "message": "pg 1.4 is stuck peering for 123.627984, current state peering, last acting [1,0]" },
            { "message": "pg 1.6 is stuck peering for 123.627208, current state peering, last acting [1,0]" },
            { "message": "pg 1.7 is stuck peering for 123.625035, current state peering, last acting [1]" }
        ]
    }
}
...
2020-11-12 22:33:24.591 7fc766361700 0 log_channel(cluster) log [INF] : Health check cleared: MGR_MODULE_ERROR (was: 3 mgr modules have failed)
I think this can be ignored.
/a/yuriw-2020-11-12_20:34:09-rados-nautilus-distro-basic-smithi/5617251
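For quick triage of similar runs, here is a minimal sketch (a hypothetical helper, not something from this job) that lists the failed-module messages, assuming 'ceph health detail -f json' returns the same "checks" structure quoted from the mon log above:

# Minimal sketch (hypothetical helper, not from this run): list the
# MGR_MODULE_ERROR detail messages, assuming 'ceph health detail -f json'
# returns the same "checks" structure quoted in the mon log above.
import json
import subprocess

def failed_mgr_modules():
    out = subprocess.check_output(["ceph", "health", "detail", "-f", "json"])
    checks = json.loads(out).get("checks", {})
    err = checks.get("MGR_MODULE_ERROR", {})
    return [d["message"] for d in err.get("detail", [])]

if __name__ == "__main__":
    for msg in failed_mgr_modules():
        # e.g. "Module 'rbd_support' has failed: Not found or unloadable"
        print(msg)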
Updated by Dan Mick over 3 years ago
It's odd, because the mgr log for the job cited above shows a lot of what look like normal status messages from rbd_support (at least).
Updated by Neha Ojha over 3 years ago
This seems to be due to those 3 modules not being present in "modules" when get_health_checks() is called. (A toy model of this race follows the log excerpts below.)
2020-11-12 22:33:22.397 7f79c57f3700 15 mgr get_health_checks getting health checks forbalancer
2020-11-12 22:33:22.397 7f79c57f3700 15 mgr get_health_checks getting health checks forcrash
2020-11-12 22:33:22.397 7f79c57f3700 15 mgr get_health_checks getting health checks fordevicehealth
2020-11-12 22:33:22.397 7f79c57f3700 15 mgr get_health_checks getting health checks foriostat
2020-11-12 22:33:22.397 7f79c57f3700 15 mgr get_health_checks getting health checks fororchestrator_cli
2020-11-12 22:33:22.397 7f79c57f3700 15 mgr get_health_checks getting health checks forprogress
2020-11-12 22:33:22.397 7f79c57f3700 10 mgr update_delta_stats v15
2020-11-12 22:33:22.397 7f79c57f3700 10 mgr.server operator() 8 pgs: 6 active+clean, 2 peering; 0 B data, 548 KiB used, 267 GiB / 270 GiB avail
2020-11-12 22:33:22.397 7f79c57f3700 10 mgr.server operator() 2 health checks
2020-11-12 22:33:22.397 7f79c57f3700 20 mgr.server operator() health checks: {
    "MGR_MODULE_ERROR": {
        "severity": "HEALTH_ERR",
        "summary": { "message": "3 mgr modules have failed" },
        "detail": [
            { "message": "Module 'rbd_support' has failed: Not found or unloadable" },
            { "message": "Module 'status' has failed: Not found or unloadable" },
            { "message": "Module 'volumes' has failed: Not found or unloadable" }
        ]
    },
Later they are present:
2020-11-12 22:33:24.397 7f79c57f3700 10 mgr.server tick
2020-11-12 22:33:24.397 7f79c57f3700 15 mgr get_health_checks getting health checks forbalancer
2020-11-12 22:33:24.397 7f79c57f3700 15 mgr get_health_checks getting health checks forcrash
2020-11-12 22:33:24.397 7f79c57f3700 15 mgr get_health_checks getting health checks fordevicehealth
2020-11-12 22:33:24.397 7f79c57f3700 15 mgr get_health_checks getting health checks foriostat
2020-11-12 22:33:24.397 7f79c57f3700 15 mgr get_health_checks getting health checks fororchestrator_cli
2020-11-12 22:33:24.397 7f79c57f3700 15 mgr get_health_checks getting health checks forprogress
2020-11-12 22:33:24.397 7f79c57f3700 15 mgr get_health_checks getting health checks forrbd_support
2020-11-12 22:33:24.397 7f79c57f3700 15 mgr get_health_checks getting health checks forrestful
2020-11-12 22:33:24.397 7f79c57f3700 15 mgr get_health_checks getting health checks forstatus
2020-11-12 22:33:24.397 7f79c57f3700 15 mgr get_health_checks getting health checks forvolumes
2020-11-12 22:33:24.397 7f79c57f3700 10 mgr update_delta_stats v17
2020-11-12 22:33:24.397 7f79c57f3700 10 mgr.server operator() 24 pgs: 16 unknown, 6 active+clean, 2 peering; 0 B data, 548 KiB used, 267 GiB / 270 GiB avail
2020-11-12 22:33:24.397 7f79c57f3700 10 mgr.server operator() 1 health checks
2020-11-12 22:33:24.397 7f79c57f3700 20 mgr.server operator() health checks: {
    "PG_AVAILABILITY": {
        "severity": "HEALTH_WARN",
        "summary": { "message": "Reduced data availability: 2 pgs peering" },
Just looking at rbd_support:
2020-11-12 22:33:04.902 7f5f738eb700 15 mgr get_health_checks getting health checks forrbd_support
2020-11-12 22:33:05.898 7f5f6f0a2700 20 mgr[rbd_support] TaskHandler: tick
2020-11-12 22:33:05.906 7f5f6f8a3700 20 mgr[rbd_support] PerfHandler: tick
2020-11-12 22:33:06.134 7f5f8d6f4700 15 mgr notify_all queuing notify to rbd_support
2020-11-12 22:33:07.455 7f79f360ae40 1 mgr[py] Loading python module 'rbd_support'
2020-11-12 22:33:07.489 7f79f360ae40 4 mgr[py] load_subclass_of: found class: 'rbd_support.Module'
2020-11-12 22:33:07.489 7f79f360ae40 4 mgr[py] Standby mode not provided by module 'rbd_support'
2020-11-12 22:33:08.405 7f79c6ff6700 4 mgr[py] Starting rbd_support
2020-11-12 22:33:08.414 7f79c17ab700 4 mgr[rbd_support] PerfHandler: starting
2020-11-12 22:33:10.917 7f79c6ff6700 4 mgr[rbd_support] load_task_task: rbd, start_after=
2020-11-12 22:33:13.414 7f79c17ab700 20 mgr[rbd_support] PerfHandler: tick
"message": "Module 'rbd_support' has failed: Not found or unloadable"
"message": "Module 'rbd_support' has failed: Not found or unloadable"
"message": "Module 'rbd_support' has failed: Not found or unloadable"
2020-11-12 22:33:18.415 7f79c17ab700 20 mgr[rbd_support] PerfHandler: tick
"message": "Module 'rbd_support' has failed: Not found or unloadable"
"message": "Module 'rbd_support' has failed: Not found or unloadable"
2020-11-12 22:33:23.415 7f79c17ab700 20 mgr[rbd_support] PerfHandler: tick
2020-11-12 22:33:23.429 7f79c6ff6700 20 mgr[rbd_support] sequence=0, tasks_by_sequence={}, tasks_by_id={}
2020-11-12 22:33:23.430 7f79bcfa2700 4 mgr[rbd_support] TaskHandler: starting
2020-11-12 22:33:23.430 7f79c6ff6700 1 mgr load Constructed class from module: rbd_support
2020-11-12 22:33:23.430 7f79c6ff6700 4 mgr operator() Starting thread for rbd_support
2020-11-12 22:33:23.430 7f79bc7a1700 4 mgr entry Entering thread for rbd_support
2020-11-12 22:33:23.575 7f79df5fc700 15 mgr notify_all queuing notify to rbd_support
2020-11-12 22:33:23.591 7f79df5fc700 15 mgr notify_all queuing notify (clog) to rbd_support
2020-11-12 22:33:23.591 7f79df5fc700 15 mgr notify_all queuing notify (clog) to rbd_support
2020-11-12 22:33:24.397 7f79c57f3700 15 mgr get_health_checks getting health checks forrbd_support
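Putting the rbd_support timeline together: the module is torn down and reloaded between 22:33:07.455 ("Loading python module 'rbd_support'") and 22:33:23.430 ("Constructed class from module: rbd_support"), and the health check at 22:33:22.397 lands inside that window. A toy model of the race in plain Python (an illustration of the behavior seen in the logs, not Ceph's actual C++ implementation):

# Toy model of the load-time race (illustration only; the real check lives
# in the mgr's C++ module registry, not here).
LOADING, RUNNING = "loading", "running"

class ModuleRegistry:
    def __init__(self, names):
        # Every module starts out mid-load, e.g. right after a mgr restart.
        self.state = {name: LOADING for name in names}

    def finish_load(self, name):
        self.state[name] = RUNNING

    def get_health_checks(self):
        # Mirrors the symptom above: a module that has not finished loading
        # is reported as "Not found or unloadable", even though it comes up
        # a couple of ticks later.
        failed = sorted(n for n, s in self.state.items() if s != RUNNING)
        if not failed:
            return {}
        return {"MGR_MODULE_ERROR": {
            "severity": "HEALTH_ERR",
            "summary": {"message": "%d mgr modules have failed" % len(failed)},
            "detail": [{"message": "Module '%s' has failed: "
                                   "Not found or unloadable" % n}
                       for n in failed],
        }}

reg = ModuleRegistry(["rbd_support", "status", "volumes"])
print(reg.get_health_checks())   # transient MGR_MODULE_ERROR while loading
for name in ("rbd_support", "status", "volumes"):
    reg.finish_load(name)
print(reg.get_health_checks())   # {} once loading completes

Once all three loads complete, the same call returns no checks, which matches the "Health check cleared: MGR_MODULE_ERROR" line at 22:33:24.591 in the mon log.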
Updated by Neha Ojha over 3 years ago
- Related to Bug #46224: Health check failed: 4 mgr modules have failed (MGR_MODULE_ERROR) added
Updated by Neha Ojha over 3 years ago
- Status changed from New to Resolved
- Pull request ID set to 38069