Bug #46644

open

"[ERR] Health check failed: 4 mgr modules have failed (MGR_MODULE_ERROR)" in upgrade:nautilus-x-octopus

Added by Yuri Weinstein almost 4 years ago. Updated almost 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/nautilus-x
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: https://pulpito.ceph.com/teuthology-2020-07-17_14:23:04-upgrade:nautilus-x-octopus-distro-basic-smithi/
Jobs: '5235278', '5235270', '5235274', '5235281'

I see this error in the log (not sure whether it is the root cause):

2020-07-17T16:14:10.564 INFO:tasks.qemu.client.0.smithi099.stdout:ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCyNlXy+2GAyvs9foluvU2X0uYnhBY9aJxgGDM56YC4gJW758RjFk9cdkvM+/jpybUJGI0XRJkePqumW4ayb+wdGhP1X3xMcDRXE7A81IyGtVtlJYmik7yB1578EWN0dyWqBXmh+hfyf7CeRmwWA25lrWjixJSViLI5a4LtBKe9+7Kf0b/iZ0wIeIOVL61FQqaCacHhcMo+TY7HtPjmGiCD4/Jevhq1ZevobLp9wN9by+UdBwe1xGnFLKBDs/d+/0X/fuzdjlkR3jwUWNnGgCVjjZygDtO89W/CF1MVYkYUpwNHg1TzLLyoHnMfAGecpjT0w9s+d/C+ubHCsy8LwL/l root@test
2020-07-17T16:14:10.564 INFO:tasks.qemu.client.0.smithi099.stdout:-----END SSH HOST KEY KEYS-----
2020-07-17T16:14:10.564 INFO:tasks.qemu.client.0.smithi099.stdout:Traceback (most recent call last):
2020-07-17T16:14:10.564 INFO:tasks.qemu.client.0.smithi099.stdout:  File "/usr/lib/python2.7/logging/handlers.py", line 807, in emit
2020-07-17T16:14:10.564 INFO:tasks.qemu.client.0.smithi099.stdout:    self._connect_unixsocket(self.address)
2020-07-17T16:14:10.565 INFO:tasks.qemu.client.0.smithi099.stdout:  File "/usr/lib/python2.7/logging/handlers.py", line 745, in _connect_unixsocket
2020-07-17T16:14:10.565 INFO:tasks.qemu.client.0.smithi099.stdout:    self.socket.connect(address)
2020-07-17T16:14:10.565 INFO:tasks.qemu.client.0.smithi099.stdout:  File "/usr/lib/python2.7/socket.py", line 224, in meth
2020-07-17T16:14:10.565 INFO:tasks.qemu.client.0.smithi099.stdout:    return getattr(self._sock,name)(*args)
2020-07-17T16:14:10.565 INFO:tasks.qemu.client.0.smithi099.stdout:error: [Errno 111] Connection refused
2020-07-17T16:14:10.566 INFO:tasks.qemu.client.0.smithi099.stdout:Logged from file __init__.py, line 116
2020-07-17T16:14:10.566 INFO:tasks.qemu.client.0.smithi099.stdout:Traceback (most recent call last):
2020-07-17T16:14:10.566 INFO:tasks.qemu.client.0.smithi099.stdout:  File "/usr/lib/python2.7/logging/handlers.py", line 807, in emit
2020-07-17T16:14:10.566 INFO:tasks.qemu.client.0.smithi099.stdout:    self._connect_unixsocket(self.address)
2020-07-17T16:14:10.566 INFO:tasks.qemu.client.0.smithi099.stdout:  File "/usr/lib/python2.7/logging/handlers.py", line 745, in _connect_unixsocket
2020-07-17T16:14:10.567 INFO:tasks.qemu.client.0.smithi099.stdout:    self.socket.connect(address)
2020-07-17T16:14:10.567 INFO:tasks.qemu.client.0.smithi099.stdout:  File "/usr/lib/python2.7/socket.py", line 224, in meth
2020-07-17T16:14:10.567 INFO:tasks.qemu.client.0.smithi099.stdout:    return getattr(self._sock,name)(*args)
2020-07-17T16:14:10.567 INFO:tasks.qemu.client.0.smithi099.stdout:error: [Errno 111] Connection refused
2020-07-17T16:14:10.567 INFO:tasks.qemu.client.0.smithi099.stdout:Logged from file __init__.py, line 116

and the final failure reason:

failure_reason: '"2020-07-17T15:09:25.361536+0000 mon.c (mon.0) 262 : cluster [ERR]
  Health check failed: 4 mgr modules have failed (MGR_MODULE_ERROR)" in cluster log'
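For triaging several jobs at once, the failure_reason string can be reduced to its severity, failed-module count, and health code. The following is only a minimal sketch: the helper name parse_failure_reason and the regular expression are hypothetical, and the pattern assumes the exact plural wording seen in this ticket ("N mgr modules have failed"). Whitespace is collapsed first because the YAML summary wraps the string across lines.

```python
import re

# Hypothetical pattern, based only on the wording quoted in this ticket.
HEALTH_RE = re.compile(
    r"cluster \[(?P<level>[A-Z]+)\] Health check failed: "
    r"(?P<count>\d+) mgr modules have failed \((?P<code>[A-Z_]+)\)"
)


def parse_failure_reason(reason):
    """Return (level, failed_module_count, health_code), or None if no match."""
    # The YAML summary wraps long strings, so collapse all whitespace runs.
    flat = " ".join(reason.split())
    match = HEALTH_RE.search(flat)
    if match is None:
        return None
    return match.group("level"), int(match.group("count")), match.group("code")


reason = '''"2020-07-17T15:09:25.361536+0000 mon.c (mon.0) 262 : cluster [ERR]
  Health check failed: 4 mgr modules have failed (MGR_MODULE_ERROR)" in cluster log'''

print(parse_failure_reason(reason))
```

This says nothing about which modules failed or why; that still has to come from the mgr logs of the affected jobs.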

Actions #1

Updated by Yuri Weinstein almost 4 years ago

  • ceph-qa-suite upgrade/nautilus-x added

Actions #5

Updated by Deepika Upadhyay over 3 years ago

This does not seem to be limited to upgrade tests:
http://qa-proxy.ceph.com/teuthology/yuriw-2020-11-10_16:01:13-rados-wip-yuri7-testing-2020-11-09-0731-nautilus-distro-basic-smithi/5609404/teuthology.log

description: rados/singleton/{all/rebuild-mondb msgr-failures/few msgr/async objectstore/bluestore-hybrid
  rados supported-random-distro$/{ubuntu_16.04}}
duration: 697.6079661846161
failure_reason: '"2020-11-11 00:56:39.069210 mon.a (mon.0) 61 : cluster [ERR] Health
  check failed: 3 mgr modules have failed (MGR_MODULE_ERROR)" in cluster log'

Actions #6

Updated by Greg Farnum almost 3 years ago

  • Project changed from Ceph to mgr