Bug #45621

closed

check-host returns terrible unhelpful error message

Added by João Soares almost 4 years ago. Updated almost 4 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-deploy
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After rebooting some hosts, I got CEPHADM_HOST_CHECK_FAILED and CEPHADM_REFRESH_FAILED warnings. When I try to run the host check manually, I get the following error message:

~$ sudo cephadm shell -- ceph cephadm check-host CEPH-OSD-01
INFO:cephadm:Inferring fsid e3277c06-bc9d-41a6-b597-bec1605dd741
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
Error EINVAL: Traceback (most recent call last):
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 997, in _send
    message.to_io(self._io)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 443, in to_io
    io.write(header + self.data)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 409, in write
    self._write(data)
ValueError: write to closed file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1153, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 110, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 308, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 72, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 63, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1485, in check_host
    error_ok=True, no_fsid=True)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1601, in _run_cephadm
    python = connr.choose_python()
  File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 158, in wrapper
    self.channel.send("%s(%s)" % (name, arguments))
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 729, in send
    self.gateway._send(Message.CHANNEL_DATA, self.id, dumps_internal(item))
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 1003, in _send
    raise IOError("cannot send (already closed?)")
OSError: cannot send (already closed?)

For starters, I have no idea where to begin looking: I don't even get any logs from sshd on the target machine. This error message won't help me debug my problem unless (I believe) I go and read the mgr_module.py code.
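To illustrate what a more helpful message could look like, here is a minimal Python sketch of catching the low-level OSError and re-raising it with the host name attached. It is not the actual cephadm code: HostCheckError, send_over_closed_channel, check_host and describe are hypothetical names made up for this example, and the stand-in function only reproduces the two chained exceptions seen in the traceback above.

```python
class HostCheckError(Exception):
    """Hypothetical error type carrying a message the user can act on."""


def send_over_closed_channel():
    # Stand-in for the remote call in the traceback; not the real execnet API.
    # Raising inside "except" reproduces the "During handling of the above
    # exception, another exception occurred" chain.
    try:
        raise ValueError("write to closed file")
    except ValueError:
        raise OSError("cannot send (already closed?)")


def check_host(host, send=send_over_closed_channel):
    """Run the check and translate low-level I/O failures into a readable error."""
    try:
        send()
    except OSError as e:
        # "from e" keeps the original exception attached for the logs,
        # while the user only sees the short message below.
        raise HostCheckError(
            f"Failed to talk to host {host!r}: {e}. "
            "The connection to the host appears to be closed."
        ) from e
    return f"{host} ok"


def describe(host):
    """Return what a command-line front end might print for this host."""
    try:
        return check_host(host)
    except HostCheckError as e:
        return f"Error: {e}"


if __name__ == "__main__":
    print(describe("CEPH-OSD-01"))
```

With a wrapper like this, the user would see a single line naming the host and the underlying cause, and the full chained traceback would still be reachable through the exception's __cause__ for anyone reading the manager logs.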


Related issues 1 (0 open, 1 closed)

Related to Orchestrator - Bug #45627: cephadm: frequently getting `1 hosts fail cephadm check` (Resolved, Matthew Oliver)

#1

Updated by João Soares almost 4 years ago

I found that running `ceph mgr fail` fixes the problem, but I could never have guessed that from the message.

#2

Updated by Sebastian Wagner almost 4 years ago

  • Related to Bug #45627: cephadm: frequently getting `1 hosts fail cephadm check` added
#3

Updated by Sebastian Wagner almost 4 years ago

  • Status changed from New to Duplicate