Bug #45032

closed

cephadm: Not recovering from `OSError: cannot send (already closed?)`

Added by Sebastian Wagner about 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: High
Assignee: Matthew Oliver
Category: cephadm
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The workaround for this was to fail over the active mgr, wait, and then add the host:

ceph mgr fail ; sleep 20 ; ceph orch host add mon001

Log from mon000 showing the failure:

Apr 09 17:27:31 mon000 bash[39359]: Warning: Permanently added 'mon001,10.201.32.111' (ECDSA) to the list of known hosts.
Apr 09 17:27:32 mon000 bash[39359]: bash: python3: command not found
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.096+0000 7f652806d700  0 [cephadm ERROR root@mon001] Can't communicate with remote host, possibly be
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.097+0000 7f652806d700 -1 log_channel(cephadm) log [ERR] : Can't communicate with remote host, possib
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.099+0000 7f652806d700  0 [cephadm ERROR root] cannot send (already closed?)
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 997, in _send
Apr 09 17:27:32 mon000 bash[39359]:     message.to_io(self._io)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 443, in to_io
Apr 09 17:27:32 mon000 bash[39359]:     io.write(header + self.data)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 410, in write
Apr 09 17:27:32 mon000 bash[39359]:     self.outfile.flush()
Apr 09 17:27:32 mon000 bash[39359]: BrokenPipeError: [Errno 32] Broken pipe
Apr 09 17:27:32 mon000 bash[39359]: During handling of the above exception, another exception occurred:
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1544, in _run_cephadm
Apr 09 17:27:32 mon000 bash[39359]:     conn, connr = self._get_connection(addr)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1507, in _get_connection
Apr 09 17:27:32 mon000 bash[39359]:     ssh_options=self._ssh_options)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 34, in __init__
Apr 09 17:27:32 mon000 bash[39359]:     self.gateway = self._make_gateway(hostname)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 46, in _make_gateway
Apr 09 17:27:32 mon000 bash[39359]:     gateway.reconfigure(py2str_as_py3str=False, py3str_as_py2str=False)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway.py", line 72, in reconfigure
Apr 09 17:27:32 mon000 bash[39359]:     self._send(Message.RECONFIGURE, data=data)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 1003, in _send
Apr 09 17:27:32 mon000 bash[39359]:     raise IOError("cannot send (already closed?)")
Apr 09 17:27:32 mon000 bash[39359]: OSError: cannot send (already closed?)
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.099+0000 7f652806d700 -1 log_channel(cephadm) log [ERR] : cannot send (already closed?)
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 997, in _send
Apr 09 17:27:32 mon000 bash[39359]:     message.to_io(self._io)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 443, in to_io
Apr 09 17:27:32 mon000 bash[39359]:     io.write(header + self.data)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 410, in write
Apr 09 17:27:32 mon000 bash[39359]:     self.outfile.flush()
Apr 09 17:27:32 mon000 bash[39359]: BrokenPipeError: [Errno 32] Broken pipe
Apr 09 17:27:32 mon000 bash[39359]: During handling of the above exception, another exception occurred:
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1544, in _run_cephadm
Apr 09 17:27:32 mon000 bash[39359]:     conn, connr = self._get_connection(addr)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1507, in _get_connection
Apr 09 17:27:32 mon000 bash[39359]:     ssh_options=self._ssh_options)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 34, in __init__
Apr 09 17:27:32 mon000 bash[39359]:     self.gateway = self._make_gateway(hostname)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 46, in _make_gateway
Apr 09 17:27:32 mon000 bash[39359]:     gateway.reconfigure(py2str_as_py3str=False, py3str_as_py2str=False)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway.py", line 72, in reconfigure
Apr 09 17:27:32 mon000 bash[39359]:     self._send(Message.RECONFIGURE, data=data)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 1003, in _send
Apr 09 17:27:32 mon000 bash[39359]:     raise IOError("cannot send (already closed?)")
Apr 09 17:27:32 mon000 bash[39359]: OSError: cannot send (already closed?)
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.100+0000 7f652806d700  0 [cephadm ERROR orchestrator._interface] _Promise failed
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 997, in _send
Apr 09 17:27:32 mon000 bash[39359]:     message.to_io(self._io)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 443, in to_io
Apr 09 17:27:32 mon000 bash[39359]:     io.write(header + self.data)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 410, in write
Apr 09 17:27:32 mon000 bash[39359]:     self.outfile.flush()
Apr 09 17:27:32 mon000 bash[39359]: BrokenPipeError: [Errno 32] Broken pipe
Apr 09 17:27:32 mon000 bash[39359]: During handling of the above exception, another exception occurred:
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 444, in do_work
Apr 09 17:27:32 mon000 bash[39359]:     res = self._on_complete_(*args, **kwargs)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 512, in <lambda>
Apr 09 17:27:32 mon000 bash[39359]:     return cls(on_complete=lambda x: f(*x), value=args, name=name, **c_kwargs)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1645, in add_host
Apr 09 17:27:32 mon000 bash[39359]:     error_ok=True, no_fsid=True)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1544, in _run_cephadm
Apr 09 17:27:32 mon000 bash[39359]:     conn, connr = self._get_connection(addr)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1507, in _get_connection
Apr 09 17:27:32 mon000 bash[39359]:     ssh_options=self._ssh_options)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 34, in __init__
Apr 09 17:27:32 mon000 bash[39359]:     self.gateway = self._make_gateway(hostname)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 46, in _make_gateway
Apr 09 17:27:32 mon000 bash[39359]:     gateway.reconfigure(py2str_as_py3str=False, py3str_as_py2str=False)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway.py", line 72, in reconfigure
Apr 09 17:27:32 mon000 bash[39359]:     self._send(Message.RECONFIGURE, data=data)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 1003, in _send
Apr 09 17:27:32 mon000 bash[39359]:     raise IOError("cannot send (already closed?)")
Apr 09 17:27:32 mon000 bash[39359]: OSError: cannot send (already closed?)
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.100+0000 7f652806d700 -1 log_channel(cephadm) log [ERR] : _Promise failed
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 997, in _send
Apr 09 17:27:32 mon000 bash[39359]:     message.to_io(self._io)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 443, in to_io
Apr 09 17:27:32 mon000 bash[39359]:     io.write(header + self.data)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 410, in write
Apr 09 17:27:32 mon000 bash[39359]:     self.outfile.flush()
Apr 09 17:27:32 mon000 bash[39359]: BrokenPipeError: [Errno 32] Broken pipe
Apr 09 17:27:32 mon000 bash[39359]: During handling of the above exception, another exception occurred:
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 444, in do_work
Apr 09 17:27:32 mon000 bash[39359]:     res = self._on_complete_(*args, **kwargs)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 512, in <lambda>
Apr 09 17:27:32 mon000 bash[39359]:     return cls(on_complete=lambda x: f(*x), value=args, name=name, **c_kwargs)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1645, in add_host
Apr 09 17:27:32 mon000 bash[39359]:     error_ok=True, no_fsid=True)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1544, in _run_cephadm
Apr 09 17:27:32 mon000 bash[39359]:     conn, connr = self._get_connection(addr)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1507, in _get_connection
Apr 09 17:27:32 mon000 bash[39359]:     ssh_options=self._ssh_options)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 34, in __init__
Apr 09 17:27:32 mon000 bash[39359]:     self.gateway = self._make_gateway(hostname)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 46, in _make_gateway
Apr 09 17:27:32 mon000 bash[39359]:     gateway.reconfigure(py2str_as_py3str=False, py3str_as_py2str=False)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway.py", line 72, in reconfigure
Apr 09 17:27:32 mon000 bash[39359]:     self._send(Message.RECONFIGURE, data=data)
Apr 09 17:27:32 mon000 bash[39359]:   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 1003, in _send
Apr 09 17:27:32 mon000 bash[39359]:     raise IOError("cannot send (already closed?)")
Apr 09 17:27:32 mon000 bash[39359]: OSError: cannot send (already closed?)
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.628+0000 7f653890e700 -1 mgr handle_command module 'orchestrator' command handler threw exception: c
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.628+0000 7f653890e700 -1 mgr.server reply reply (22) Invalid argument Traceback (most recent call la
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/mgr_module.py", line 1153, in _handle_command
Apr 09 17:27:32 mon000 bash[39359]:     return self.handle_command(inbuf, cmd)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 110, in handle_command
Apr 09 17:27:32 mon000 bash[39359]:     return dispatch[cmd['prefix']].call(self, cmd, inbuf)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/mgr_module.py", line 308, in call
Apr 09 17:27:32 mon000 bash[39359]:     return self.func(mgr, **kwargs)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 72, in <lambda>
Apr 09 17:27:32 mon000 bash[39359]:     wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 63, in wrapper
Apr 09 17:27:32 mon000 bash[39359]:     return func(*args, **kwargs)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/orchestrator/module.py", line 179, in _add_host
Apr 09 17:27:32 mon000 bash[39359]:     raise_if_exception(completion)
Apr 09 17:27:32 mon000 bash[39359]:   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 628, in raise_if_exception
Apr 09 17:27:32 mon000 bash[39359]:     raise e
Apr 09 17:27:32 mon000 bash[39359]: OSError: cannot send (already closed?)
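
The traceback above shows the `OSError` escaping from `_get_connection` all the way up to the command handler. As a rough illustration of what recovering could look like, here is a minimal sketch that drops a cached connection whenever an `OSError` is seen, so the next attempt reconnects instead of needing `ceph mgr fail`. It is not the cephadm code and not necessarily how this was fixed: `ConnectionCache`, `HostConnectionError`, and the `connect` factory are hypothetical names made up for the example.

```python
from typing import Callable, Dict


class HostConnectionError(Exception):
    """Hypothetical error type: the remote host could not be reached."""


class ConnectionCache:
    """Keeps one connection per host and forgets it as soon as it breaks."""

    def __init__(self, connect: Callable[[str], object]) -> None:
        # `connect` is a hypothetical factory that opens a connection to a host
        # and may raise OSError (BrokenPipeError is a subclass of OSError).
        self._connect = connect
        self._cons: Dict[str, object] = {}

    def reset(self, host: str) -> None:
        # Drop the cached connection, if any, so the next call reconnects.
        self._cons.pop(host, None)

    def run(self, host: str, action: Callable[[object], object]) -> object:
        try:
            if host not in self._cons:
                self._cons[host] = self._connect(host)
            return action(self._cons[host])
        except OSError as e:
            # Covers failures while connecting and while using the connection.
            self.reset(host)
            raise HostConnectionError(
                f"Can't communicate with remote host {host}: {e}") from e
```

With something like this, a failed `run("mon001", ...)` surfaces as a single, readable error, and a later call for the same host starts from a fresh connection attempt rather than reusing broken state.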

Related issues 1 (0 open, 1 closed)

Related to Orchestrator - Bug #45627: cephadm: frequently getting `1 hosts fail cephadm check` (Resolved, Matthew Oliver)
