Bug #45032
closed
cephadm: Not recovering from `OSError: cannot send (already closed?)`
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Description
The workaround was to fail the active mgr, wait 20 seconds, and add the host again:
ceph mgr fail ; sleep 20 ; ceph orch host add mon001
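The same three steps can be scripted. The following is only a minimal sketch: the function name `fail_mgr_and_readd_host` and its `run`/`sleep` parameters are hypothetical helpers introduced here, while the `ceph` commands and the 20-second wait are taken from the workaround above.

```python
import subprocess
import time


def fail_mgr_and_readd_host(host, wait_seconds=20,
                            run=subprocess.run, sleep=time.sleep):
    """Replay the workaround from this report (hypothetical helper)."""
    # Step 1: fail the active mgr, exactly as in the report.
    run(["ceph", "mgr", "fail"], check=True)
    # Step 2: wait before retrying (the report used 20 seconds).
    sleep(wait_seconds)
    # Step 3: retry the command that previously ended in OSError.
    run(["ceph", "orch", "host", "add", host], check=True)
```

Calling `fail_mgr_and_readd_host("mon001")` on a node with a working `ceph` CLI would issue the commands above; `check=True` makes a failing command raise `subprocess.CalledProcessError` instead of continuing silently.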
Apr 09 17:27:31 mon000 bash[39359]: Warning: Permanently added 'mon001,10.201.32.111' (ECDSA) to the list of known hosts.
Apr 09 17:27:32 mon000 bash[39359]: bash: python3: command not found
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.096+0000 7f652806d700 0 [cephadm ERROR root@mon001] Can't communicate with remote host, possibly be
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.097+0000 7f652806d700 -1 log_channel(cephadm) log [ERR] : Can't communicate with remote host, possib
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.099+0000 7f652806d700 0 [cephadm ERROR root] cannot send (already closed?)
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 997, in _send
Apr 09 17:27:32 mon000 bash[39359]: message.to_io(self._io)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 443, in to_io
Apr 09 17:27:32 mon000 bash[39359]: io.write(header + self.data)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 410, in write
Apr 09 17:27:32 mon000 bash[39359]: self.outfile.flush()
Apr 09 17:27:32 mon000 bash[39359]: BrokenPipeError: [Errno 32] Broken pipe
Apr 09 17:27:32 mon000 bash[39359]: During handling of the above exception, another exception occurred:
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/cephadm/module.py", line 1544, in _run_cephadm
Apr 09 17:27:32 mon000 bash[39359]: conn, connr = self._get_connection(addr)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/cephadm/module.py", line 1507, in _get_connection
Apr 09 17:27:32 mon000 bash[39359]: ssh_options=self._ssh_options)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 34, in __init__
Apr 09 17:27:32 mon000 bash[39359]: self.gateway = self._make_gateway(hostname)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 46, in _make_gateway
Apr 09 17:27:32 mon000 bash[39359]: gateway.reconfigure(py2str_as_py3str=False, py3str_as_py2str=False)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway.py", line 72, in reconfigure
Apr 09 17:27:32 mon000 bash[39359]: self._send(Message.RECONFIGURE, data=data)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 1003, in _send
Apr 09 17:27:32 mon000 bash[39359]: raise IOError("cannot send (already closed?)")
Apr 09 17:27:32 mon000 bash[39359]: OSError: cannot send (already closed?)
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.099+0000 7f652806d700 -1 log_channel(cephadm) log [ERR] : cannot send (already closed?)
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 997, in _send
Apr 09 17:27:32 mon000 bash[39359]: message.to_io(self._io)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 443, in to_io
Apr 09 17:27:32 mon000 bash[39359]: io.write(header + self.data)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 410, in write
Apr 09 17:27:32 mon000 bash[39359]: self.outfile.flush()
Apr 09 17:27:32 mon000 bash[39359]: BrokenPipeError: [Errno 32] Broken pipe
Apr 09 17:27:32 mon000 bash[39359]: During handling of the above exception, another exception occurred:
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/cephadm/module.py", line 1544, in _run_cephadm
Apr 09 17:27:32 mon000 bash[39359]: conn, connr = self._get_connection(addr)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/cephadm/module.py", line 1507, in _get_connection
Apr 09 17:27:32 mon000 bash[39359]: ssh_options=self._ssh_options)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 34, in __init__
Apr 09 17:27:32 mon000 bash[39359]: self.gateway = self._make_gateway(hostname)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 46, in _make_gateway
Apr 09 17:27:32 mon000 bash[39359]: gateway.reconfigure(py2str_as_py3str=False, py3str_as_py2str=False)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway.py", line 72, in reconfigure
Apr 09 17:27:32 mon000 bash[39359]: self._send(Message.RECONFIGURE, data=data)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 1003, in _send
Apr 09 17:27:32 mon000 bash[39359]: raise IOError("cannot send (already closed?)")
Apr 09 17:27:32 mon000 bash[39359]: OSError: cannot send (already closed?)
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.100+0000 7f652806d700 0 [cephadm ERROR orchestrator._interface] _Promise failed
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 997, in _send
Apr 09 17:27:32 mon000 bash[39359]: message.to_io(self._io)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 443, in to_io
Apr 09 17:27:32 mon000 bash[39359]: io.write(header + self.data)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 410, in write
Apr 09 17:27:32 mon000 bash[39359]: self.outfile.flush()
Apr 09 17:27:32 mon000 bash[39359]: BrokenPipeError: [Errno 32] Broken pipe
Apr 09 17:27:32 mon000 bash[39359]: During handling of the above exception, another exception occurred:
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/cephadm/module.py", line 444, in do_work
Apr 09 17:27:32 mon000 bash[39359]: res = self._on_complete_(*args, **kwargs)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/cephadm/module.py", line 512, in <lambda>
Apr 09 17:27:32 mon000 bash[39359]: return cls(on_complete=lambda x: f(*x), value=args, name=name, **c_kwargs)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/cephadm/module.py", line 1645, in add_host
Apr 09 17:27:32 mon000 bash[39359]: error_ok=True, no_fsid=True)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/cephadm/module.py", line 1544, in _run_cephadm
Apr 09 17:27:32 mon000 bash[39359]: conn, connr = self._get_connection(addr)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/cephadm/module.py", line 1507, in _get_connection
Apr 09 17:27:32 mon000 bash[39359]: ssh_options=self._ssh_options)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 34, in __init__
Apr 09 17:27:32 mon000 bash[39359]: self.gateway = self._make_gateway(hostname)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 46, in _make_gateway
Apr 09 17:27:32 mon000 bash[39359]: gateway.reconfigure(py2str_as_py3str=False, py3str_as_py2str=False)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway.py", line 72, in reconfigure
Apr 09 17:27:32 mon000 bash[39359]: self._send(Message.RECONFIGURE, data=data)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 1003, in _send
Apr 09 17:27:32 mon000 bash[39359]: raise IOError("cannot send (already closed?)")
Apr 09 17:27:32 mon000 bash[39359]: OSError: cannot send (already closed?)
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.100+0000 7f652806d700 -1 log_channel(cephadm) log [ERR] : _Promise failed
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 997, in _send
Apr 09 17:27:32 mon000 bash[39359]: message.to_io(self._io)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 443, in to_io
Apr 09 17:27:32 mon000 bash[39359]: io.write(header + self.data)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 410, in write
Apr 09 17:27:32 mon000 bash[39359]: self.outfile.flush()
Apr 09 17:27:32 mon000 bash[39359]: BrokenPipeError: [Errno 32] Broken pipe
Apr 09 17:27:32 mon000 bash[39359]: During handling of the above exception, another exception occurred:
Apr 09 17:27:32 mon000 bash[39359]: Traceback (most recent call last):
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/cephadm/module.py", line 444, in do_work
Apr 09 17:27:32 mon000 bash[39359]: res = self._on_complete_(*args, **kwargs)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/cephadm/module.py", line 512, in <lambda>
Apr 09 17:27:32 mon000 bash[39359]: return cls(on_complete=lambda x: f(*x), value=args, name=name, **c_kwargs)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/cephadm/module.py", line 1645, in add_host
Apr 09 17:27:32 mon000 bash[39359]: error_ok=True, no_fsid=True)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/cephadm/module.py", line 1544, in _run_cephadm
Apr 09 17:27:32 mon000 bash[39359]: conn, connr = self._get_connection(addr)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/cephadm/module.py", line 1507, in _get_connection
Apr 09 17:27:32 mon000 bash[39359]: ssh_options=self._ssh_options)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 34, in __init__
Apr 09 17:27:32 mon000 bash[39359]: self.gateway = self._make_gateway(hostname)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 46, in _make_gateway
Apr 09 17:27:32 mon000 bash[39359]: gateway.reconfigure(py2str_as_py3str=False, py3str_as_py2str=False)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway.py", line 72, in reconfigure
Apr 09 17:27:32 mon000 bash[39359]: self._send(Message.RECONFIGURE, data=data)
Apr 09 17:27:32 mon000 bash[39359]: File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 1003, in _send
Apr 09 17:27:32 mon000 bash[39359]: raise IOError("cannot send (already closed?)")
Apr 09 17:27:32 mon000 bash[39359]: OSError: cannot send (already closed?)
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.628+0000 7f653890e700 -1 mgr handle_command module 'orchestrator' command handler threw exception: c
Apr 09 17:27:32 mon000 bash[39359]: debug 2020-04-09T22:27:32.628+0000 7f653890e700 -1 mgr.server reply reply (22) Invalid argument Traceback (most recent call la
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/mgr_module.py", line 1153, in _handle_command
Apr 09 17:27:32 mon000 bash[39359]: return self.handle_command(inbuf, cmd)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 110, in handle_command
Apr 09 17:27:32 mon000 bash[39359]: return dispatch[cmd['prefix']].call(self, cmd, inbuf)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/mgr_module.py", line 308, in call
Apr 09 17:27:32 mon000 bash[39359]: return self.func(mgr, **kwargs)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 72, in <lambda>
Apr 09 17:27:32 mon000 bash[39359]: wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 63, in wrapper
Apr 09 17:27:32 mon000 bash[39359]: return func(*args, **kwargs)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/orchestrator/module.py", line 179, in _add_host
Apr 09 17:27:32 mon000 bash[39359]: raise_if_exception(completion)
Apr 09 17:27:32 mon000 bash[39359]: File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 628, in raise_if_exception
Apr 09 17:27:32 mon000 bash[39359]: raise e
Apr 09 17:27:32 mon000 bash[39359]: OSError: cannot send (already closed?)
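The tracebacks above show a `BrokenPipeError` followed by `raise IOError(...)` that is reported as `OSError`. The sketch below reproduces that shape in isolation. It assumes nothing about execnet's real code: `send` and `broken_write` are hypothetical stand-ins, and only the exception types and the message text are taken from the log.

```python
import errno


def broken_write(data):
    # Stand-in for a flush on a pipe whose reader has already exited.
    raise BrokenPipeError(errno.EPIPE, "Broken pipe")


def send(write):
    """Hypothetical stand-in for a send that wraps low-level write errors."""
    try:
        write(b"payload")
    except OSError:
        # Raising inside an except block records the original error as
        # __context__, which Python prints as "During handling of the
        # above exception, another exception occurred".
        raise IOError("cannot send (already closed?)")


try:
    send(broken_write)
except OSError as exc:
    # IOError is an alias of OSError in Python 3, hence the "OSError:" line.
    caught = exc
```

Here `caught` is a plain `OSError` carrying the message from the bug title, and `caught.__context__` is the underlying `BrokenPipeError`, matching the two-part traceback in the log.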