Bug #46990
closed
execnet: EOFError: couldnt load message header, expected 9 bytes, got 0
Description
[ERR] MGR_MODULE_ERROR: Module 'cephadm' has failed: Failed to execute command: /usr/bin/python3 -u
    Module 'cephadm' has failed: Failed to execute command: /usr/bin/python3 -u
Aug 17 13:42:20 master bash[19272]: Traceback (most recent call last):
Aug 17 13:42:20 master bash[19272]:   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 432, in from_io
Aug 17 13:42:20 master bash[19272]:     header = io.read(9) # type 1, channel 4, payload 4
Aug 17 13:42:20 master bash[19272]:   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 402, in read
Aug 17 13:42:20 master bash[19272]:     raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
Aug 17 13:42:20 master bash[19272]: EOFError: expected 9 bytes, got 0
Aug 17 13:42:20 master bash[19272]: During handling of the above exception, another exception occurred:
Aug 17 13:42:20 master bash[19272]: Traceback (most recent call last):
Aug 17 13:42:20 master bash[19272]:   File "/usr/lib/python3.6/site-packages/remoto/process.py", line 188, in check
Aug 17 13:42:20 master bash[19272]:     response = result.receive(timeout)
Aug 17 13:42:20 master bash[19272]:   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 749, in receive
Aug 17 13:42:20 master bash[19272]:     raise self._getremoteerror() or EOFError()
Aug 17 13:42:20 master bash[19272]:   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 967, in _thread_receiver
Aug 17 13:42:20 master bash[19272]:     msg = Message.from_io(io)
Aug 17 13:42:20 master bash[19272]:   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 437, in from_io
Aug 17 13:42:20 master bash[19272]:     raise EOFError("couldnt load message header, " + e.args[0])
Aug 17 13:42:20 master bash[19272]: EOFError: couldnt load message header, expected 9 bytes, got 0
Aug 17 13:42:20 master bash[19272]: During handling of the above exception, another exception occurred:
Aug 17 13:42:20 master bash[19272]: Traceback (most recent call last):
Aug 17 13:42:20 master bash[19272]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1035, in _remote_connection
Aug 17 13:42:20 master bash[19272]:     yield (conn, connr)
Aug 17 13:42:20 master bash[19272]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1131, in _run_cephadm
Aug 17 13:42:20 master bash[19272]:     stdin=script.encode('utf-8'))
Aug 17 13:42:20 master bash[19272]:   File "/usr/lib/python3.6/site-packages/remoto/process.py", line 209, in check
Aug 17 13:42:20 master bash[19272]:     'Failed to execute command: %s' % ' '.join(command)
Aug 17 13:42:20 master bash[19272]: RuntimeError: Failed to execute command: /usr/bin/python3 -u
Aug 17 13:42:20 master bash[19272]: debug 2020-08-17T11:42:20.900+0000 7f2bbda3c700 -1 log_channel(cephadm) log [ERR] : Failed to execute command: /usr/bin/python3 -u
Aug 17 13:42:20 master bash[19272]: Traceback (most recent call last):
Aug 17 13:42:20 master bash[19272]:   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 432, in from_io
Aug 17 13:42:20 master bash[19272]:     header = io.read(9) # type 1, channel 4, payload 4
Aug 17 13:42:20 master bash[19272]:   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 402, in read
Aug 17 13:42:20 master bash[19272]:     raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
Aug 17 13:42:20 master bash[19272]: EOFError: expected 9 bytes, got 0
Aug 17 13:42:20 master bash[19272]: During handling of the above exception, another exception occurred:
Aug 17 13:42:20 master bash[19272]: Traceback (most recent call last):
Aug 17 13:42:20 master bash[19272]:   File "/usr/lib/python3.6/site-packages/remoto/process.py", line 188, in check
Aug 17 13:42:20 master bash[19272]:     response = result.receive(timeout)
Aug 17 13:42:20 master bash[19272]:   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 749, in receive
Aug 17 13:42:20 master bash[19272]:     raise self._getremoteerror() or EOFError()
Aug 17 13:42:20 master bash[19272]:   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 967, in _thread_receiver
Aug 17 13:42:20 master bash[19272]:     msg = Message.from_io(io)
Aug 17 13:42:20 master bash[19272]:   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 437, in from_io
Aug 17 13:42:20 master bash[19272]:     raise EOFError("couldnt load message header, " + e.args[0])
Aug 17 13:42:20 master bash[19272]: EOFError: couldnt load message header, expected 9 bytes, got 0
Aug 17 13:42:20 master bash[19272]: During handling of the above exception, another exception occurred:
Aug 17 13:42:20 master bash[19272]: Traceback (most recent call last):
Aug 17 13:42:20 master bash[19272]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1035, in _remote_connection
Aug 17 13:42:20 master bash[19272]:     yield (conn, connr)
Aug 17 13:42:20 master bash[19272]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1131, in _run_cephadm
Aug 17 13:42:20 master bash[19272]:     stdin=script.encode('utf-8'))
Aug 17 13:42:20 master bash[19272]:   File "/usr/lib/python3.6/site-packages/remoto/process.py", line 209, in check
Aug 17 13:42:20 master bash[19272]:     'Failed to execute command: %s' % ' '.join(command)
Aug 17 13:42:20 master bash[19272]: RuntimeError: Failed to execute command: /usr/bin/python3 -u
Aug 17 13:42:21 master bash[19272]: Warning: Permanently added 'master' (ECDSA) to the list of known hosts.
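For readers unfamiliar with the message, the following is a minimal sketch of how a fixed 9-byte header read can produce exactly this error when the remote side closes the pipe before sending anything. It is modeled only on the comment visible in the traceback (type 1, channel 4, payload 4); the names `HEADER`, `read_exact` and `read_header` are hypothetical and this is not execnet's actual code.

```python
import io
import struct

# 1-byte message type, 4-byte channel id, 4-byte payload length: 9 bytes total.
# The "!bii" layout is an assumption based on the traceback comment.
HEADER = struct.Struct("!bii")


def read_exact(stream, numbytes):
    """Read exactly numbytes from stream, or raise EOFError on a short read."""
    buf = stream.read(numbytes)
    if len(buf) < numbytes:
        raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
    return buf


def read_header(stream):
    """Return (msgtype, channel, payload_len); re-raise short reads with context."""
    try:
        raw = read_exact(stream, HEADER.size)
    except EOFError as e:
        raise EOFError("couldnt load message header, " + e.args[0])
    return HEADER.unpack(raw)
```

"got 0" therefore most likely means the stream was already at end-of-file when the read started, i.e. the remote /usr/bin/python3 process had exited or never came up, rather than that a message was truncated halfway.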
execnet is again super helpful.
Fortunately, we were able to recover from this, as we're calling _reset_con() in that case.
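The recovery described here follows a reset-and-retry pattern. The ticket names `_reset_con()` but does not show its signature or how mgr/cephadm invokes it, so the sketch below uses two hypothetical callables, `run` and `reset`, to illustrate the idea only.

```python
def run_with_reset(run, reset, attempts=2):
    """Call run(); on a connection-level failure call reset() and try again.

    run and reset are hypothetical stand-ins: run executes the remote
    command, reset drops the cached connection so the next call reconnects.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return run()
        except (EOFError, RuntimeError) as e:
            last_error = e
            reset()
    # Every attempt failed: surface the most recent error to the caller.
    raise last_error
```

With a stale connection, the first call fails with the RuntimeError seen in the log, the connection is dropped, and the second call succeeds on a fresh one.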
Updated by Sebastian Wagner almost 4 years ago
- Related to Bug #38757: mgr/ssh orchestrator doesn't work added
Updated by Sebastian Wagner almost 4 years ago
- Subject changed from execnet: expected 9 bytes, got 0 to execnet: EOFError: couldnt load message header, expected 9 bytes, got 0
Updated by Sebastian Wagner almost 4 years ago
Updated by Sebastian Wagner almost 4 years ago
- Related to Cleanup #44676: cephadm: Replace execnet (and remoto) added
Updated by Sebastian Wagner almost 4 years ago
- Related to Bug #46764: cephadm (ceph orch apply) sometimes gets "stuck" and cannot deploy any OSDs added
Updated by Nathan Cutler almost 4 years ago
- Affected Versions v15.2.5 added
Seems to:
(1) happen in libvirt VMs running on slower hardware (e.g. nested virt)
(2) be a recent regression
Updated by Nathan Cutler almost 4 years ago
- Related to deleted (Bug #46764: cephadm (ceph orch apply) sometimes gets "stuck" and cannot deploy any OSDs)
Updated by Nathan Cutler almost 4 years ago
- Has duplicate Bug #46764: cephadm (ceph orch apply) sometimes gets "stuck" and cannot deploy any OSDs added
Updated by Nathan Cutler almost 4 years ago
Note: this problem is known to arise (only on machines with root filesystem on HDD) the first time "ceph orch apply" is run after "cephadm bootstrap" completes.
I found it's enough to wait one minute after the "cephadm bootstrap" command completes, before issuing the "ceph orch apply" command to create OSDs, to make the problem go away.
Could it be that this is just a consequence of cephadm being asynchronous? Is it possible that running "ceph orch apply" immediately after "cephadm bootstrap" returns catches mgr/cephadm unprepared - maybe it is still working through its startup routine, for example?
Updated by Nathan Cutler almost 4 years ago
The moral of the story is: wait for the bootstrap MON and MGR to appear in "cephadm ls" before proceeding with "ceph orch apply".
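A minimal sketch of that wait, for use in a deployment script. It assumes "cephadm ls" prints a JSON array whose entries carry a "name" field such as "mon.master" or "mgr.master.xyz"; the function names, timeout and polling interval are hypothetical choices, not part of cephadm.

```python
import json
import subprocess
import time


def list_daemon_names():
    """Return daemon names from `cephadm ls` (assumes JSON output with "name")."""
    out = subprocess.check_output(["cephadm", "ls"])
    return [d.get("name", "") for d in json.loads(out)]


def wait_for_bootstrap_daemons(list_names=list_daemon_names, timeout=300,
                               interval=5, sleep=time.sleep):
    """Poll until both a mon.* and a mgr.* daemon are listed, or time out.

    Returns True once both are present, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = list_names()
        have_mon = any(n.startswith("mon.") for n in names)
        have_mgr = any(n.startswith("mgr.") for n in names)
        if have_mon and have_mgr:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A script would call wait_for_bootstrap_daemons() after "cephadm bootstrap" returns and only run "ceph orch apply" if it returns True. Note that this checks only that the daemons are listed, as the comment above suggests, not that mgr/cephadm has finished its startup routine.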
Updated by Sebastian Wagner over 3 years ago
- Status changed from New to Can't reproduce