Bug #48933

cephadm: EOFError: couldnt load message header, expected 9 bytes, got 0

Added by Gunther Heinrich about 1 month ago. Updated 21 days ago.

Status: New
Priority: Normal
Assignee: -
Category: cephadm
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Found several uncaught exceptions. If I remember correctly, one host was rebooting or updating at that time.

debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02] Traceback (most recent call last):
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] : Traceback (most recent call last):
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02]   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 432, in from_io
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] :   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 432, in from_io
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02]     header = io.read(9)  # type 1, channel 4, payload 4
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] :     header = io.read(9)  # type 1, channel 4, payload 4
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02]   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 402, in read
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] :   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 402, in read
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02]     raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] :     raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02] EOFError: expected 9 bytes, got 0
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] : EOFError: expected 9 bytes, got 0
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02]
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] :
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02] During handling of the above exception, another exception occurred:
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] : During handling of the above exception, another exception occurred:
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02]
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] :
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02] Traceback (most recent call last):
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] : Traceback (most recent call last):
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02]   File "/lib/python3.6/site-packages/remoto/process.py", line 188, in check
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] :   File "/lib/python3.6/site-packages/remoto/process.py", line 188, in check
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02]     response = result.receive(timeout)
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] :     response = result.receive(timeout)
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02]   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 749, in receive
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] :   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 749, in receive
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02]     raise self._getremoteerror() or EOFError()
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] :     raise self._getremoteerror() or EOFError()
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02]   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 967, in _thread_receiver
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] :   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 967, in _thread_receiver
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02]     msg = Message.from_io(io)
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] :     msg = Message.from_io(io)
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02]   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 437, in from_io
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] :   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 437, in from_io
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02]     raise EOFError("couldnt load message header, " + e.args[0])
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] :     raise EOFError("couldnt load message header, " + e.args[0])
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02] EOFError: couldnt load message header, expected 9 bytes, got 0
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] : EOFError: couldnt load message header, expected 9 bytes, got 0
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700  0 [cephadm ERROR root@iz-ceph-v1-osd-02]
debug 2021-01-18T13:06:36.862+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] :
debug 2021-01-18T13:06:36.938+0000 7f6e0fcd9700  0 [cephadm ERROR root] Failed to execute command: /usr/bin/python3 -u
Traceback (most recent call last):
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 432, in from_io
    header = io.read(9)  # type 1, channel 4, payload 4
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 402, in read
    raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
EOFError: expected 9 bytes, got 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lib/python3.6/site-packages/remoto/process.py", line 188, in check
    response = result.receive(timeout)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 749, in receive
    raise self._getremoteerror() or EOFError()
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 967, in _thread_receiver
    msg = Message.from_io(io)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 437, in from_io
    raise EOFError("couldnt load message header, " + e.args[0])
EOFError: couldnt load message header, expected 9 bytes, got 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1012, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1128, in _run_cephadm
    stdin=script.encode('utf-8'))
  File "/lib/python3.6/site-packages/remoto/process.py", line 209, in check
    'Failed to execute command: %s' % ' '.join(command)
RuntimeError: Failed to execute command: /usr/bin/python3 -u
debug 2021-01-18T13:06:36.938+0000 7f6e0fcd9700 -1 log_channel(cephadm) log [ERR] : Failed to execute command: /usr/bin/python3 -u
Traceback (most recent call last):
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 432, in from_io
    header = io.read(9)  # type 1, channel 4, payload 4
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 402, in read
    raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
EOFError: expected 9 bytes, got 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lib/python3.6/site-packages/remoto/process.py", line 188, in check
    response = result.receive(timeout)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 749, in receive
    raise self._getremoteerror() or EOFError()
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 967, in _thread_receiver
    msg = Message.from_io(io)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 437, in from_io
    raise EOFError("couldnt load message header, " + e.args[0])
EOFError: couldnt load message header, expected 9 bytes, got 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1012, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1128, in _run_cephadm
    stdin=script.encode('utf-8'))
  File "/lib/python3.6/site-packages/remoto/process.py", line 209, in check
    'Failed to execute command: %s' % ' '.join(command)
RuntimeError: Failed to execute command: /usr/bin/python3 -u
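
For readers following the traceback, here is a minimal sketch of how a read of a fixed 9-byte header turns into this error when the peer closes the connection (for example, because the host is rebooting). It is not the execnet source: the names `read_exact`, `read_message`, and `describe_failure` are hypothetical, and the `"!bii"` layout is an assumption based on the traceback comment "type 1, channel 4, payload 4".

```python
import io
import struct

# Assumed header layout, inferred from the traceback comment
# "type 1, channel 4, payload 4": 1 + 4 + 4 = 9 bytes, network byte order.
HEADER = struct.Struct("!bii")


def read_exact(stream, numbytes):
    """Read exactly numbytes from stream, or raise EOFError if it ends early."""
    buf = b""
    while len(buf) < numbytes:
        chunk = stream.read(numbytes - len(buf))
        if not chunk:
            # The peer closed the stream before the full block arrived.
            raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
        buf += chunk
    return buf


def read_message(stream):
    """Read one (type, channel, payload) message from stream (hypothetical helper)."""
    try:
        header = read_exact(stream, HEADER.size)
    except EOFError as e:
        # Re-raising inside the handler produces the chained
        # "During handling of the above exception..." traceback.
        raise EOFError("couldnt load message header, " + e.args[0])
    msgtype, channel, length = HEADER.unpack(header)
    return msgtype, channel, read_exact(stream, length)


def describe_failure(stream):
    """Return the EOFError text for a stream, or None if a message was read."""
    try:
        read_message(stream)
    except EOFError as e:
        return str(e)
    return None
```

Calling `describe_failure(io.BytesIO(b""))` simulates a connection that closed before sending anything, and returns the same "expected 9 bytes, got 0" text seen in the log. The error therefore says only that the remote end went away, not why it did.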

History

#1 Updated by Sebastian Wagner 23 days ago

Do you have /usr/bin/python3 installed on the remote host?

#2 Updated by Gunther Heinrich 22 days ago

Yes, "python3 -V" gives me "Python 3.8.5".

Am I correct to assume that the exception refers to python inside a container? I ask because on the node itself I only find lib/python3, lib/python3.8 and lib/python3.9, and the two latest versions were installed in Nov 2020.

#3 Updated by Sebastian Wagner 22 days ago

Gunther Heinrich wrote:

Yes, "python3 -V" gives me "Python 3.8.5".

Hm.

Am I correct to assume that the exception refers to python inside a container?

Correct.

Does restarting the mgr (ceph mgr fail ...) help?

#4 Updated by Gunther Heinrich 21 days ago

Unfortunately I can no longer go back and try that. It was not a big issue to begin with: as far as I remember, the exceptions didn't last, because the cluster was updating/rebooting at that time (in relation to the daemon startup failure tests I did). So I guess that a restart helped.
