Bug #62848 (closed): qa: fail_fs upgrade scenario hanging

Added by Patrick Donnelly 8 months ago. Updated 7 months ago.

Status: Duplicate
Priority: Urgent
Category: Testing
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS): qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2023-09-12T17:30:00.275 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-4c41840e-518f-11ee-9ab7-7b867c8bd7da-mon-smithi173[187305]: 2023-09-12T17:29:59.999+0000 7fec1cd83700 -1 log_channel(cluster) log [ERR] : Health detail: HEALTH_ERR 1 filesystem is degraded; 1 filesystem has a failed mds daemon; 1 filesystem is offline
2023-09-12T17:30:00.276 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-4c41840e-518f-11ee-9ab7-7b867c8bd7da-mon-smithi173[187305]: 2023-09-12T17:29:59.999+0000 7fec1cd83700 -1 log_channel(cluster) log [ERR] : [WRN] FS_DEGRADED: 1 filesystem is degraded
2023-09-12T17:30:00.276 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-4c41840e-518f-11ee-9ab7-7b867c8bd7da-mon-smithi173[187305]: 2023-09-12T17:29:59.999+0000 7fec1cd83700 -1 log_channel(cluster) log [ERR] :     fs cephfs is degraded
2023-09-12T17:30:00.276 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-4c41840e-518f-11ee-9ab7-7b867c8bd7da-mon-smithi173[187305]: 2023-09-12T17:29:59.999+0000 7fec1cd83700 -1 log_channel(cluster) log [ERR] : [WRN] FS_WITH_FAILED_MDS: 1 filesystem has a failed mds daemon
2023-09-12T17:30:00.276 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-4c41840e-518f-11ee-9ab7-7b867c8bd7da-mon-smithi173[187305]: 2023-09-12T17:29:59.999+0000 7fec1cd83700 -1 log_channel(cluster) log [ERR] :     fs cephfs has 2 failed mdss
2023-09-12T17:30:00.277 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-4c41840e-518f-11ee-9ab7-7b867c8bd7da-mon-smithi173[187305]: 2023-09-12T17:29:59.999+0000 7fec1cd83700 -1 log_channel(cluster) log [ERR] : [ERR] MDS_ALL_DOWN: 1 filesystem is offline
2023-09-12T17:30:00.277 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-4c41840e-518f-11ee-9ab7-7b867c8bd7da-mon-smithi173[187305]: 2023-09-12T17:29:59.999+0000 7fec1cd83700 -1 log_channel(cluster) log [ERR] :     fs cephfs is offline because no MDS is active for it.
2023-09-12T17:30:00.775 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-mon[187328]: Health detail: HEALTH_ERR 1 filesystem is degraded; 1 filesystem has a failed mds daemon; 1 filesystem is offline
2023-09-12T17:30:00.776 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-mon[187328]: [WRN] FS_DEGRADED: 1 filesystem is degraded
2023-09-12T17:30:00.776 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-mon[187328]:     fs cephfs is degraded
2023-09-12T17:30:00.776 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-mon[187328]: [WRN] FS_WITH_FAILED_MDS: 1 filesystem has a failed mds daemon
2023-09-12T17:30:00.776 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-mon[187328]:     fs cephfs has 2 failed mdss
2023-09-12T17:30:00.777 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-mon[187328]: [ERR] MDS_ALL_DOWN: 1 filesystem is offline
2023-09-12T17:30:00.777 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-mon[187328]:     fs cephfs is offline because no MDS is active for it.
2023-09-12T17:30:00.883 INFO:journalctl@ceph.mon.smithi204.smithi204.stdout:Sep 12 17:30:00 smithi204 ceph-mon[159017]: Health detail: HEALTH_ERR 1 filesystem is degraded; 1 filesystem has a failed mds daemon; 1 filesystem is offline
2023-09-12T17:30:00.884 INFO:journalctl@ceph.mon.smithi204.smithi204.stdout:Sep 12 17:30:00 smithi204 ceph-mon[159017]: [WRN] FS_DEGRADED: 1 filesystem is degraded
2023-09-12T17:30:00.884 INFO:journalctl@ceph.mon.smithi204.smithi204.stdout:Sep 12 17:30:00 smithi204 ceph-mon[159017]:     fs cephfs is degraded
2023-09-12T17:30:00.884 INFO:journalctl@ceph.mon.smithi204.smithi204.stdout:Sep 12 17:30:00 smithi204 ceph-mon[159017]: [WRN] FS_WITH_FAILED_MDS: 1 filesystem has a failed mds daemon
2023-09-12T17:30:00.884 INFO:journalctl@ceph.mon.smithi204.smithi204.stdout:Sep 12 17:30:00 smithi204 ceph-mon[159017]:     fs cephfs has 2 failed mdss
2023-09-12T17:30:00.884 INFO:journalctl@ceph.mon.smithi204.smithi204.stdout:Sep 12 17:30:00 smithi204 ceph-mon[159017]: [ERR] MDS_ALL_DOWN: 1 filesystem is offline
2023-09-12T17:30:00.885 INFO:journalctl@ceph.mon.smithi204.smithi204.stdout:Sep 12 17:30:00 smithi204 ceph-mon[159017]:     fs cephfs is offline because no MDS is active for it.
...
2023-09-12T17:32:29.424 ERROR:teuthology:Uncaught exception (Hub)
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/channel.py", line 747, in recv_stderr
    out = self.in_stderr_buffer.read(nbytes, self.timeout)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/buffered_pipe.py", line 164, in read
    raise PipeTimeout()
paramiko.buffered_pipe.PipeTimeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "src/gevent/greenlet.py", line 906, in gevent._gevent_cgreenlet.Greenlet.run
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/teuthology/orchestra/run.py", line 323, in copy_file_to
    copy_to_log(src, logger, capture=stream, quiet=quiet)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/teuthology/orchestra/run.py", line 276, in copy_to_log
    for line in f:
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/file.py", line 125, in __next__
    line = self.readline()
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/file.py", line 291, in readline
    new_data = self._read(n)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/channel.py", line 1376, in _read
    return self.channel.recv_stderr(size)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/channel.py", line 749, in recv_stderr
    raise socket.timeout()
socket.timeout
2023-09-12T17:32:29.434 ERROR:teuthology:Uncaught exception (Hub)
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/channel.py", line 699, in recv
    out = self.in_buffer.read(nbytes, self.timeout)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/buffered_pipe.py", line 164, in read
    raise PipeTimeout()
paramiko.buffered_pipe.PipeTimeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "src/gevent/greenlet.py", line 906, in gevent._gevent_cgreenlet.Greenlet.run
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/teuthology/orchestra/run.py", line 323, in copy_file_to
    copy_to_log(src, logger, capture=stream, quiet=quiet)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/teuthology/orchestra/run.py", line 276, in copy_to_log
    for line in f:
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/file.py", line 125, in __next__
    line = self.readline()
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/file.py", line 291, in readline
    new_data = self._read(n)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/channel.py", line 1361, in _read
    return self.channel.recv(size)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/channel.py", line 701, in recv
    raise socket.timeout()
socket.timeout

From: /teuthology/pdonnell-2023-09-12_14:07:50-fs-wip-batrick-testing-20230912.122437-distro-default-smithi/7395159/teuthology.log

and others. Probably fallout from commit 2b839838f70e9bcd31568013106aa7b5d2313bbe.
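
For context on the hang: the fail_fs upgrade flow fails the filesystem, upgrades the MDS daemons, re-enables the filesystem with "ceph fs set cephfs joinable true", and then waits for an MDS to become active again. If the mon never broadcasts the updated mdsmap (the behaviour tracked in #62682), that wait never completes, which matches the repeated MDS_ALL_DOWN health errors above and the eventual paramiko socket.timeout. Below is a minimal sketch of the kind of polling loop that ends up waiting forever; it assumes local access to the ceph CLI and the JSON layout of "ceph fs dump" (the real qa task goes through teuthology remotes, so the helper name and structure here are illustrative only):

import json
import subprocess
import time

def wait_for_active_mds(fs_name="cephfs", timeout=300, interval=5):
    """Poll `ceph fs dump` until fs_name reports an up:active MDS.

    Illustrative sketch only; the actual qa task uses teuthology helpers
    rather than a local subprocess call.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.check_output(["ceph", "fs", "dump", "--format=json"])
        dump = json.loads(out)
        for fs in dump.get("filesystems", []):
            mdsmap = fs["mdsmap"]
            if mdsmap["fs_name"] != fs_name:
                continue
            # Each entry in mdsmap["info"] describes one MDS daemon and its state.
            states = [info["state"] for info in mdsmap["info"].values()]
            if "up:active" in states:
                return
        time.sleep(interval)
    raise RuntimeError("timed out waiting for an active MDS in %s" % fs_name)

With the missing mdsmap broadcast from #62682, a loop like this never observes up:active, so the remote command produces no further output and teuthology's channel read eventually raises the socket.timeout seen in the log.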


Related issues (1 total; 0 open, 1 closed)

Is duplicate of CephFS - Bug #62682: mon: no mdsmap broadcast after "fs set joinable" is set to true (Resolved, Patrick Donnelly)

#1

Updated by Venky Shankar 8 months ago

  • Related to Bug #62682: mon: no mdsmap broadcast after "fs set joinable" is set to true added
#2

Updated by Venky Shankar 8 months ago

  • Status changed from New to Triaged
  • Assignee set to Patrick Donnelly
#3

Updated by Patrick Donnelly 7 months ago

  • Related to deleted (Bug #62682: mon: no mdsmap broadcast after "fs set joinable" is set to true)
#4

Updated by Patrick Donnelly 7 months ago

  • Is duplicate of Bug #62682: mon: no mdsmap broadcast after "fs set joinable" is set to true added
#5

Updated by Patrick Donnelly 7 months ago

  • Status changed from Triaged to Duplicate