Bug #62848 (closed): qa: fail_fs upgrade scenario hanging

Added by Patrick Donnelly 8 months ago. Updated 7 months ago.

Status: Duplicate
Priority: Urgent
Category: Testing
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS): qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2023-09-12T17:30:00.275 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-4c41840e-518f-11ee-9ab7-7b867c8bd7da-mon-smithi173[187305]: 2023-09-12T17:29:59.999+0000 7fec1cd83700 -1 log_channel(cluster) log [ERR] : Health detail: HEALTH_ERR 1 filesystem is degraded; 1 filesystem has a failed mds daemon; 1 filesystem is offline
2023-09-12T17:30:00.276 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-4c41840e-518f-11ee-9ab7-7b867c8bd7da-mon-smithi173[187305]: 2023-09-12T17:29:59.999+0000 7fec1cd83700 -1 log_channel(cluster) log [ERR] : [WRN] FS_DEGRADED: 1 filesystem is degraded
2023-09-12T17:30:00.276 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-4c41840e-518f-11ee-9ab7-7b867c8bd7da-mon-smithi173[187305]: 2023-09-12T17:29:59.999+0000 7fec1cd83700 -1 log_channel(cluster) log [ERR] :     fs cephfs is degraded
2023-09-12T17:30:00.276 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-4c41840e-518f-11ee-9ab7-7b867c8bd7da-mon-smithi173[187305]: 2023-09-12T17:29:59.999+0000 7fec1cd83700 -1 log_channel(cluster) log [ERR] : [WRN] FS_WITH_FAILED_MDS: 1 filesystem has a failed mds daemon
2023-09-12T17:30:00.276 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-4c41840e-518f-11ee-9ab7-7b867c8bd7da-mon-smithi173[187305]: 2023-09-12T17:29:59.999+0000 7fec1cd83700 -1 log_channel(cluster) log [ERR] :     fs cephfs has 2 failed mdss
2023-09-12T17:30:00.277 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-4c41840e-518f-11ee-9ab7-7b867c8bd7da-mon-smithi173[187305]: 2023-09-12T17:29:59.999+0000 7fec1cd83700 -1 log_channel(cluster) log [ERR] : [ERR] MDS_ALL_DOWN: 1 filesystem is offline
2023-09-12T17:30:00.277 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-4c41840e-518f-11ee-9ab7-7b867c8bd7da-mon-smithi173[187305]: 2023-09-12T17:29:59.999+0000 7fec1cd83700 -1 log_channel(cluster) log [ERR] :     fs cephfs is offline because no MDS is active for it.
2023-09-12T17:30:00.775 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-mon[187328]: Health detail: HEALTH_ERR 1 filesystem is degraded; 1 filesystem has a failed mds daemon; 1 filesystem is offline
2023-09-12T17:30:00.776 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-mon[187328]: [WRN] FS_DEGRADED: 1 filesystem is degraded
2023-09-12T17:30:00.776 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-mon[187328]:     fs cephfs is degraded
2023-09-12T17:30:00.776 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-mon[187328]: [WRN] FS_WITH_FAILED_MDS: 1 filesystem has a failed mds daemon
2023-09-12T17:30:00.776 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-mon[187328]:     fs cephfs has 2 failed mdss
2023-09-12T17:30:00.777 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-mon[187328]: [ERR] MDS_ALL_DOWN: 1 filesystem is offline
2023-09-12T17:30:00.777 INFO:journalctl@ceph.mon.smithi173.smithi173.stdout:Sep 12 17:30:00 smithi173 ceph-mon[187328]:     fs cephfs is offline because no MDS is active for it.
2023-09-12T17:30:00.883 INFO:journalctl@ceph.mon.smithi204.smithi204.stdout:Sep 12 17:30:00 smithi204 ceph-mon[159017]: Health detail: HEALTH_ERR 1 filesystem is degraded; 1 filesystem has a failed mds daemon; 1 filesystem is offline
2023-09-12T17:30:00.884 INFO:journalctl@ceph.mon.smithi204.smithi204.stdout:Sep 12 17:30:00 smithi204 ceph-mon[159017]: [WRN] FS_DEGRADED: 1 filesystem is degraded
2023-09-12T17:30:00.884 INFO:journalctl@ceph.mon.smithi204.smithi204.stdout:Sep 12 17:30:00 smithi204 ceph-mon[159017]:     fs cephfs is degraded
2023-09-12T17:30:00.884 INFO:journalctl@ceph.mon.smithi204.smithi204.stdout:Sep 12 17:30:00 smithi204 ceph-mon[159017]: [WRN] FS_WITH_FAILED_MDS: 1 filesystem has a failed mds daemon
2023-09-12T17:30:00.884 INFO:journalctl@ceph.mon.smithi204.smithi204.stdout:Sep 12 17:30:00 smithi204 ceph-mon[159017]:     fs cephfs has 2 failed mdss
2023-09-12T17:30:00.884 INFO:journalctl@ceph.mon.smithi204.smithi204.stdout:Sep 12 17:30:00 smithi204 ceph-mon[159017]: [ERR] MDS_ALL_DOWN: 1 filesystem is offline
2023-09-12T17:30:00.885 INFO:journalctl@ceph.mon.smithi204.smithi204.stdout:Sep 12 17:30:00 smithi204 ceph-mon[159017]:     fs cephfs is offline because no MDS is active for it.
...
2023-09-12T17:32:29.424 ERROR:teuthology:Uncaught exception (Hub)
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/channel.py", line 747, in recv_stderr
    out = self.in_stderr_buffer.read(nbytes, self.timeout)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/buffered_pipe.py", line 164, in read
    raise PipeTimeout()
paramiko.buffered_pipe.PipeTimeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "src/gevent/greenlet.py", line 906, in gevent._gevent_cgreenlet.Greenlet.run
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/teuthology/orchestra/run.py", line 323, in copy_file_to
    copy_to_log(src, logger, capture=stream, quiet=quiet)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/teuthology/orchestra/run.py", line 276, in copy_to_log
    for line in f:
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/file.py", line 125, in __next__
    line = self.readline()
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/file.py", line 291, in readline
    new_data = self._read(n)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/channel.py", line 1376, in _read
    return self.channel.recv_stderr(size)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/channel.py", line 749, in recv_stderr
    raise socket.timeout()
socket.timeout
2023-09-12T17:32:29.434 ERROR:teuthology:Uncaught exception (Hub)
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/channel.py", line 699, in recv
    out = self.in_buffer.read(nbytes, self.timeout)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/buffered_pipe.py", line 164, in read
    raise PipeTimeout()
paramiko.buffered_pipe.PipeTimeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "src/gevent/greenlet.py", line 906, in gevent._gevent_cgreenlet.Greenlet.run
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/teuthology/orchestra/run.py", line 323, in copy_file_to
    copy_to_log(src, logger, capture=stream, quiet=quiet)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/teuthology/orchestra/run.py", line 276, in copy_to_log
    for line in f:
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/file.py", line 125, in __next__
    line = self.readline()
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/file.py", line 291, in readline
    new_data = self._read(n)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/channel.py", line 1361, in _read
    return self.channel.recv(size)
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/virtualenv/lib/python3.8/site-packages/paramiko/channel.py", line 701, in recv
    raise socket.timeout()
socket.timeout

From: /teuthology/pdonnell-2023-09-12_14:07:50-fs-wip-batrick-testing-20230912.122437-distro-default-smithi/7395159/teuthology.log

and others. Probably fallout from commit 2b839838f70e9bcd31568013106aa7b5d2313bbe.
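
For context on the hang: the fail_fs upgrade flow fails the filesystem, upgrades the MDS daemons, re-enables the filesystem with "ceph fs set cephfs joinable true", and then waits for an MDS to become active again. If the mon never broadcasts the updated mdsmap (the behaviour tracked in #62682), that wait never completes, which matches the repeated MDS_ALL_DOWN health errors above and the eventual paramiko socket.timeout. Below is a minimal sketch of the kind of polling loop that ends up waiting forever; it assumes local access to the ceph CLI and the JSON layout of "ceph fs dump" (the real qa task goes through teuthology remotes, so the helper name and structure here are illustrative only):

import json
import subprocess
import time

def wait_for_active_mds(fs_name="cephfs", timeout=300, interval=5):
    """Poll `ceph fs dump` until fs_name reports an up:active MDS.

    Illustrative sketch only; the actual qa task uses teuthology helpers
    rather than a local subprocess call.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.check_output(["ceph", "fs", "dump", "--format=json"])
        dump = json.loads(out)
        for fs in dump.get("filesystems", []):
            mdsmap = fs["mdsmap"]
            if mdsmap["fs_name"] != fs_name:
                continue
            # Each entry in mdsmap["info"] describes one MDS daemon and its state.
            states = [info["state"] for info in mdsmap["info"].values()]
            if "up:active" in states:
                return
        time.sleep(interval)
    raise RuntimeError("timed out waiting for an active MDS in %s" % fs_name)

With the missing mdsmap broadcast from #62682, a loop like this never observes up:active, so the remote command produces no further output and teuthology's channel read eventually raises the socket.timeout seen in the log.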


Related issues (1 total; 0 open, 1 closed)

Is duplicate of CephFS - Bug #62682: mon: no mdsmap broadcast after "fs set joinable" is set to true (Resolved, Patrick Donnelly)

#1

Updated by Venky Shankar 8 months ago

  • Related to Bug #62682: mon: no mdsmap broadcast after "fs set joinable" is set to true added
#2

Updated by Venky Shankar 8 months ago

  • Status changed from New to Triaged
  • Assignee set to Patrick Donnelly
#3

Updated by Patrick Donnelly 7 months ago

  • Related to deleted (Bug #62682: mon: no mdsmap broadcast after "fs set joinable" is set to true)
#4

Updated by Patrick Donnelly 7 months ago

  • Is duplicate of Bug #62682: mon: no mdsmap broadcast after "fs set joinable" is set to true added
#5

Updated by Patrick Donnelly 7 months ago

  • Status changed from Triaged to Duplicate