Bug #54411 (closed)

mds_upgrade_sequence: "overall HEALTH_WARN 4 failed cephadm daemon(s); 1 filesystem is degraded; insufficient standby MDS daemons available; 33 daemons have recently crashed" during suites/fsstress.sh

Added by Laura Flores about 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2022-02-21_15:48:20-rados-wip-yuri7-testing-2022-02-17-0852-pacific-distro-default-smithi/6698603

2022-02-21T22:00:01.210 INFO:journalctl@ceph.mon.smithi119.smithi119.stdout:Feb 21 22:00:00 smithi119 conmon[66118]: cluster 2022-02-21T22:00:00.000143+0000 mon.smithi107 (mon.0) 2988 : cluster [WRN] overall HEALTH_WARN 4 failed cephadm daemon(s); 1 filesystem is degraded; insufficient standby MDS daemons available; 33 daemons have recently crashed

...
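(Side note: the "33 daemons have recently crashed" portion of the warning comes from the crash module, which keeps a record of recent daemon crashes. A minimal sketch of how those records can be inspected on the test cluster; the crash ID argument is a placeholder, not taken from this run:

    ceph health detail          # show RECENT_CRASH and the other warnings with per-item detail
    ceph crash ls-new           # list crash reports that have not been archived yet
    ceph crash info <crash-id>  # print metadata and backtrace for one crash report
    ceph crash archive-all      # acknowledge all reports and clear the RECENT_CRASH warning
)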

2022-02-21T22:07:07.894 INFO:tasks.workunit:Stopping ['suites/fsstress.sh'] on client.1...
2022-02-21T22:07:07.895 DEBUG:teuthology.orchestra.run.smithi119:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.1 /home/ubuntu/cephtest/clone.client.1
2022-02-21T22:07:07.936 INFO:journalctl@ceph.mon.smithi107.smithi107.stdout:Feb 21 22:07:07 smithi107 conmon[101214]: cluster 2022-02-21T22:07:05.802071+0000 mgr.smithi119.mindyy (mgr.24491) 6642
2022-02-21T22:07:07.936 INFO:journalctl@ceph.mon.smithi107.smithi107.stdout:Feb 21 22:07:07 smithi107 conmon[101214]:  : cluster [DBG] pgmap v5672: 65 pgs: 65 active+clean; 2.5 GiB data, 7.5 GiB used, 529 GiB / 536 GiB avail
2022-02-21T22:07:07.940 INFO:journalctl@ceph.mon.smithi119.smithi119.stdout:Feb 21 22:07:07 smithi119 conmon[66118]: cluster 2022-02-21T22:07:05.802071+0000 mgr.smithi119.mindyy (mgr.24491) 6642 : cluster [DBG] pgmap v5672: 65 pgs: 65 active+clean; 2.5 GiB data, 7.5 GiB used, 529 GiB / 536 GiB avail
2022-02-21T22:07:08.150 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/run_tasks.py", line 91, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/run_tasks.py", line 70, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/task/parallel.py", line 56, in task
    p.spawn(_run_spawned, ctx, confg, taskname)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/parallel.py", line 84, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/parallel.py", line 98, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/parallel.py", line 30, in resurrect_traceback
    raise exc.exc_info[1]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/parallel.py", line 23, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/task/parallel.py", line 64, in _run_spawned
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/run_tasks.py", line 70, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/task/sequential.py", line 47, in task
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=confg)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/run_tasks.py", line 70, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_9f91d3caa3f16637a5668f2b678fb3a44b6977ba/qa/tasks/workunit.py", line 147, in task
    cleanup=cleanup)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_9f91d3caa3f16637a5668f2b678fb3a44b6977ba/qa/tasks/workunit.py", line 297, in _spawn_on_all_clients
    timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/parallel.py", line 84, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/parallel.py", line 98, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/parallel.py", line 30, in resurrect_traceback
    raise exc.exc_info[1]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/parallel.py", line 23, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_9f91d3caa3f16637a5668f2b678fb3a44b6977ba/qa/tasks/workunit.py", line 426, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/orchestra/remote.py", line 509, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed (workunit test suites/fsstress.sh) on smithi119 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && cd -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=9f91d3caa3f16637a5668f2b678fb3a44b6977ba TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="1" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.1 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.1 CEPH_MNT=/home/ubuntu/cephtest/mnt.1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.1/qa/workunits/suites/fsstress.sh'
2022-02-21T22:07:08.279 ERROR:teuthology.run_tasks: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=de497daf715d4cd5840a23b715fe2354
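For context: the workunit command above wraps fsstress.sh in "timeout 3h", and GNU timeout exits with status 124 when it has to kill a command that exceeds its limit. The status 124 reported by teuthology therefore means fsstress.sh was still running after three hours, i.e. the test hung (consistent with the degraded filesystem and missing standby MDS in the health warning) rather than failing on its own. A minimal sketch of that wrapper behaviour, using the paths from the failed command:

    cd /home/ubuntu/cephtest/mnt.1/client.1/tmp
    timeout 3h /home/ubuntu/cephtest/clone.client.1/qa/workunits/suites/fsstress.sh
    echo $?   # prints 124 if the 3-hour limit expired and timeout killed the script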


Related issues (3 total: 0 open, 3 closed)

Related to CephFS - Bug #54459: fs:upgrade fails with "hit max job timeout" (Rejected, Venky Shankar)

Copied to CephFS - Backport #55447: quincy: mds_upgrade_sequence: "overall HEALTH_WARN 4 failed cephadm daemon(s); 1 filesystem is degraded; insufficient standby MDS daemons available; 33 daemons have recently crashed" during suites/fsstress.sh (Resolved, Xiubo Li)
Copied to CephFS - Backport #55449: pacific: mds_upgrade_sequence: "overall HEALTH_WARN 4 failed cephadm daemon(s); 1 filesystem is degraded; insufficient standby MDS daemons available; 33 daemons have recently crashed" during suites/fsstress.sh (Resolved, Xiubo Li)
