Bug #48773
openqa: scrub does not complete
Status: In Progress
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: pacific,quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): multimds, qa-failure, scrub, task(medium)
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2021-01-06T01:58:53.086 INFO:tasks.fwd_scrub.fs.[cephfs]:scrub status for tag:4a5ba7a2-3f38-424b-aa70-9c2bb711d766 - {'path': '/', 'tag': '4a5ba7a2-3f38-424b-aa70-9c2bb711d766', 'options': 'recursive,force'}
2021-01-06T01:58:53.087 ERROR:tasks.fwd_scrub.fs.[cephfs]:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20210105.221014/qa/tasks/fwd_scrub.py", line 40, in _run
    self.do_scrub()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20210105.221014/qa/tasks/fwd_scrub.py", line 57, in do_scrub
    self._scrub()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20210105.221014/qa/tasks/fwd_scrub.py", line 76, in _scrub
    return self._wait_until_scrub_complete(tag)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20210105.221014/qa/tasks/fwd_scrub.py", line 81, in _wait_until_scrub_complete
    while proceed():
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 133, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (30) after waiting for 900 seconds
2021-01-06T01:58:57.267 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.fs.[cephfs] failed
2021-01-06T01:58:57.268 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons
From: /ceph/teuthology-archive/pdonnell-2021-01-06_00:07:44-fs:workload-wip-pdonnell-testing-20210105.221014-distro-basic-smithi/5758061/teuthology.log
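For context, the MaxWhileTries in the traceback comes from teuthology's bounded-retry helper in contextutil.py: the fwd_scrub task polls the MDS scrub status until it reports done, and gives up after 30 tries spread over 900 seconds. A minimal self-contained sketch of that polling pattern (the function below and its `is_scrub_done` callback are illustrative, not teuthology's actual API; only the exception name and the 30-try/900-second budget are taken from the log):

```python
import time


class MaxWhileTries(Exception):
    """Raised when the polling loop exhausts its retry budget."""


def wait_until_scrub_complete(is_scrub_done, sleep=30, tries=30):
    """Poll is_scrub_done() up to `tries` times, sleeping `sleep`
    seconds between attempts. The defaults reproduce the budget in
    the log above: 30 tries * 30 s = 900 seconds."""
    for attempt in range(1, tries + 1):
        if is_scrub_done():
            return attempt  # number of polls it took to finish
        time.sleep(sleep)
    raise MaxWhileTries(
        "reached maximum tries ({}) after waiting for {} seconds"
        .format(tries, tries * sleep))
```

The bug report is that the scrub never reaches the "done" state on a multi-rank filesystem, so the loop exhausts its budget and the daemon watchdog tears the cluster down.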
The same underlying failure causes all of these job failures:
Failure: Command failed on smithi071 with status 1: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0/tmp'
1 jobs: ['5758061']
suites: ['clusters/1a5s-mds-1c-client-3node', 'conf/{client', 'distro/{centos_8}', 'fs:workload/{begin', 'mds', 'mon', 'mount/kclient/{mount', 'ms-die-on-skipped}}', 'objectstore-ec/bluestore-bitmap', 'omap_limit/10000', 'osd-asserts', 'osd}', 'overrides/{distro/stock/{k-stock', 'overrides/{frag_enable', 'ranks/5', 'rhel_8}', 'scrub/yes', 'session_timeout', 'tasks/{0-check-counter', 'whitelist_health', 'whitelist_wrongly_marked_down}', 'workunit/suites/blogbench}}']

Crash: Command failed (workunit test fs/misc/multiple_rsync.sh) on smithi192 with status 23: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=d0ed162b51928c50f20cee111f8292828eda755e TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/misc/multiple_rsync.sh'
ceph version 16.0.0-8719-gd0ed162b (d0ed162b51928c50f20cee111f8292828eda755e) pacific (dev)
 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980) [0x7f6c914c5980]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x19c) [0x7f6c926463ce]
 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f6c92646558]
 6: ceph-mon(+0x7a1d02) [0x558dad62fd02]
 7: (Monitor::~Monitor()+0x9) [0x558dad62fd49]
 8: main()
 9: __libc_start_main()
 10: _start()
2 jobs: ['5758073', '5758087']
suites intersection: ['clusters/1a5s-mds-1c-client-3node', 'conf/{client', 'fs:workload/{begin', 'mds', 'mon', 'omap_limit/10000', 'osd-asserts', 'osd}', 'overrides/{frag_enable', 'scrub/yes', 'session_timeout', 'tasks/{0-check-counter', 'whitelist_health', 'whitelist_wrongly_marked_down}', 'workunit/fs/misc}}']
suites union: ['clusters/1a5s-mds-1c-client-3node', 'conf/{client', 'distro/{rhel_8}', 'distro/{ubuntu_latest}', 'fs:workload/{begin', 'k-testing}', 'mds', 'mon', 'mount/fuse', 'mount/kclient/{mount', 'ms-die-on-skipped}}', 'objectstore-ec/bluestore-comp-ec-root', 'objectstore-ec/bluestore-ec-root', 'omap_limit/10000', 'osd-asserts', 'osd}', 'overrides/{distro/testing/{flavor/centos_latest', 'overrides/{frag_enable', 'ranks/3', 'ranks/5', 'scrub/yes', 'session_timeout', 'tasks/{0-check-counter', 'whitelist_health', 'whitelist_wrongly_marked_down}', 'workunit/fs/misc}}']

Timeout 3h running clone.client.0/qa/workunits/fs/misc/multiple_rsync.sh
1 jobs: ['5758031']
suites: ['clusters/1a5s-mds-1c-client-3node', 'conf/{client', 'distro/{rhel_8}', 'fs:workload/{begin', 'k-testing}', 'mds', 'mon', 'mount/kclient/{mount', 'ms-die-on-skipped}}', 'objectstore-ec/bluestore-ec-root', 'omap_limit/10000', 'osd-asserts', 'osd}', 'overrides/{distro/testing/{flavor/ubuntu_latest', 'overrides/{frag_enable', 'ranks/5', 'scrub/yes', 'session_timeout', 'tasks/{0-check-counter', 'whitelist_health', 'whitelist_wrongly_marked_down}', 'workunit/fs/misc}}']

Failure: Command failed (workunit test fs/misc/multiple_rsync.sh) on smithi204 with status 23: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=d0ed162b51928c50f20cee111f8292828eda755e TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/misc/multiple_rsync.sh'
1 jobs: ['5758003']
suites: ['clusters/1a5s-mds-1c-client-3node', 'conf/{client', 'distro/{rhel_8}', 'fs:workload/{begin', 'mds', 'mon', 'mount/fuse', 'objectstore-ec/bluestore-comp', 'omap_limit/10000', 'osd-asserts', 'osd}', 'overrides/{frag_enable', 'ranks/3', 'scrub/yes', 'session_timeout', 'tasks/{0-check-counter', 'whitelist_health', 'whitelist_wrongly_marked_down}', 'workunit/fs/misc}}']
This seems to happen only with scrubs involving multiple active MDS daemons.