Project

General

Profile

Actions

Bug #36078

closed

mds: 9 active MDS cluster stuck during fsstress

Added by Patrick Donnelly over 5 years ago. Updated over 4 years ago.

Status:
Can't reproduce
Priority:
Immediate
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2018-09-18T23:17:57.267 INFO:tasks.workunit:Running workunits matching suites/fsstress.sh on client.1...
2018-09-18T23:17:57.267 INFO:tasks.workunit:Running workunit suites/fsstress.sh...
2018-09-18T23:17:57.267 INFO:teuthology.orchestra.run.smithi205:Running (workunit test suites/fsstress.sh): 'mkdir -p -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && cd -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=wip-pdonnell-testing-20180918.221315 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="1" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.1 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.1/qa/workunits/suites/fsstress.sh'
2018-09-18T23:17:57.316 INFO:tasks.workunit:Running workunits matching suites/fsstress.sh on client.0...
2018-09-18T23:17:57.316 INFO:tasks.workunit:Running workunit suites/fsstress.sh...
2018-09-18T23:17:57.316 INFO:teuthology.orchestra.run.smithi205:Running (workunit test suites/fsstress.sh): 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=wip-pdonnell-testing-20180918.221315 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/suites/fsstress.sh'
2018-09-18T23:17:57.920 INFO:tasks.workunit.client.0.smithi205.stderr:+ mkdir -p fsstress
2018-09-18T23:17:57.935 INFO:tasks.workunit.client.1.smithi205.stderr:+ mkdir -p fsstress
2018-09-18T23:17:57.977 INFO:tasks.workunit.client.0.smithi205.stderr:+ pushd fsstress
2018-09-18T23:17:57.977 INFO:tasks.workunit.client.0.smithi205.stderr:+ wget -q -O ltp-full.tgz http://download.ceph.com/qa/ltp-full-20091231.tgz
2018-09-18T23:17:57.977 INFO:tasks.workunit.client.0.smithi205.stdout:~/cephtest/mnt.0/client.0/tmp/fsstress ~/cephtest/mnt.0/client.0/tmp
2018-09-18T23:17:58.020 INFO:tasks.workunit.client.1.smithi205.stderr:+ pushd fsstress
2018-09-18T23:17:58.021 INFO:tasks.workunit.client.1.smithi205.stderr:+ wget -q -O ltp-full.tgz http://download.ceph.com/qa/ltp-full-20091231.tgz
2018-09-18T23:17:58.021 INFO:tasks.workunit.client.1.smithi205.stdout:~/cephtest/mnt.1/client.1/tmp/fsstress ~/cephtest/mnt.1/client.1/tmp
2018-09-18T23:17:59.370 INFO:tasks.workunit.client.1.smithi205.stderr:+ tar xzf ltp-full.tgz
2018-09-18T23:18:01.751 INFO:tasks.workunit.client.0.smithi205.stderr:+ tar xzf ltp-full.tgz
2018-09-18T23:18:06.681 INFO:teuthology.orchestra.run.smithi150:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-09-18T23:18:06.685 INFO:teuthology.orchestra.run.smithi173:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-09-18T23:18:06.688 INFO:teuthology.orchestra.run.smithi205:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-09-18T23:18:12.346 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:12.214 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:12.368 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:12.215 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:12.401 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:12.215 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:12.402 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:12.215 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:33.258 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:32.875 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:33.259 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:32.875 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:33.259 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:32.876 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:33.260 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:32.876 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:36.856 INFO:teuthology.orchestra.run.smithi150:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-09-18T23:18:36.860 INFO:teuthology.orchestra.run.smithi173:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-09-18T23:18:36.863 INFO:teuthology.orchestra.run.smithi205:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-09-18T23:18:43.778 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:43.584 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:43.786 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:43.584 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:43.791 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:43.584 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:43.811 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:43.584 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:54.783 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:54.640 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:54.812 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:54.640 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:54.814 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:54.640 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:54.814 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:54.640 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:18:54.815 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:54.640 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:19:04.753 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:19:04.754 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:19:04.753 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:19:04.754 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:19:04.753 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:19:04.754 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:19:04.771 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:19:04.754 18b10700 -1 mds.0.bal  balancer runs too long
2018-09-18T23:19:07.025 INFO:teuthology.orchestra.run.smithi150:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-09-18T23:19:07.030 INFO:teuthology.orchestra.run.smithi173:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-09-18T23:19:07.034 INFO:teuthology.orchestra.run.smithi205:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
...

From: /ceph/teuthology-archive/pdonnell-2018-09-18_22:20:57-multimds:verify-master-distro-basic-smithi/3039349/teuthology.log

Note! I killed fsstress/rm once it was clear they were stuck.

This is with fsstress modified to work properly with parallel instances: https://github.com/ceph/ceph/pull/24157

Actions

Also available in: Atom PDF