Bug #36078
closedmds: 9 active MDS cluster stuck during fsstress
0%
Description
2018-09-18T23:17:57.267 INFO:tasks.workunit:Running workunits matching suites/fsstress.sh on client.1... 2018-09-18T23:17:57.267 INFO:tasks.workunit:Running workunit suites/fsstress.sh... 2018-09-18T23:17:57.267 INFO:teuthology.orchestra.run.smithi205:Running (workunit test suites/fsstress.sh): 'mkdir -p -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && cd -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=wip-pdonnell-testing-20180918.221315 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="1" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.1 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.1/qa/workunits/suites/fsstress.sh' 2018-09-18T23:17:57.316 INFO:tasks.workunit:Running workunits matching suites/fsstress.sh on client.0... 2018-09-18T23:17:57.316 INFO:tasks.workunit:Running workunit suites/fsstress.sh... 2018-09-18T23:17:57.316 INFO:teuthology.orchestra.run.smithi205:Running (workunit test suites/fsstress.sh): 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=wip-pdonnell-testing-20180918.221315 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/suites/fsstress.sh' 2018-09-18T23:17:57.920 INFO:tasks.workunit.client.0.smithi205.stderr:+ mkdir -p fsstress 2018-09-18T23:17:57.935 INFO:tasks.workunit.client.1.smithi205.stderr:+ mkdir -p fsstress 2018-09-18T23:17:57.977 INFO:tasks.workunit.client.0.smithi205.stderr:+ pushd fsstress 2018-09-18T23:17:57.977 INFO:tasks.workunit.client.0.smithi205.stderr:+ wget -q -O ltp-full.tgz http://download.ceph.com/qa/ltp-full-20091231.tgz 2018-09-18T23:17:57.977 INFO:tasks.workunit.client.0.smithi205.stdout:~/cephtest/mnt.0/client.0/tmp/fsstress ~/cephtest/mnt.0/client.0/tmp 2018-09-18T23:17:58.020 INFO:tasks.workunit.client.1.smithi205.stderr:+ pushd fsstress 2018-09-18T23:17:58.021 INFO:tasks.workunit.client.1.smithi205.stderr:+ wget -q -O ltp-full.tgz http://download.ceph.com/qa/ltp-full-20091231.tgz 2018-09-18T23:17:58.021 INFO:tasks.workunit.client.1.smithi205.stdout:~/cephtest/mnt.1/client.1/tmp/fsstress ~/cephtest/mnt.1/client.1/tmp 2018-09-18T23:17:59.370 INFO:tasks.workunit.client.1.smithi205.stderr:+ tar xzf ltp-full.tgz 2018-09-18T23:18:01.751 INFO:tasks.workunit.client.0.smithi205.stderr:+ tar xzf ltp-full.tgz 2018-09-18T23:18:06.681 INFO:teuthology.orchestra.run.smithi150:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf' 2018-09-18T23:18:06.685 INFO:teuthology.orchestra.run.smithi173:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf' 2018-09-18T23:18:06.688 INFO:teuthology.orchestra.run.smithi205:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf' 2018-09-18T23:18:12.346 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:12.214 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:12.368 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:12.215 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:12.401 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:12.215 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:12.402 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:12.215 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:33.258 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:32.875 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:33.259 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:32.875 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:33.259 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:32.876 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:33.260 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:32.876 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:36.856 INFO:teuthology.orchestra.run.smithi150:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf' 2018-09-18T23:18:36.860 INFO:teuthology.orchestra.run.smithi173:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf' 2018-09-18T23:18:36.863 INFO:teuthology.orchestra.run.smithi205:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf' 2018-09-18T23:18:43.778 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:43.584 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:43.786 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:43.584 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:43.791 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:43.584 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:43.811 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:43.584 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:54.783 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:54.640 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:54.812 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:54.640 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:54.814 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:54.640 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:54.814 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:54.640 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:18:54.815 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:18:54.640 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:19:04.753 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:19:04.754 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:19:04.753 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:19:04.754 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:19:04.753 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:19:04.754 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:19:04.771 INFO:tasks.ceph.mds.a.smithi173.stderr:2018-09-18 23:19:04.754 18b10700 -1 mds.0.bal balancer runs too long 2018-09-18T23:19:07.025 INFO:teuthology.orchestra.run.smithi150:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf' 2018-09-18T23:19:07.030 INFO:teuthology.orchestra.run.smithi173:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf' 2018-09-18T23:19:07.034 INFO:teuthology.orchestra.run.smithi205:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf' ...
From: /ceph/teuthology-archive/pdonnell-2018-09-18_22:20:57-multimds:verify-master-distro-basic-smithi/3039349/teuthology.log
Note! I killed fsstress/rm once it was clear they were stuck.
This is with fsstress modified to work properly with parallel instances: https://github.com/ceph/ceph/pull/24157
Updated by Zheng Yan over 5 years ago
- Status changed from New to Need More Info
I don't see any stuck client request in any mds. The "balancer runs too long" warning is because the mds ran too slowly (ether machine is too slow or debug level is too high). I added code to limit balancer to run 0.1 second at most. But the log show that mds took 0.01 second to check popularity of a 50 entries directory.
Updated by Patrick Donnelly over 5 years ago
Also in Luminous: /ceph/teuthology-archive/yuriw-2018-10-05_22:19:38-multimds-wip-yuri4-testing-2018-10-05-2015-luminous-testing-basic-smithi/3105585/teuthology.log
Updated by Patrick Donnelly about 5 years ago
- Target version changed from v14.0.0 to v15.0.0
Updated by Patrick Donnelly over 4 years ago
- Status changed from Need More Info to Can't reproduce
- Target version deleted (
v15.0.0) - Backport deleted (
mimic,luminous)