Bug #38238
closedrados/test.sh: api_aio_pp doesn't seem to start
0%
Description
2019-02-08T01:31:44.931 INFO:tasks.workunit.client.0.smithi092.stdout:test api_aio_pp on pid 60289
and then no output from that test in the log. the test eventually times out with
2019-02-08T04:54:49.065 ERROR:teuthology.contextutil:Saw exception from nested tasks Traceback (most recent call last): File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 32, in nested yield vars File "/home/teuthworker/src/git.ceph.com_ceph_master/qa/tasks/ceph.py", line 1933, in task yield File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks manager = run_one_task(taskname, ctx=ctx, config=config) File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task return task(**kwargs) File "/home/teuthworker/src/git.ceph.com_ceph_master/qa/tasks/workunit.py", line 123, in task timeout=timeout,cleanup=cleanup) File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 85, in __exit__ for result in self: File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 99, in next resurrect_traceback(result) File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 22, in capture_traceback return func(*args, **kwargs) File "/home/teuthworker/src/git.ceph.com_ceph_master/qa/tasks/workunit.py", line 409, in _run_tests label="workunit test {workunit}".format(workunit=workunit) File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 194, in run r = self._runner(client=self.ssh, name=self.shortname, **kwargs) File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 435, in run r.wait() File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 162, in wait self._raise_for_status() File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 184, in _raise_for_status node=self.hostname, label=self.label CommandFailedError: Command failed (workunit test rados/test.sh) on smithi092 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=ff23e1ed2868ffdfbd9f201f6ecbb8426b3aa0d7 TESTDIR="/home/ubuntu/cephtest" CEPH _ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh'
/a/sage-2019-02-07_22:38:18-rados-master-distro-basic-smithi/3561540
Updated by Greg Farnum about 5 years ago
The referenced log doesn't seem to be this bug, as it got stuck on a crashed OSD...
Updated by Greg Farnum about 5 years ago
Ah, /a/sage-2019-02-07_22:38:18-rados-master-distro-basic-smithi/3561540 appears to be it
Updated by Sage Weil about 5 years ago
- Description updated (diff)
another instance:
/a/sage-2019-02-21_21:52:17-rados-wip-sage3-testing-2019-02-21-1359-distro-basic-smithi/3622638
description: rados/thrash/{0-size-min-size-overrides/3-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml 2-recovery-overrides/{default.yaml} backoff/peering_and_degraded.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-balancer/crush-compat.yaml msgr-failures/osd-delay.yaml msgr/random.yaml objectstore/bluestore-stupid.yaml rados.yaml supported-random-distro$/{ubuntu_16.04.yaml} thrashers/default.yaml thrashosds-health.yaml workloads/rados_api_tests.yaml}
Updated by Sage Weil about 5 years ago
ceph-client.admin.19974.log.gz is aio_pp
it starts, gets a few tests in, then the log stops unexpectedly...
... 2019-02-21 23:36:36.285 7f6ea4a56700 10 client.4824.objecter _maybe_request_map subscribing (onetime) to next osd map 2019-02-21 23:36:36.285 7f6ea4a56700 1 -- 172.21.15.198:0/55569580 --> v1:172.21.15.181:6800/17224 -- ping magic: 0 v1 -- 0x7f6e80001ac0 con 0x7f6e880378b0 2019-02-21 23:36:37.161 7f6e97fff700 1 -- 172.21.15.198:0/55569580 <== mon.0 v1:172.21.15.181:6789/0 33 ==== osd_map(77..77 src has 1..77) v4 ==== 2376+0+0 (1343277661 0 0) 0x7f6ea0001a20 con 0x1a43470 2019-02-21 23:36:37.161 7f6e97fff700 10 client.4824.objecter ms_dispatch 0xf140d0 osd_map(77..77 src has 1..77) v4 2019-02-21 23:36:37.161 7f6e97fff700 3 client.4824.objecter handle_osd_map got epochs [77,77] > 76 2019-02-21 23:36:37.161 7f6e97fff700 3 client.4824.objecter handle_osd_map decoding incremental epoch 77 2019-02-21 23:36:37.161 7f6e97fff700 10 client.4824.objecter checking op 8 2019-02-21 23:36:37.161 7f6e97fff700 20 client.4824.objecter _calc_target epoch 77 base foo5 @37 precalc_pgid 0 pgid 0.0 is_write 2019-02-21 23:36:37.161 7f6e97fff700 20 client.4824.objecter _calc_target target foo5 @37 -> pgid 37.8dd35e86 2019-02-21 23:36:37.161 7f6e97fff700 10 client.4824.objecter _maybe_request_map subscribing (onetime) to next osd map 2019-02-21 23:36:37.161 7f6e97fff700 1 -- 172.21.15.198:0/55569580 --> v1:172.21.15.181:6789/0 -- mon_subscribe({osdmap=78}) v3 -- 0x7f6e8802a600 con 0x1a43470 2019-02-21 23:36:37.161 7f6e97fff700 20 client.4824.objecter dump_active .. 0 homeless 2019-02-21 23:36:37.161 7f6e97fff700 20 client.4824.objecter 8 37.8dd35e86 osd.5 foo5 [writefull 0~4096] 2019-02-21 23:36:37.381 7f6ea5a58700 0 -- 172.21.15.198:0/55569580 >> v1:172.21.15.181:6789/0 conn(0x1a43470 legacy=0x1a42ca0 :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until injecting socket failure 2019-02-21 23:36:37.381 7f6ea5a58700 1 -- 172.21.15.198:0/55569580 >> v1:172.21.15.181:6789/0 conn(0x1a43470 legacy=0x1a42ca0 :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 19 2019-02-21 23:36:37.381 7f6ea5a58700 1 -- 172.21.15.198:0/55569580 >> v1:172.21.15.181:6789/0 conn(0x1a43470 legacy=0x1a42ca0 :-1 s=STATE_
Updated by Sage Weil about 5 years ago
- Has duplicate Bug #38433: rados/test.sh timeout added
Updated by Sage Weil about 5 years ago
/a/rdias-2019-02-26_22:35:27-rados-wip-rdias2-testing-distro-basic-smithi/3642422
description: rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml 2-recovery-overrides/{more-active-recovery.yaml} backoff/normal.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-balancer/crush-compat.yaml msgr-failures/fastclose.yaml msgr/async-v2only.yaml objectstore/bluestore-stupid.yaml rados.yaml supported-random-distro$/{ubuntu_16.04.yaml} thrashers/careful.yaml thrashosds-health.yaml workloads/rados_api_tests.yaml}
Updated by Sage Weil about 5 years ago
/a/sage-2019-02-28_12:30:17-rados-wip-sage-testing-2019-02-27-1720-distro-basic-smithi/3649931
description: rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml 2-recovery-overrides/{more-active-recovery.yaml} backoff/normal.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-balancer/crush-compat.yaml msgr-failures/osd-delay.yaml msgr/async.yaml objectstore/bluestore-bitmap.yaml rados.yaml supported-random-distro$/{ubuntu_latest.yaml} thrashers/default.yaml thrashosds-health.yaml workloads/rados_api_tests.yaml}
Updated by Sage Weil about 5 years ago
/a/sage-2019-03-10_01:08:05-rados-master-distro-basic-smithi/3703837
description: rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml 2-recovery-overrides/{more-active-recovery.yaml} backoff/peering_and_degraded.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-balancer/upmap.yaml msgr-failures/fastclose.yaml msgr/simple.yaml objectstore/bluestore-bitmap.yaml rados.yaml supported-random-distro$/{ubuntu_latest.yaml} thrashers/careful.yaml thrashosds-health.yaml workloads/rados_api_tests.yaml}
Updated by Sage Weil about 5 years ago
/a/sage-2019-03-13_02:19:41-rados-wip-sage3-testing-2019-03-12-1657-distro-basic-smithi/3715202
Updated by Neha Ojha about 5 years ago
- Backport set to nautilus
/a/yuriw-2019-04-05_00:28:05-rados-wip-yuri2-testing-2019-04-04-1953-nautilus-distro-basic-smithi/3811205/
Updated by David Zafman almost 5 years ago
- Backport changed from nautilus to nautilus, mimic
2019-07-10T07:22:00.844 INFO:tasks.workunit.client.0.smithi180.stdout: api_tier: [ PASSED ] 54 tests. 2019-07-10T07:22:00.846 INFO:tasks.workunit.client.0.smithi180.stderr:+ for t in '"${!pids[@]}"' 2019-07-10T07:22:00.846 INFO:tasks.workunit.client.0.smithi180.stderr:+ pid=12419 2019-07-10T07:22:00.846 INFO:tasks.workunit.client.0.smithi180.stderr:+ wait 12419 2019-07-10T07:22:00.846 INFO:tasks.workunit.client.0.smithi180.stderr:+ for t in '"${!pids[@]}"' 2019-07-10T07:22:00.847 INFO:tasks.workunit.client.0.smithi180.stderr:+ pid=12435 2019-07-10T07:22:00.847 INFO:tasks.workunit.client.0.smithi180.stderr:+ wait 12435 2019-07-10T07:22:00.847 INFO:tasks.workunit.client.0.smithi180.stderr:+ for t in '"${!pids[@]}"' 2019-07-10T07:22:00.848 INFO:tasks.workunit.client.0.smithi180.stderr:+ pid=12540 2019-07-10T07:22:00.848 INFO:tasks.workunit.client.0.smithi180.stderr:+ wait 12540 2019-07-10T07:22:00.849 INFO:tasks.workunit.client.0.smithi180.stderr:+ for t in '"${!pids[@]}"' 2019-07-10T07:22:00.849 INFO:tasks.workunit.client.0.smithi180.stderr:+ pid=12411 2019-07-10T07:22:00.849 INFO:tasks.workunit.client.0.smithi180.stderr:+ wait 12411 2019-07-10T13:09:25.293 INFO:tasks.workunit.client.0.smithi180.stderr:++ cleanup 2019-07-10T13:09:25.323 INFO:tasks.workunit.client.0.smithi180.stderr:++ pkill -P 12398 2019-07-10T13:09:25.359 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh: line 10: 12411 Terminated bash -o pipefail -exc "ceph_test_rados_$f $color 2>&1 | tee ceph_test_rados_$ff.log | sed \"s/^/$r: /\"" 2019-07-10T13:09:25.359 INFO:tasks.workunit.client.0.smithi180.stderr:++ true 2019-07-10T13:09:25.392 DEBUG:teuthology.orchestra.run:got remote process result: 124 2019-07-10T13:09:25.392 INFO:tasks.workunit:Stopping ['rados/test.sh'] on client.0... 2019-07-10T13:09:25.392 INFO:teuthology.orchestra.run.smithi180:Running: 2019-07-10T13:09:25.393 INFO:teuthology.orchestra.run.smithi180:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0 2019-07-10T13:09:26.597 ERROR:teuthology.run_tasks:Saw exception from tasks. Traceback (most recent call last): File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks manager = run_one_task(taskname, ctx=ctx, config=config) File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task return task(**kwargs) File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2019-07-08-2007-mimic/qa/tasks/workunit.py", line 123, in task timeout=timeout,cleanup=cleanup) File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 85, in __exit__ for result in self: File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 99, in next resurrect_traceback(result) File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 22, in capture_traceback return func(*args, **kwargs) File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2019-07-08-2007-mimic/qa/tasks/workunit.py", line 409, in _run_tests label="workunit test {workunit}".format(workunit=workunit) File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 205, in run r = self._runner(client=self.ssh, name=self.shortname, **kwargs) File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 437, in run r.wait() File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 162, in wait self._raise_for_status() File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 184, in _raise_for_status node=self.hostname, label=self.label CommandFailedError: Command failed (workunit test rados/test.sh) on smithi180 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=f513862830ff116ae26fc1838f038ad2ed11d3a9 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh'
Updated by Sage Weil almost 5 years ago
- Related to Bug #40765: mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic added
Updated by Brad Hubbard over 4 years ago
2019-08-31 01:34:36,811.811 INFO:tasks.workunit.client.0.mira069.stdout:test api_io on pid 9498 ... 2019-08-31 01:52:22,196.196 INFO:tasks.workunit.client.0.mira069.stderr:+ for t in '"${!pids[@]}"' 2019-08-31 01:52:22,196.196 INFO:tasks.workunit.client.0.mira069.stderr:+ pid=9498 2019-08-31 01:52:22,196.196 INFO:tasks.workunit.client.0.mira069.stderr:+ wait 9498
Looks like I can reproduce this easily since pid 9498 has been stuck for several minutes.
$ grep crc /var/log/ceph/ceph-client.admin.9500.log|tail 2019-08-31 02:56:59.220 7f05beffd700 0 bad crc in data 1465487574 != exp 1041109653 from 172.21.3.104:6809/8082 2019-08-31 02:56:59.241 7f05beffd700 0 bad crc in data 1465487574 != exp 1041109653 from 172.21.3.104:6809/8082 2019-08-31 02:56:59.275 7f05beffd700 0 bad crc in data 1465487574 != exp 1041109653 from 172.21.3.104:6809/8082 2019-08-31 02:56:59.292 7f05beffd700 0 bad crc in data 1465487574 != exp 1041109653 from 172.21.3.104:6809/8082 2019-08-31 02:56:59.326 7f05beffd700 0 bad crc in data 1465487574 != exp 1041109653 from 172.21.3.104:6809/8082 2019-08-31 02:56:59.345 7f05beffd700 0 bad crc in data 1465487574 != exp 1041109653 from 172.21.3.104:6809/8082 2019-08-31 02:56:59.365 7f05beffd700 0 bad crc in data 1465487574 != exp 1041109653 from 172.21.3.104:6809/8082 2019-08-31 02:56:59.392 7f05beffd700 0 bad crc in data 1465487574 != exp 1041109653 from 172.21.3.104:6809/8082 2019-08-31 02:56:59.415 7f05beffd700 0 bad crc in data 1465487574 != exp 1041109653 from 172.21.3.104:6809/8082 2019-08-31 02:56:59.436 7f05beffd700 0 bad crc in data 1465487574 != exp 1041109653 from 172.21.3.104:6809/8082 Thread 15 (Thread 0x7f05d3c5bf80 (LWP 9500)): #0 0x00007f05d34e8965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f05d379e326 in Cond::Wait (this=0x7ffc859040a0, mutex=...) at /usr/src/debug/ceph-13.2.6-108-g2353be3/src/common/Cond.h:48 #2 0x00007f05d3790a6f in librados::IoCtxImpl::operate_read (this=0x558e53189b70, oid=..., o=0x558e531e2880, pbl=0x7ffc85904270, flags=flags@entry=0) at /usr/src/debug/ceph-13.2.6-108-g2353be3/src/librados/IoCtxImpl.cc:754 #3 0x00007f05d3753c9d in librados::IoCtx::operate (this=this@entry=0x558e530b1768, oid="foo", o=o@entry=0x7ffc85904210, pbl=pbl@entry=0x7ffc85904270) at /usr/src/debug/ceph-13.2.6-108-g2353be3/src/librados/librados.cc:1521 #4 0x0000558e52d1122b in LibRadosIoPP_CmpExtPP_Test::TestBody (this=0x558e530b1750) at /usr/src/debug/ceph-13.2.6-108-g2353be3/src/test/librados/io.cc:1229 #5 0x0000558e52d60b5a in HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x558e52d843bf "the test body", method=<optimized out>, object=0x558e530b1750) at /usr/src/debug/ceph-13.2.6-108-g2353be3/src/googletest/googletest/src/gtest.cc:2402 #6 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x558e530b1750, method=<optimized out>, location=location@entry=0x558e52d843bf "the test body") at /usr/src/debug/ceph-13.2.6-108-g2353be3/src/googletest/googletest/src/gtest.cc:2438 #7 0x0000558e52d5603a in testing::Test::Run (this=0x558e530b1750) at /usr/src/debug/ceph-13.2.6-108-g2353be3/src/googletest/googletest/src/gtest.cc:2474 #8 0x0000558e52d5611c in testing::TestInfo::Run (this=0x558e53067400) at /usr/src/debug/ceph-13.2.6-108-g2353be3/src/googletest/googletest/src/gtest.cc:2656 #9 0x0000558e52d56255 in testing::TestCase::Run (this=0x558e53058c50) at /usr/src/debug/ceph-13.2.6-108-g2353be3/src/googletest/googletest/src/gtest.cc:2774 #10 0x0000558e52d56748 in testing::internal::UnitTestImpl::RunAllTests (this=0x558e53057fb0) at /usr/src/debug/ceph-13.2.6-108-g2353be3/src/googletest/googletest/src/gtest.cc:4649 #11 0x0000558e52d6104a in HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (location=0x558e52d857e0 "auxiliary test code (environments or event listeners)", method=<optimized out>, object=0x558e53057fb0) at /usr/src/debug/ceph-13.2.6-108-g2353be3/src/googletest/googletest/src/gtest.cc:2402 #12 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x558e53057fb0, method=<optimized out>, location=location@entry=0x558e52d857e0 "auxiliary test code (environments or event listeners)") at /usr/src/debug/ceph-13.2.6-108-g2353be3/src/googletest/googletest/src/gtest.cc:2438 #13 0x0000558e52d568d0 in testing::UnitTest::Run (this=0x558e52fa6c00 <testing::UnitTest::GetInstance()::instance>) at /usr/src/debug/ceph-13.2.6-108-g2353be3/src/googletest/googletest/src/gtest.cc:4257 #14 0x0000558e52d09e09 in RUN_ALL_TESTS () at /usr/src/debug/ceph-13.2.6-108-g2353be3/src/googletest/googletest/include/gtest/gtest.h:2233 #15 main (argc=<optimized out>, argv=0x7ffc85904618) at /usr/src/debug/ceph-13.2.6-108-g2353be3/src/googletest/googlemock/src/gmock_main.cc:53
This is a duplicate of #40765
Updated by Brad Hubbard over 4 years ago
- Is duplicate of Bug #40765: mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic added
Updated by Brad Hubbard over 4 years ago
- Is duplicate of deleted (Bug #40765: mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic)
Updated by Brad Hubbard over 4 years ago
- Is duplicate of Bug #38416: crc cache should be invalidated when posting preallocated rx buffers added