Project

General

Profile

Bug #38238

rados/test.sh: api_aio_pp doesn't seem to start

Added by Sage Weil 5 months ago. Updated 1 day ago.

Status:
Verified
Priority:
High
Assignee:
-
Category:
-
Target version:
-
Start date:
02/08/2019
Due date:
% Done:

0%

Source:
Tags:
Backport:
nautilus, mimic
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:

Description

2019-02-08T01:31:44.931 INFO:tasks.workunit.client.0.smithi092.stdout:test api_aio_pp on pid 60289

and then no output from that test in the log. the test eventually times out with

2019-02-08T04:54:49.065 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 32, in nested
    yield vars
  File "/home/teuthworker/src/git.ceph.com_ceph_master/qa/tasks/ceph.py", line 1933, in task
    yield
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph_master/qa/tasks/workunit.py", line 123, in task
    timeout=timeout,cleanup=cleanup)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 85, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 99, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 22, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_ceph_master/qa/tasks/workunit.py", line 409, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 194, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 435, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 162, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 184, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed (workunit test rados/test.sh) on smithi092 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=ff23e1ed2868ffdfbd9f201f6ecbb8426b3aa0d7 TESTDIR="/home/ubuntu/cephtest" CEPH
_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh'

/a/sage-2019-02-07_22:38:18-rados-master-distro-basic-smithi/3561540


Related issues

Related to RADOS - Bug #40765: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic New 07/12/2019
Duplicated by RADOS - Bug #38433: rados/test.sh timeout Duplicate 02/22/2019

History

#1 Updated by Greg Farnum 5 months ago

The referenced log doesn't seem to be this bug, as it got stuck on a crashed OSD...

#2 Updated by Greg Farnum 5 months ago

Ah, /a/sage-2019-02-07_22:38:18-rados-master-distro-basic-smithi/3561540 appears to be it

#3 Updated by Sage Weil 5 months ago

  • Description updated (diff)

another instance:

/a/sage-2019-02-21_21:52:17-rados-wip-sage3-testing-2019-02-21-1359-distro-basic-smithi/3622638

description: rados/thrash/{0-size-min-size-overrides/3-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml 2-recovery-overrides/{default.yaml} backoff/peering_and_degraded.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-balancer/crush-compat.yaml msgr-failures/osd-delay.yaml msgr/random.yaml objectstore/bluestore-stupid.yaml rados.yaml supported-random-distro$/{ubuntu_16.04.yaml} thrashers/default.yaml thrashosds-health.yaml workloads/rados_api_tests.yaml}

#4 Updated by Sage Weil 5 months ago

ceph-client.admin.19974.log.gz is aio_pp

it starts, gets a few tests in, then the log stops unexpectedly...

...
2019-02-21 23:36:36.285 7f6ea4a56700 10 client.4824.objecter _maybe_request_map subscribing (onetime) to next osd map
2019-02-21 23:36:36.285 7f6ea4a56700  1 -- 172.21.15.198:0/55569580 --> v1:172.21.15.181:6800/17224 -- ping magic: 0 v1 -- 0x7f6e80001ac0 con 0x7f6e880378b0
2019-02-21 23:36:37.161 7f6e97fff700  1 -- 172.21.15.198:0/55569580 <== mon.0 v1:172.21.15.181:6789/0 33 ==== osd_map(77..77 src has 1..77) v4 ==== 2376+0+0 (1343277661 0 0) 0x7f6ea0001a20 con 0x1a43470
2019-02-21 23:36:37.161 7f6e97fff700 10 client.4824.objecter ms_dispatch 0xf140d0 osd_map(77..77 src has 1..77) v4
2019-02-21 23:36:37.161 7f6e97fff700  3 client.4824.objecter handle_osd_map got epochs [77,77] > 76
2019-02-21 23:36:37.161 7f6e97fff700  3 client.4824.objecter handle_osd_map decoding incremental epoch 77
2019-02-21 23:36:37.161 7f6e97fff700 10 client.4824.objecter  checking op 8
2019-02-21 23:36:37.161 7f6e97fff700 20 client.4824.objecter _calc_target epoch 77 base foo5 @37 precalc_pgid 0 pgid 0.0 is_write
2019-02-21 23:36:37.161 7f6e97fff700 20 client.4824.objecter _calc_target target foo5 @37 -> pgid 37.8dd35e86
2019-02-21 23:36:37.161 7f6e97fff700 10 client.4824.objecter _maybe_request_map subscribing (onetime) to next osd map
2019-02-21 23:36:37.161 7f6e97fff700  1 -- 172.21.15.198:0/55569580 --> v1:172.21.15.181:6789/0 -- mon_subscribe({osdmap=78}) v3 -- 0x7f6e8802a600 con 0x1a43470
2019-02-21 23:36:37.161 7f6e97fff700 20 client.4824.objecter dump_active .. 0 homeless
2019-02-21 23:36:37.161 7f6e97fff700 20 client.4824.objecter 8  37.8dd35e86     osd.5   foo5    [writefull 0~4096]
2019-02-21 23:36:37.381 7f6ea5a58700  0 -- 172.21.15.198:0/55569580 >> v1:172.21.15.181:6789/0 conn(0x1a43470 legacy=0x1a42ca0 :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until injecting socket failure
2019-02-21 23:36:37.381 7f6ea5a58700  1 -- 172.21.15.198:0/55569580 >> v1:172.21.15.181:6789/0 conn(0x1a43470 legacy=0x1a42ca0 :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 19
2019-02-21 23:36:37.381 7f6ea5a58700  1 -- 172.21.15.198:0/55569580 >> v1:172.21.15.181:6789/0 conn(0x1a43470 legacy=0x1a42ca0 :-1 s=STATE_

#5 Updated by Sage Weil 5 months ago

  • Duplicated by Bug #38433: rados/test.sh timeout added

#6 Updated by Sage Weil 5 months ago

/a/rdias-2019-02-26_22:35:27-rados-wip-rdias2-testing-distro-basic-smithi/3642422

description: rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml 2-recovery-overrides/{more-active-recovery.yaml} backoff/normal.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-balancer/crush-compat.yaml msgr-failures/fastclose.yaml msgr/async-v2only.yaml objectstore/bluestore-stupid.yaml rados.yaml supported-random-distro$/{ubuntu_16.04.yaml} thrashers/careful.yaml thrashosds-health.yaml workloads/rados_api_tests.yaml}

#7 Updated by Sage Weil 5 months ago

/a/sage-2019-02-28_12:30:17-rados-wip-sage-testing-2019-02-27-1720-distro-basic-smithi/3649931

description: rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml 2-recovery-overrides/{more-active-recovery.yaml} backoff/normal.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-balancer/crush-compat.yaml msgr-failures/osd-delay.yaml msgr/async.yaml objectstore/bluestore-bitmap.yaml rados.yaml supported-random-distro$/{ubuntu_latest.yaml} thrashers/default.yaml thrashosds-health.yaml workloads/rados_api_tests.yaml}

#8 Updated by Sage Weil 4 months ago

/a/sage-2019-03-10_01:08:05-rados-master-distro-basic-smithi/3703837

description: rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml 2-recovery-overrides/{more-active-recovery.yaml} backoff/peering_and_degraded.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-balancer/upmap.yaml msgr-failures/fastclose.yaml msgr/simple.yaml objectstore/bluestore-bitmap.yaml rados.yaml supported-random-distro$/{ubuntu_latest.yaml} thrashers/careful.yaml thrashosds-health.yaml workloads/rados_api_tests.yaml}

#9 Updated by Sage Weil 4 months ago

/a/sage-2019-03-13_02:19:41-rados-wip-sage3-testing-2019-03-12-1657-distro-basic-smithi/3715202

#10 Updated by Neha Ojha 3 months ago

  • Backport set to nautilus

/a/yuriw-2019-04-05_00:28:05-rados-wip-yuri2-testing-2019-04-04-1953-nautilus-distro-basic-smithi/3811205/

#11 Updated by David Zafman 1 day ago

  • Backport changed from nautilus to nautilus, mimic

http://qa-proxy.ceph.com/teuthology/yuriw-2019-07-09_15:21:18-rados-wip-yuri-testing-2019-07-08-2007-mimic-distro-basic-smithi/4106022

2019-07-10T07:22:00.844 INFO:tasks.workunit.client.0.smithi180.stdout:                 api_tier: [  PASSED  ] 54 tests.
2019-07-10T07:22:00.846 INFO:tasks.workunit.client.0.smithi180.stderr:+ for t in '"${!pids[@]}"'
2019-07-10T07:22:00.846 INFO:tasks.workunit.client.0.smithi180.stderr:+ pid=12419
2019-07-10T07:22:00.846 INFO:tasks.workunit.client.0.smithi180.stderr:+ wait 12419
2019-07-10T07:22:00.846 INFO:tasks.workunit.client.0.smithi180.stderr:+ for t in '"${!pids[@]}"'
2019-07-10T07:22:00.847 INFO:tasks.workunit.client.0.smithi180.stderr:+ pid=12435
2019-07-10T07:22:00.847 INFO:tasks.workunit.client.0.smithi180.stderr:+ wait 12435
2019-07-10T07:22:00.847 INFO:tasks.workunit.client.0.smithi180.stderr:+ for t in '"${!pids[@]}"'
2019-07-10T07:22:00.848 INFO:tasks.workunit.client.0.smithi180.stderr:+ pid=12540
2019-07-10T07:22:00.848 INFO:tasks.workunit.client.0.smithi180.stderr:+ wait 12540
2019-07-10T07:22:00.849 INFO:tasks.workunit.client.0.smithi180.stderr:+ for t in '"${!pids[@]}"'
2019-07-10T07:22:00.849 INFO:tasks.workunit.client.0.smithi180.stderr:+ pid=12411
2019-07-10T07:22:00.849 INFO:tasks.workunit.client.0.smithi180.stderr:+ wait 12411
2019-07-10T13:09:25.293 INFO:tasks.workunit.client.0.smithi180.stderr:++ cleanup
2019-07-10T13:09:25.323 INFO:tasks.workunit.client.0.smithi180.stderr:++ pkill -P 12398
2019-07-10T13:09:25.359 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh: line 10: 12411 Terminated              bash -o pipefail -exc "ceph_test_rados_$f $color 2>&1 | tee ceph_test_rados_$ff.log | sed \"s/^/$r: /\"" 
2019-07-10T13:09:25.359 INFO:tasks.workunit.client.0.smithi180.stderr:++ true
2019-07-10T13:09:25.392 DEBUG:teuthology.orchestra.run:got remote process result: 124
2019-07-10T13:09:25.392 INFO:tasks.workunit:Stopping ['rados/test.sh'] on client.0...
2019-07-10T13:09:25.392 INFO:teuthology.orchestra.run.smithi180:Running:
2019-07-10T13:09:25.393 INFO:teuthology.orchestra.run.smithi180:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2019-07-10T13:09:26.597 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2019-07-08-2007-mimic/qa/tasks/workunit.py", line 123, in task
    timeout=timeout,cleanup=cleanup)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 85, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 99, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 22, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2019-07-08-2007-mimic/qa/tasks/workunit.py", line 409, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 205, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 437, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 162, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 184, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed (workunit test rados/test.sh) on smithi180 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=f513862830ff116ae26fc1838f038ad2ed11d3a9 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh'

#12 Updated by Sage Weil 1 day ago

  • Related to Bug #40765: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic added

Also available in: Atom PDF