Bug #50595
closedupgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
0%
Description
Seems to be ubuntu 18.04 specific
Run: https://pulpito.ceph.com/yuriw-2021-04-29_16:10:26-upgrade:nautilus-x-pacific-distro-basic-smithi/
Jobs: '6082840', '6082837', '6082844', '6082813', '6082829', '6082828', '6082817'
Logs: yuriw-2021-04-29_16:10:26-upgrade:nautilus-x-pacific-distro-basic-smithi/6082813/teuthology.log
2021-04-30T14:58:23.417 INFO:tasks.mon_thrash.mon_thrasher:waiting for 20.0 secs before reviving monitors 2021-04-30T14:58:32.720 INFO:tasks.workunit.client.1.smithi199.stderr:++ cleanup 2021-04-30T14:58:32.721 INFO:tasks.workunit.client.1.smithi199.stderr:++ pkill -P 1682380 2021-04-30T14:58:32.811 DEBUG:teuthology.orchestra.run:got remote process result: 124 2021-04-30T14:58:32.812 INFO:tasks.workunit.client.1.smithi199.stderr:/home/ubuntu/cephtest/clone.client.1/qa/workunits/rados/test.sh: line 10: 1682780 Terminated bash -o pipefail -exc "ceph_test_rados_$f $color 2>&1 | tee ceph_test_rados_$ff.log | sed \"s/^/$r: /\"" 2021-04-30T14:58:32.812 INFO:tasks.workunit.client.1.smithi199.stderr:++ true 2021-04-30T14:58:32.812 INFO:tasks.workunit:Stopping ['rados/test.sh'] on client.1... 2021-04-30T14:58:32.813 DEBUG:teuthology.orchestra.run.smithi199:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.1 /home/ubuntu/cephtest/clone.client.1 2021-04-30T14:58:33.051 ERROR:teuthology.run_tasks:Saw exception from tasks. Traceback (most recent call last): File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/run_tasks.py", line 91, in run_tasks manager = run_one_task(taskname, ctx=ctx, config=config) File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/run_tasks.py", line 70, in run_one_task return task(**kwargs) File "/home/teuthworker/src/github.com_ceph_ceph_717ce59b76c659aaef8c5aec1355c0ac5cef7234/qa/tasks/workunit.py", line 134, in task coverage_and_limits=not config.get('no_coverage_and_limits', None)) File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/parallel.py", line 84, in __exit__ for result in self: File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/parallel.py", line 98, in __next__ resurrect_traceback(result) File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/parallel.py", line 30, in resurrect_traceback raise exc.exc_info[1] File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/parallel.py", line 23, in capture_traceback return func(*args, **kwargs) File "/home/teuthworker/src/github.com_ceph_ceph_717ce59b76c659aaef8c5aec1355c0ac5cef7234/qa/tasks/workunit.py", line 425, in _run_tests label="workunit test {workunit}".format(workunit=workunit) File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/orchestra/remote.py", line 215, in run r = self._runner(client=self.ssh, name=self.shortname, **kwargs) File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/orchestra/run.py", line 455, in run r.wait() File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/orchestra/run.py", line 161, in wait self._raise_for_status() File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/orchestra/run.py", line 183, in _raise_for_status node=self.hostname, label=self.label teuthology.exceptions.CommandFailedError: Command failed (workunit test rados/test.sh) on smithi199 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && cd -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=717ce59b76c659aaef8c5aec1355c0ac5cef7234 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="1" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.1 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.1/qa/workunits/rados/test.sh'
Updated by Neha Ojha almost 3 years ago
- Subject changed from "test.sh: line 10: 1682780 Terminated" in upgrade:nautilus-x-pacific to upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
2021-04-30T15:04:31.103 INFO:tasks.workunit.client.1.smithi159.stderr:++ cleanup 2021-04-30T15:04:31.103 INFO:tasks.workunit.client.1.smithi159.stderr:++ pkill -P 1651927 2021-04-30T15:04:31.122 INFO:tasks.workunit.client.1.smithi159.stderr:/home/ubuntu/cephtest/clone.client.1/qa/workunits/rados/test.sh: line 10: 1652297 Terminated bash -o pipefail -exc "ceph_test_rados_$f $color 2>&1 | tee ceph_test_rados_$ff.log | sed \"s/^/$r: /\"" 2021-04-30T15:04:31.122 INFO:tasks.workunit.client.1.smithi159.stderr:++ true 2021-04-30T15:04:31.123 DEBUG:teuthology.orchestra.run:got remote process result: 124
2021-04-30T12:04:31.251 INFO:tasks.workunit.client.1.smithi159.stderr:+ pid=1652297 2021-04-30T12:04:31.251 INFO:tasks.workunit.client.1.smithi159.stderr:+ echo 'test api_service on pid 1652297'
2021-04-30T12:04:38.663 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: [ RUN ] LibRadosService.StatusFormat ... 2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: services: 2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: mon: 3 daemons, quorum a,b,c (age 6s) 2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: mgr: x(active, since 91m) 2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: mds: 1/1 daemons up 2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: osd: 12 osds: 12 up (since 90m), 12 in (since 96m) 2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: laundry: 2 daemons active (1 hosts) 2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: rgw: 1 daemon active (1 hosts, 1 zones) 2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: ... 2021-04-30T12:04:38.666 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: services: 2021-04-30T12:04:38.666 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: mon: 3 daemons, quorum a,b,c (age 8s) 2021-04-30T12:04:38.666 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: mgr: x(active, since 91m) 2021-04-30T12:04:38.666 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: mds: 1/1 daemons up 2021-04-30T12:04:38.667 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: osd: 12 osds: 12 up (since 90m), 12 in (since 96m) 2021-04-30T12:04:38.667 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: laundry: 2 daemons active (1 hosts) 2021-04-30T12:04:38.667 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: rgw: 1 daemon active (1 hosts, 1 zones) ... 2021-04-30T12:04:51.427 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: services: 2021-04-30T12:04:51.427 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: mon: 3 daemons, quorum a,b,c (age 10s) 2021-04-30T12:04:51.427 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: mgr: x(active, since 91m) 2021-04-30T12:04:51.427 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: mds: 1/1 daemons up 2021-04-30T12:04:51.427 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: osd: 12 osds: 12 up (since 90m), 12 in (since 96m) 2021-04-30T12:04:51.428 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: laundry: 2 daemons active (1 hosts) 2021-04-30T12:04:51.428 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: rgw: 1 daemon active (1 hosts, 1 zones) 2021-04-30T12:04:51.428 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: ... 2021-04-30T12:04:51.430 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: services: 2021-04-30T12:04:51.430 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: mon: 3 daemons, quorum a,b,c (age 12s) 2021-04-30T12:04:51.430 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: mgr: x(active, since 91m) 2021-04-30T12:04:51.430 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: mds: 1/1 daemons up 2021-04-30T12:04:51.430 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: osd: 12 osds: 12 up (since 90m), 12 in (since 96m) 2021-04-30T12:04:51.430 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: laundry: 2 daemons active (1 hosts) 2021-04-30T12:04:51.430 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: rgw: 1 daemon active (1 hosts, 1 zones) ... 2021-04-30T12:04:51.432 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: services: 2021-04-30T12:04:51.432 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: mon: 3 daemons, quorum b,c (age 0.335996s), out of quorum: a 2021-04-30T12:04:51.432 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: mgr: x(active, since 91m) 2021-04-30T12:04:51.432 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: mds: 1/1 daemons up 2021-04-30T12:04:51.433 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: osd: 12 osds: 12 up (since 90m), 12 in (since 96m) 2021-04-30T12:04:51.433 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: laundry: 2 daemons active (1 hosts) 2021-04-30T12:04:51.433 INFO:tasks.workunit.client.1.smithi159.stdout: api_service: rgw: 1 daemon active (1 hosts, 1 zones)
So it tries 5 times and fails to find "16 portals active (1 hosts, 3 zones)"
Updated by Neha Ojha almost 3 years ago
- Status changed from New to Triaged
This seems related to ab0d8f2ae9f551e15a4c7bacbf69161e91263785.
Reverting makes the issue go away http://pulpito.front.sepia.ceph.com/yuriw-2021-04-30_22:41:50-upgrade:nautilus-x-wip-50595-pacific-distro-basic-smithi/
Updated by Neha Ojha almost 3 years ago
- Related to Bug #50042: rados/test.sh: api_watch_notify failures added
Updated by Sage Weil almost 3 years ago
I think the upgrade test just needs to skip that test. It's just looking for a specific string that changed in pacific.
Updated by Neha Ojha almost 3 years ago
The upgrade test do not fail all the time.
upgrade:nautilus-x/parallel/{0-cluster/{openstack start} 1-ceph-install/nautilus 1.1-pg-log-overrides/normal_pg_log 2-workload/{blogbench ec-rados-default rados_api rados_loadgenbig rgw_ragweed_prepare test_rbd_api test_rbd_python} 3-upgrade-sequence/upgrade-all 4-octopus 5-final-workload/{blogbench rados-snaps-few-objects rados_loadgenmix rados_mon_thrash rbd_cls rbd_import_export rgw rgw_ragweed_check} mon_election/classic objectstore/bluestore-bitmap ubuntu_18.04}
yuriw-2021-04-30_21:02:01-upgrade:nautilus-x-pacific-distro-basic-smithi/6086998 - passed
2021-04-30T22:58:06.375 INFO:tasks.workunit.client.1.smithi104.stdout: api_service: services: 2021-04-30T22:58:06.375 INFO:tasks.workunit.client.1.smithi104.stdout: api_service: mon: 3 daemons, quorum a,b,c (age 4s) 2021-04-30T22:58:06.375 INFO:tasks.workunit.client.1.smithi104.stdout: api_service: mgr: x(active, since 92m) 2021-04-30T22:58:06.375 INFO:tasks.workunit.client.1.smithi104.stdout: api_service: mds: 1/1 daemons up 2021-04-30T22:58:06.376 INFO:tasks.workunit.client.1.smithi104.stdout: api_service: osd: 12 osds: 12 up (since 91m), 12 in (since 97m) 2021-04-30T22:58:06.376 INFO:tasks.workunit.client.1.smithi104.stdout: api_service: foo: 16 portals active (1 hosts, 3 zones) 2021-04-30T22:58:06.376 INFO:tasks.workunit.client.1.smithi104.stdout: api_service: laundry: 1 daemon active (1 hosts) 2021-04-30T22:58:06.376 INFO:tasks.workunit.client.1.smithi104.stdout: api_service: rgw: 1 daemon active (1 hosts, 1 zones)
Updated by Sage Weil almost 3 years ago
- Status changed from Triaged to Fix Under Review
- Pull request ID set to 41147
/build/ceph-16.2.1-257-g717ce59b/src/test/librados/service.cc:166: Failure Expected: (0) != (retry), actual: 0 vs 0
Updated by Sage Weil almost 3 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to pacific
Updated by Backport Bot almost 3 years ago
- Copied to Backport #50666: pacific: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure added
Updated by Sage Weil almost 3 years ago
- Status changed from Pending Backport to Resolved