Project

General

Profile

Actions

Bug #50595

closed

upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure

Added by Yuri Weinstein almost 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Seems to be ubuntu 18.04 specific

Run: https://pulpito.ceph.com/yuriw-2021-04-29_16:10:26-upgrade:nautilus-x-pacific-distro-basic-smithi/
Jobs: '6082840', '6082837', '6082844', '6082813', '6082829', '6082828', '6082817'
Logs: yuriw-2021-04-29_16:10:26-upgrade:nautilus-x-pacific-distro-basic-smithi/6082813/teuthology.log

2021-04-30T14:58:23.417 INFO:tasks.mon_thrash.mon_thrasher:waiting for 20.0 secs before reviving monitors
2021-04-30T14:58:32.720 INFO:tasks.workunit.client.1.smithi199.stderr:++ cleanup
2021-04-30T14:58:32.721 INFO:tasks.workunit.client.1.smithi199.stderr:++ pkill -P 1682380
2021-04-30T14:58:32.811 DEBUG:teuthology.orchestra.run:got remote process result: 124
2021-04-30T14:58:32.812 INFO:tasks.workunit.client.1.smithi199.stderr:/home/ubuntu/cephtest/clone.client.1/qa/workunits/rados/test.sh: line 10: 1682780 Terminated              bash -o pipefail -exc "ceph_test_rados_$f $color 2>&1 | tee ceph_test_rados_$ff.log | sed \"s/^/$r: /\"" 
2021-04-30T14:58:32.812 INFO:tasks.workunit.client.1.smithi199.stderr:++ true
2021-04-30T14:58:32.812 INFO:tasks.workunit:Stopping ['rados/test.sh'] on client.1...
2021-04-30T14:58:32.813 DEBUG:teuthology.orchestra.run.smithi199:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.1 /home/ubuntu/cephtest/clone.client.1
2021-04-30T14:58:33.051 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/run_tasks.py", line 91, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/run_tasks.py", line 70, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph_717ce59b76c659aaef8c5aec1355c0ac5cef7234/qa/tasks/workunit.py", line 134, in task
    coverage_and_limits=not config.get('no_coverage_and_limits', None))
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/parallel.py", line 84, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/parallel.py", line 98, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/parallel.py", line 30, in resurrect_traceback
    raise exc.exc_info[1]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/parallel.py", line 23, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph_717ce59b76c659aaef8c5aec1355c0ac5cef7234/qa/tasks/workunit.py", line 425, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)

  File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/orchestra/remote.py", line 215, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_bea5b7399b9dcb9d756a32fdfc32f498319d425f/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed (workunit test rados/test.sh) on smithi199 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && cd -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=717ce59b76c659aaef8c5aec1355c0ac5cef7234 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="1" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.1 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.1/qa/workunits/rados/test.sh'


Related issues 2 (0 open2 closed)

Related to RADOS - Bug #50042: rados/test.sh: api_watch_notify failuresResolvedNitzan Mordechai

Actions
Copied to RADOS - Backport #50666: pacific: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failureResolvedActions
Actions #1

Updated by Neha Ojha almost 3 years ago

  • Subject changed from "test.sh: line 10: 1682780 Terminated" in upgrade:nautilus-x-pacific to upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
2021-04-30T15:04:31.103 INFO:tasks.workunit.client.1.smithi159.stderr:++ cleanup
2021-04-30T15:04:31.103 INFO:tasks.workunit.client.1.smithi159.stderr:++ pkill -P 1651927
2021-04-30T15:04:31.122 INFO:tasks.workunit.client.1.smithi159.stderr:/home/ubuntu/cephtest/clone.client.1/qa/workunits/rados/test.sh: line 10: 1652297 Terminated              bash -o pipefail -exc "ceph_test_rados_$f $color 2>&1 | tee ceph_test_rados_$ff.log | sed \"s/^/$r: /\"" 
2021-04-30T15:04:31.122 INFO:tasks.workunit.client.1.smithi159.stderr:++ true
2021-04-30T15:04:31.123 DEBUG:teuthology.orchestra.run:got remote process result: 124
2021-04-30T12:04:31.251 INFO:tasks.workunit.client.1.smithi159.stderr:+ pid=1652297
2021-04-30T12:04:31.251 INFO:tasks.workunit.client.1.smithi159.stderr:+ echo 'test api_service on pid 1652297'
2021-04-30T12:04:38.663 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service: [ RUN      ] LibRadosService.StatusFormat
...
2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:   services:
2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     mon:     3 daemons, quorum a,b,c (age 6s)
2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     mgr:     x(active, since 91m)
2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     mds:     1/1 daemons up
2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     osd:     12 osds: 12 up (since 90m), 12 in (since 96m)
2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     laundry: 2 daemons active (1 hosts)
2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     rgw:     1 daemon active (1 hosts, 1 zones)
2021-04-30T12:04:38.664 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:
...
2021-04-30T12:04:38.666 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:   services:
2021-04-30T12:04:38.666 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     mon:     3 daemons, quorum a,b,c (age 8s)
2021-04-30T12:04:38.666 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     mgr:     x(active, since 91m)
2021-04-30T12:04:38.666 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     mds:     1/1 daemons up
2021-04-30T12:04:38.667 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     osd:     12 osds: 12 up (since 90m), 12 in (since 96m)
2021-04-30T12:04:38.667 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     laundry: 2 daemons active (1 hosts)
2021-04-30T12:04:38.667 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     rgw:     1 daemon active (1 hosts, 1 zones)
...
2021-04-30T12:04:51.427 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:   services:
2021-04-30T12:04:51.427 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     mon:     3 daemons, quorum a,b,c (age 10s)
2021-04-30T12:04:51.427 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     mgr:     x(active, since 91m)
2021-04-30T12:04:51.427 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     mds:     1/1 daemons up
2021-04-30T12:04:51.427 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     osd:     12 osds: 12 up (since 90m), 12 in (since 96m)
2021-04-30T12:04:51.428 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     laundry: 2 daemons active (1 hosts)
2021-04-30T12:04:51.428 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     rgw:     1 daemon active (1 hosts, 1 zones)
2021-04-30T12:04:51.428 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:
...
2021-04-30T12:04:51.430 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:   services:
2021-04-30T12:04:51.430 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     mon:     3 daemons, quorum a,b,c (age 12s)
2021-04-30T12:04:51.430 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     mgr:     x(active, since 91m)
2021-04-30T12:04:51.430 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     mds:     1/1 daemons up
2021-04-30T12:04:51.430 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     osd:     12 osds: 12 up (since 90m), 12 in (since 96m)
2021-04-30T12:04:51.430 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     laundry: 2 daemons active (1 hosts)
2021-04-30T12:04:51.430 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     rgw:     1 daemon active (1 hosts, 1 zones)
...
2021-04-30T12:04:51.432 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:   services:
2021-04-30T12:04:51.432 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     mon:     3 daemons, quorum b,c (age 0.335996s), out of quorum: a
2021-04-30T12:04:51.432 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     mgr:     x(active, since 91m)
2021-04-30T12:04:51.432 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     mds:     1/1 daemons up
2021-04-30T12:04:51.433 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     osd:     12 osds: 12 up (since 90m), 12 in (since 96m)
2021-04-30T12:04:51.433 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     laundry: 2 daemons active (1 hosts)
2021-04-30T12:04:51.433 INFO:tasks.workunit.client.1.smithi159.stdout:              api_service:     rgw:     1 daemon active (1 hosts, 1 zones)

So it tries 5 times and fails to find "16 portals active (1 hosts, 3 zones)"

Same as https://tracker.ceph.com/issues/50042#note-3

Actions #2

Updated by Neha Ojha almost 3 years ago

  • Project changed from Ceph to RADOS
Actions #3

Updated by Neha Ojha almost 3 years ago

  • Status changed from New to Triaged

This seems related to ab0d8f2ae9f551e15a4c7bacbf69161e91263785.
Reverting makes the issue go away http://pulpito.front.sepia.ceph.com/yuriw-2021-04-30_22:41:50-upgrade:nautilus-x-wip-50595-pacific-distro-basic-smithi/

Actions #4

Updated by Neha Ojha almost 3 years ago

  • Related to Bug #50042: rados/test.sh: api_watch_notify failures added
Actions #5

Updated by Sage Weil almost 3 years ago

I think the upgrade test just needs to skip that test. It's just looking for a specific string that changed in pacific.

Actions #6

Updated by Neha Ojha almost 3 years ago

The upgrade test do not fail all the time.

upgrade:nautilus-x/parallel/{0-cluster/{openstack start} 1-ceph-install/nautilus 1.1-pg-log-overrides/normal_pg_log 2-workload/{blogbench ec-rados-default rados_api rados_loadgenbig rgw_ragweed_prepare test_rbd_api test_rbd_python} 3-upgrade-sequence/upgrade-all 4-octopus 5-final-workload/{blogbench rados-snaps-few-objects rados_loadgenmix rados_mon_thrash rbd_cls rbd_import_export rgw rgw_ragweed_check} mon_election/classic objectstore/bluestore-bitmap ubuntu_18.04}

yuriw-2021-04-30_21:02:01-upgrade:nautilus-x-pacific-distro-basic-smithi/6086998 - passed

2021-04-30T22:58:06.375 INFO:tasks.workunit.client.1.smithi104.stdout:              api_service:   services:
2021-04-30T22:58:06.375 INFO:tasks.workunit.client.1.smithi104.stdout:              api_service:     mon:     3 daemons, quorum a,b,c (age 4s)
2021-04-30T22:58:06.375 INFO:tasks.workunit.client.1.smithi104.stdout:              api_service:     mgr:     x(active, since 92m)
2021-04-30T22:58:06.375 INFO:tasks.workunit.client.1.smithi104.stdout:              api_service:     mds:     1/1 daemons up
2021-04-30T22:58:06.376 INFO:tasks.workunit.client.1.smithi104.stdout:              api_service:     osd:     12 osds: 12 up (since 91m), 12 in (since 97m)
2021-04-30T22:58:06.376 INFO:tasks.workunit.client.1.smithi104.stdout:              api_service:     foo:     16 portals active (1 hosts, 3 zones)
2021-04-30T22:58:06.376 INFO:tasks.workunit.client.1.smithi104.stdout:              api_service:     laundry: 1 daemon active (1 hosts)
2021-04-30T22:58:06.376 INFO:tasks.workunit.client.1.smithi104.stdout:              api_service:     rgw:     1 daemon active (1 hosts, 1 zones)
Actions #7

Updated by Sage Weil almost 3 years ago

  • Status changed from Triaged to Fix Under Review
  • Pull request ID set to 41147
/build/ceph-16.2.1-257-g717ce59b/src/test/librados/service.cc:166: Failure
Expected: (0) != (retry), actual: 0 vs 0
Actions #8

Updated by Sage Weil almost 3 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to pacific
Actions #9

Updated by Backport Bot almost 3 years ago

  • Copied to Backport #50666: pacific: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure added
Actions #10

Updated by Sage Weil almost 3 years ago

  • Status changed from Pending Backport to Resolved
Actions

Also available in: Atom PDF