
Bug #20449

tests: rados/thrash...rgw_snaps.yaml transient failure: Timeout exception after greenlet.switch with gevent worker

Added by Nathan Cutler almost 7 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This failure occurred during jewel 10.2.8 integration testing.

Test: rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml clusters/{fixed-2.yaml openstack.yaml} fs/btrfs.yaml hobj-sort.yaml msgr-failures/fastclose.yaml msgr/async.yaml rados.yaml thrashers/pggrow.yaml workloads/rgw_snaps.yaml}

Test URL: http://pulpito.front.sepia.ceph.com/smithfarm-2017-06-27_19:13:42-rados-wip-jewel-backports-distro-basic-smithi/1333051/

Failure message: Command failed on smithi165 with status 1: '/home/ubuntu/cephtest/s3-tests/virtualenv/bin/s3tests-test-readwrite'

Analysis: a gevent socket read in s3tests-test-readwrite times out, causing transient failures (~40% of runs fail).

Traceback (from a 2017-06-30 run on smithi099):

2017-06-30T09:27:46.703 INFO:teuthology.orchestra.run.smithi099.stderr:Traceback (most recent call last):
2017-06-30T09:27:46.703 INFO:teuthology.orchestra.run.smithi099.stderr:  File "/home/ubuntu/cephtest/s3-tests/virtualenv/bin/s3tests-test-readwrite", line 11, in <module>
2017-06-30T09:27:46.703 INFO:teuthology.orchestra.run.smithi099.stderr:    load_entry_point('s3tests', 'console_scripts', 's3tests-test-readwrite')()
2017-06-30T09:27:46.704 INFO:teuthology.orchestra.run.smithi099.stderr:  File "/home/ubuntu/cephtest/s3-tests/s3tests/readwrite.py", line 256, in main
2017-06-30T09:27:46.704 INFO:teuthology.orchestra.run.smithi099.stderr:    trace=temp_dict['error']['traceback'])
2017-06-30T09:27:46.704 INFO:teuthology.orchestra.run.smithi099.stderr:Exception: exception:
2017-06-30T09:27:46.704 INFO:teuthology.orchestra.run.smithi099.stderr: timed out
2017-06-30T09:27:46.704 INFO:teuthology.orchestra.run.smithi099.stderr: Traceback (most recent call last):
2017-06-30T09:27:46.704 INFO:teuthology.orchestra.run.smithi099.stderr:  File "/home/ubuntu/cephtest/s3-tests/s3tests/readwrite.py", line 34, in reader
2017-06-30T09:27:46.704 INFO:teuthology.orchestra.run.smithi099.stderr:    key.get_contents_to_file(fp._file)
2017-06-30T09:27:46.704 INFO:teuthology.orchestra.run.smithi099.stderr:  File "/home/ubuntu/cephtest/s3-tests/virtualenv/local/lib/python2.7/site-packages/boto/s3/key.py", line 1662, in get_contents_to_file
2017-06-30T09:27:46.705 INFO:teuthology.orchestra.run.smithi099.stderr:    response_headers=response_headers)
2017-06-30T09:27:46.705 INFO:teuthology.orchestra.run.smithi099.stderr:  File "/home/ubuntu/cephtest/s3-tests/virtualenv/local/lib/python2.7/site-packages/boto/s3/key.py", line 1494, in get_file
2017-06-30T09:27:46.705 INFO:teuthology.orchestra.run.smithi099.stderr:    query_args=None)
2017-06-30T09:27:46.705 INFO:teuthology.orchestra.run.smithi099.stderr:  File "/home/ubuntu/cephtest/s3-tests/virtualenv/local/lib/python2.7/site-packages/boto/s3/key.py", line 1547, in _get_file_internal
2017-06-30T09:27:46.705 INFO:teuthology.orchestra.run.smithi099.stderr:    for bytes in self:
2017-06-30T09:27:46.705 INFO:teuthology.orchestra.run.smithi099.stderr:  File "/home/ubuntu/cephtest/s3-tests/virtualenv/local/lib/python2.7/site-packages/boto/s3/key.py", line 398, in next
2017-06-30T09:27:46.705 INFO:teuthology.orchestra.run.smithi099.stderr:    data = self.resp.read(self.BufferSize)
2017-06-30T09:27:46.705 INFO:teuthology.orchestra.run.smithi099.stderr:  File "/home/ubuntu/cephtest/s3-tests/virtualenv/local/lib/python2.7/site-packages/boto/connection.py", line 413, in read
2017-06-30T09:27:46.706 INFO:teuthology.orchestra.run.smithi099.stderr:    return http_client.HTTPResponse.read(self, amt)
2017-06-30T09:27:46.706 INFO:teuthology.orchestra.run.smithi099.stderr:  File "/usr/lib/python2.7/httplib.py", line 612, in read
2017-06-30T09:27:46.706 INFO:teuthology.orchestra.run.smithi099.stderr:    s = self.fp.read(amt)
2017-06-30T09:27:46.706 INFO:teuthology.orchestra.run.smithi099.stderr:  File "/usr/lib/python2.7/socket.py", line 384, in read
2017-06-30T09:27:46.706 INFO:teuthology.orchestra.run.smithi099.stderr:    data = self._sock.recv(left)
2017-06-30T09:27:46.706 INFO:teuthology.orchestra.run.smithi099.stderr:  File "/home/ubuntu/cephtest/s3-tests/virtualenv/local/lib/python2.7/site-packages/gevent/_socket2.py", line 283, in recv
2017-06-30T09:27:46.706 INFO:teuthology.orchestra.run.smithi099.stderr:    self._wait(self._read_event)
2017-06-30T09:27:46.706 INFO:teuthology.orchestra.run.smithi099.stderr:  File "/home/ubuntu/cephtest/s3-tests/virtualenv/local/lib/python2.7/site-packages/gevent/_socket2.py", line 182, in _wait
2017-06-30T09:27:46.707 INFO:teuthology.orchestra.run.smithi099.stderr:    self.hub.wait(watcher)
2017-06-30T09:27:46.707 INFO:teuthology.orchestra.run.smithi099.stderr:  File "/home/ubuntu/cephtest/s3-tests/virtualenv/local/lib/python2.7/site-packages/gevent/hub.py", line 651, in wait
2017-06-30T09:27:46.707 INFO:teuthology.orchestra.run.smithi099.stderr:    result = waiter.get()
2017-06-30T09:27:46.707 INFO:teuthology.orchestra.run.smithi099.stderr:  File "/home/ubuntu/cephtest/s3-tests/virtualenv/local/lib/python2.7/site-packages/gevent/hub.py", line 898, in get
2017-06-30T09:27:46.707 INFO:teuthology.orchestra.run.smithi099.stderr:    return self.hub.switch()
2017-06-30T09:27:46.707 INFO:teuthology.orchestra.run.smithi099.stderr:  File "/home/ubuntu/cephtest/s3-tests/virtualenv/local/lib/python2.7/site-packages/gevent/hub.py", line 630, in switch
2017-06-30T09:27:46.707 INFO:teuthology.orchestra.run.smithi099.stderr:    return RawGreenlet.switch(self)
2017-06-30T09:27:46.707 INFO:teuthology.orchestra.run.smithi099.stderr:timeout: timed out
2017-06-30T09:27:46.707 INFO:teuthology.orchestra.run.smithi099.stderr:

N.B.: this traceback is almost identical to the one in https://github.com/benoitc/gunicorn/issues/880
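The traceback ends in `timeout: timed out`, raised from gevent's cooperative `recv()` while boto was reading an object body from radosgw. That is the standard `socket.timeout` exception. The minimal sketch below reproduces the same exception with the standard-library `socket` module instead of gevent, which is an assumption made to keep it self-contained. `read_with_timeout` and `demo` are hypothetical helpers, not part of s3-tests, and the silent peer only stands in for a radosgw that does not answer in time. That root cause is an assumption, not something the report establishes.

```python
import socket


def read_with_timeout(sock, nbytes, timeout_s):
    """Hypothetical helper: read up to nbytes, giving up after timeout_s seconds."""
    sock.settimeout(timeout_s)
    return sock.recv(nbytes)


def demo():
    """Return the message of the timeout raised by a read from a silent peer."""
    # A connected socket pair stands in for the client <-> radosgw HTTP connection.
    client, server = socket.socketpair()
    try:
        # The "server" end never writes (assumption: rgw stalls), so recv() times out.
        read_with_timeout(client, 4096, 0.1)
    except socket.timeout as exc:
        # Same exception type and message as the last line of the traceback.
        return str(exc)
    finally:
        client.close()
        server.close()
    return None
```

Calling `demo()` returns the string `timed out`, which matches the final line of the traceback.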

Actions #1

Updated by Nathan Cutler almost 7 years ago

  • Release set to jewel
  • Affected Versions v10.2.8 added
Actions #2

Updated by Nathan Cutler almost 7 years ago

The failure is transient.

Actions #3

Updated by Nathan Cutler almost 7 years ago

teuthology-suite -k distro --priority 101 --suite rados --email ncutler@suse.com --ceph wip-jewel-backports --machine-type smithi --filter="thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml clusters/{fixed-2.yaml openstack.yaml} fs/btrfs.yaml hobj-sort.yaml msgr-failures/fastclose.yaml msgr/async.yaml rados.yaml thrashers/pggrow.yaml workloads/rgw_snaps.yaml}" --num 5

Result: 3 passed, 2 failed (5 total): http://pulpito.front.sepia.ceph.com:80/smithfarm-2017-06-30_09:12:09-rados-wip-jewel-backports-distro-basic-smithi/
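The "~40%" failure rate in the description matches this rerun: 2 failures in 5 runs. With only five samples the estimate is coarse. The sketch below computes the observed rate and a Wilson score interval to show how wide the plausible range is. The Wilson interval is an illustrative choice and was not used in the original report, and `wilson_interval` is a hypothetical helper.

```python
import math


def wilson_interval(failures, runs, z=1.96):
    """Observed failure rate plus a Wilson score interval (z=1.96 gives ~95% confidence)."""
    p = failures / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    margin = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return p, center - margin, center + margin


# Counts taken from the rerun above: 3 passed, 2 failed, 5 total.
rate, low, high = wilson_interval(failures=2, runs=5)
print(f"observed failure rate {rate:.0%}, 95% interval {low:.0%}-{high:.0%}")
```

With these inputs, the interval spans roughly 12% to 77%. Pinning down the failure rate more precisely would therefore take more runs.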

Actions #4

Updated by Nathan Cutler almost 7 years ago

  • Description updated (diff)
Actions #5

Updated by Nathan Cutler almost 7 years ago

  • Subject changed from tests: rados/thrash...rgw_snaps.yaml fails because rgw task bails out after creating just two rgw pools to tests: rados/thrash...rgw_snaps.yaml transient failure (similar to "fails because of greenlet timeou
  • Description updated (diff)
Actions #6

Updated by Nathan Cutler almost 7 years ago

  • Subject changed from tests: rados/thrash...rgw_snaps.yaml transient failure (similar to "fails because of greenlet timeou to tests: rados/thrash...rgw_snaps.yaml transient failure similar to https://github.com/benoitc/gunicorn/issues/880
Actions #7

Updated by Nathan Cutler almost 7 years ago

  • Subject changed from tests: rados/thrash...rgw_snaps.yaml transient failure similar to https://github.com/benoitc/gunicorn/issues/880 to tests: rados/thrash...rgw_snaps.yaml transient failure: Timeout exception after greenlet.switch with gevent worker
Actions #8

Updated by Greg Farnum almost 7 years ago

  • Project changed from RADOS to Ceph

So this is a bug in the qa-suite Python code, right? That's not a RADOS issue... :)

Actions #9

Updated by Sage Weil almost 3 years ago

  • Project changed from Ceph to rgw
Actions #10

Updated by Casey Bodley over 2 years ago

  • Status changed from New to Closed