Bug #7596

closed

task/ceph_manager.py: Exceptions are being swallowed

Added by Zack Cerza about 10 years ago. Updated over 9 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I reran this job with the attached YAML (hang.yaml):

2014-03-03 16:12:51,416.416 INFO:teuthology.orchestra.run.err:[10.214.131.27]: 0
2014-03-03 16:12:51,417.417 INFO:teuthology.orchestra.run.err:[10.214.131.27]: admin_socket: invalid command

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 390, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/ubuntu/zack/teuthology/teuthology/task/ceph_manager.py", line 362, in do_thrash
    self.revive_osd()
  File "/home/ubuntu/zack/teuthology/teuthology/task/ceph_manager.py", line 96, in revive_osd
    self.ceph_manager.revive_osd(osd, self.revive_timeout)
  File "/home/ubuntu/zack/teuthology/teuthology/task/ceph_manager.py", line 1216, in revive_osd
    timeout=timeout)
  File "/home/ubuntu/zack/teuthology/teuthology/task/ceph_manager.py", line 618, in wait_run_admin_socket
    raise Exception('timed out waiting for admin_socket to appear after osd.{o} restart'.format(o=osdnum))
Exception: timed out waiting for admin_socket to appear after osd.5 restart
<Greenlet at 0x2b26b78: <bound method Thrasher.do_thrash of <teuthology.task.ceph_manager.Thrasher instance at 0x2b2dd88>>> failed with Exception^C

Here I waited a few minutes and then hit Ctrl-C:
2014-03-03 16:16:00,337.337 INFO:teuthology.task.rados:joining rados
2014-03-03 16:16:00,337.337 ERROR:teuthology.run_tasks:Manager failed: rados
Traceback (most recent call last):
  File "/home/ubuntu/zack/teuthology/teuthology/run_tasks.py", line 84, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/ubuntu/zack/teuthology/teuthology/task/rados.py", line 175, in task
    running.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception

CommandFailedError: Command failed on 10.214.131.31 with status 1: 'CEPH_CLIENT_ID=0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph_test_rados --op read 45 --op write 45 --op delete 10 --op snap_create 0 --op snap_remove 0 --op rollback 0 --op setattr 0 --op rmattr 0 --op watch 0 --op append 0 --max-ops 4000 --objects 500 --max-in-flight 16 --size 4000000 --min-stride-size 400000 --max-stride-size 800000 --max-seconds 0 --pool unique_pool_0'

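The first traceback (the Thrasher greenlet failing in wait_run_admin_socket) does not appear in the teuthology.log generated by the job; only the later rados failure does. The exception raised inside the greenlet is being swallowed instead of logged, so something is badly wrong with the way gevent is being used.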
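A minimal sketch of one possible mitigation, assuming the traceback is missing because gevent's hub reports the unhandled greenlet exception on stderr (the `<Greenlet at ...> failed with Exception` line above looks like that report) rather than through the `logging` handlers that write teuthology.log. The `log_exceptions` decorator is hypothetical and not part of teuthology; the idea is to wrap a greenlet entry point such as `Thrasher.do_thrash` so its traceback goes through `logging` before the exception propagates.

```python
import functools
import logging


def log_exceptions(log):
    """Hypothetical decorator: log any exception (with traceback) via `log`, then re-raise.

    Wrapping a greenlet's entry point this way routes the traceback through the
    logging module, so it reaches whatever file handler the job configured.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # Logs at ERROR level and attaches the current traceback.
                log.exception('%s failed', func.__name__)
                # Re-raise so the caller (or a later greenlet join) still sees the failure.
                raise
        return wrapper
    return decorator


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger('thrasher')

    @log_exceptions(log)
    def do_thrash():
        # Stand-in for the real thrasher loop; raises like the traceback above.
        raise Exception('timed out waiting for admin_socket to appear after osd.5 restart')

    try:
        do_thrash()
    except Exception as e:
        print('re-raised:', e)
```

Re-raising keeps the greenlet marked as failed, so a caller that later joins it with `get()` (as the rados task does with `running.get()`) still sees the original exception. gevent's `Greenlet.link_exception()` is another hook that could report the failure as soon as the greenlet dies, rather than only when it is joined.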


Files

hang.yaml (2.23 KB), added by Zack Cerza, 03/04/2014 08:27 AM

#1

Updated by Ian Colle about 10 years ago

  • Priority changed from High to Normal

#2

Updated by Sage Weil over 9 years ago

  • Status changed from New to Can't reproduce