
Bug #10532

Failed "joining thrashosds" - suspected low memory config on vps

Added by Yuri Weinstein about 9 years ago. Updated over 8 years ago.

Status:
Won't Fix
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-11_17:05:01-upgrade:giant-x-next-distro-basic-vps/698063/

2015-01-12T21:46:45.720 INFO:teuthology.orchestra.run.vpm106.stdout:successfully deleted pool unique_pool_2
2015-01-12T21:46:45.722 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2015-01-12T21:46:45.722 DEBUG:teuthology.run_tasks:Unwinding manager rados
2015-01-12T21:46:45.722 INFO:tasks.rados:joining rados
2015-01-12T21:46:45.722 DEBUG:teuthology.run_tasks:Unwinding manager rados
2015-01-12T21:46:45.723 INFO:tasks.rados:joining rados
2015-01-12T21:46:45.723 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2015-01-12T21:46:45.723 DEBUG:teuthology.run_tasks:Unwinding manager thrashosds
2015-01-12T21:46:45.723 INFO:tasks.thrashosds:joining thrashosds
2015-01-12T21:46:45.723 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 119, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_next/tasks/thrashosds.py", line 174, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_next/tasks/ceph_manager.py", line 314, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: timed out waiting for admin_socket to appear after osd.10 restart

History

#1 Updated by Yuri Weinstein about 9 years ago

See the same in http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-13_17:18:01-upgrade:firefly-x-next-distro-basic-vps/701660/

2015-01-14T08:48:14.050 INFO:teuthology.orchestra.run.vpm054:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage rados rmpool unique_pool_2 unique_pool_2 --yes-i-really-really-mean-it'
2015-01-14T08:48:14.225 INFO:teuthology.orchestra.run.vpm054.stdout:successfully deleted pool unique_pool_2
2015-01-14T08:48:14.227 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2015-01-14T08:48:14.227 DEBUG:teuthology.run_tasks:Unwinding manager rados
2015-01-14T08:48:14.227 INFO:tasks.rados:joining rados
2015-01-14T08:48:14.228 DEBUG:teuthology.run_tasks:Unwinding manager rados
2015-01-14T08:48:14.228 INFO:tasks.rados:joining rados
2015-01-14T08:48:14.228 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2015-01-14T08:48:14.228 DEBUG:teuthology.run_tasks:Unwinding manager thrashosds
2015-01-14T08:48:14.228 INFO:tasks.thrashosds:joining thrashosds
2015-01-14T08:48:14.229 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 119, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_next/tasks/thrashosds.py", line 174, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_next/tasks/ceph_manager.py", line 314, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: timed out waiting for admin_socket to appear after osd.13 restart

Sam: slow vps on this one.

#5 Updated by Samuel Just about 9 years ago

  • Status changed from New to Rejected

I'm going to declare this batch to be vps related.

#6 Updated by Yuri Weinstein about 9 years ago

Same in
http://pulpito.ceph.com/teuthology-2015-02-03_17:05:01-upgrade:giant-x-next-distro-basic-vps/
['739080', '739085']

I am reopening this and assigning it to devops, as these failures seem persistent; the only remaining option is to address them by changing the hardware/memory configuration or similar.

#7 Updated by Yuri Weinstein about 9 years ago

  • Project changed from Ceph to devops
  • Status changed from Rejected to New

#8 Updated by Yuri Weinstein about 9 years ago

  • Assignee set to Sage Weil

Sage, assigned to you for prioritization.

#11 Updated by Yuri Weinstein about 9 years ago

  • Subject changed from Failed "joining thrashosds" in upgrade:giant-x-next-distro-basic-vps run to Failed "joining thrashosds" - suspected low memory config on vps

#12 Updated by Kefu Chai about 9 years ago

Could this be related to #10630?

#14 Updated by Sage Weil over 8 years ago

  • Status changed from New to Won't Fix
  • Regression set to No
