
Bug #10532

Failed "joining thrashosds" - suspected low memory config on vps

Added by Yuri Weinstein about 9 years ago. Updated over 8 years ago.

Status:
Won't Fix
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-11_17:05:01-upgrade:giant-x-next-distro-basic-vps/698063/

2015-01-12T21:46:45.720 INFO:teuthology.orchestra.run.vpm106.stdout:successfully deleted pool unique_pool_2
2015-01-12T21:46:45.722 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2015-01-12T21:46:45.722 DEBUG:teuthology.run_tasks:Unwinding manager rados
2015-01-12T21:46:45.722 INFO:tasks.rados:joining rados
2015-01-12T21:46:45.722 DEBUG:teuthology.run_tasks:Unwinding manager rados
2015-01-12T21:46:45.723 INFO:tasks.rados:joining rados
2015-01-12T21:46:45.723 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2015-01-12T21:46:45.723 DEBUG:teuthology.run_tasks:Unwinding manager thrashosds
2015-01-12T21:46:45.723 INFO:tasks.thrashosds:joining thrashosds
2015-01-12T21:46:45.723 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 119, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_next/tasks/thrashosds.py", line 174, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_next/tasks/ceph_manager.py", line 314, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: timed out waiting for admin_socket to appear after osd.10 restart

History

#1 Updated by Yuri Weinstein about 9 years ago

See the same in http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-13_17:18:01-upgrade:firefly-x-next-distro-basic-vps/701660/

2015-01-14T08:48:14.050 INFO:teuthology.orchestra.run.vpm054:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage rados rmpool unique_pool_2 unique_pool_2 --yes-i-really-really-mean-it'
2015-01-14T08:48:14.225 INFO:teuthology.orchestra.run.vpm054.stdout:successfully deleted pool unique_pool_2
2015-01-14T08:48:14.227 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2015-01-14T08:48:14.227 DEBUG:teuthology.run_tasks:Unwinding manager rados
2015-01-14T08:48:14.227 INFO:tasks.rados:joining rados
2015-01-14T08:48:14.228 DEBUG:teuthology.run_tasks:Unwinding manager rados
2015-01-14T08:48:14.228 INFO:tasks.rados:joining rados
2015-01-14T08:48:14.228 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2015-01-14T08:48:14.228 DEBUG:teuthology.run_tasks:Unwinding manager thrashosds
2015-01-14T08:48:14.228 INFO:tasks.thrashosds:joining thrashosds
2015-01-14T08:48:14.229 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 119, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_next/tasks/thrashosds.py", line 174, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_next/tasks/ceph_manager.py", line 314, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: timed out waiting for admin_socket to appear after osd.13 restart

Sam: slow vps on this one.

#5 Updated by Samuel Just about 9 years ago

  • Status changed from New to Rejected

I'm going to declare this batch to be vps related.

#6 Updated by Yuri Weinstein about 9 years ago

Same in
http://pulpito.ceph.com/teuthology-2015-02-03_17:05:01-upgrade:giant-x-next-distro-basic-vps/
['739080', '739085']

I am reopening this and assigning it to devops, as these failures seem persistent; the only remaining option is to address them by changing the hardware/memory configuration or similar.

#7 Updated by Yuri Weinstein about 9 years ago

  • Project changed from Ceph to devops
  • Status changed from Rejected to New

#8 Updated by Yuri Weinstein about 9 years ago

  • Assignee set to Sage Weil

Sage, assigned to you for prioritization.

#11 Updated by Yuri Weinstein about 9 years ago

  • Subject changed from Failed "joining thrashosds" in upgrade:giant-x-next-distro-basic-vps run to Failed "joining thrashosds" - suspected low memory config on vps

#12 Updated by Kefu Chai about 9 years ago

Could this be related to #10630?

#14 Updated by Sage Weil over 8 years ago

  • Status changed from New to Won't Fix
  • Regression set to No
