Bug #53768 (closed): timed out waiting for admin_socket to appear after osd.2 restart in thrasher/defaults workload/small-objects

Added by Joseph Sawaya over 2 years ago. Updated 13 days ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Error snippet:

2022-01-02T01:37:09.296 DEBUG:teuthology.orchestra.run.smithi086:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 0 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok dump_ops_in_flight
2022-01-02T01:37:09.410 INFO:teuthology.orchestra.run.smithi086.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2022-01-02T01:37:09.413 DEBUG:teuthology.orchestra.run:got remote process result: 22
2022-01-02T01:37:09.413 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 189, in wrapper
return func(self)
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 1412, in _do_thrash
self.choose_action()()
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 581, in revive_osd
skip_admin_check=skip_admin_check)
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 3021, in revive_osd
timeout=timeout, stdout=DEVNULL)
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 1963, in wait_run_admin_socket
id=service_id))
Exception: timed out waiting for admin_socket to appear after osd.2 restart

2022-01-02T01:37:09.414 ERROR:tasks.thrashosds.thrasher:exception:
Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 1280, in do_thrash
self._do_thrash()
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 189, in wrapper
return func(self)
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 1412, in _do_thrash
self.choose_action()()
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 581, in revive_osd
skip_admin_check=skip_admin_check)
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 3021, in revive_osd
timeout=timeout, stdout=DEVNULL)
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 1963, in wait_run_admin_socket
id=service_id))
Exception: timed out waiting for admin_socket to appear after osd.2 restart
2022-01-02T01:37:09.743 INFO:tasks.ceph.osd.3.smithi086.stderr:INFO 2022-01-02 01:37:09,809 [shard 0] alienstore - stat
2022-01-02T01:37:09.814 INFO:tasks.ceph.osd.2.smithi086.stderr:INFO 2022-01-02 01:37:09,879 [shard 0] alienstore - stat
2022-01-02T01:37:09.901 INFO:tasks.ceph.osd.1.smithi074.stderr:INFO 2022-01-02 01:37:09,967 [shard 0] alienstore - stat
2022-01-02T01:37:10.194 INFO:tasks.ceph.osd.1.smithi074.stderr:ERROR 2022-01-02 01:37:10,259 [shard 0] ms - ms_dispatch unhandled message ping magic: 0 v1
2022-01-02T01:37:10.194 INFO:tasks.ceph.osd.3.smithi086.stderr:ERROR 2022-01-02 01:37:10,259 [shard 0] ms - ms_dispatch unhandled message ping magic: 0 v1
2022-01-02T01:37:10.459 DEBUG:teuthology.orchestra.run.smithi074:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd unset noscrub
2022-01-02T01:37:10.616 INFO:tasks.daemonwatchdog.daemon_watchdog:OSDThrasher failed
2022-01-02T01:37:10.617 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons


Full logs found here: http://qa-proxy.ceph.com/teuthology/teuthology-2022-01-02_01:01:03-crimson-rados-master-distro-default-smithi/6589721/teuthology.log

It looks like osd.2 is failing to restart correctly; there also seem to be some memory leaks later in the logs pertaining to PGs.
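
For reference, the check that fails in the traceback (wait_run_admin_socket in qa/tasks/ceph_manager.py) appears to poll the revived OSD's admin socket until a command succeeds or a timeout expires; the "Connection refused" (Errno 111) above means the socket never came up. Below is a minimal sketch of that kind of polling loop. The function name, the 300-second timeout, and the subprocess invocation are illustrative assumptions, not the actual teuthology implementation.

import subprocess
import time

def wait_for_admin_socket(osd_id, timeout=300, interval=5):
    """Poll an OSD admin socket until a command succeeds or the timeout expires.

    Hypothetical sketch mirroring what the thrasher appears to do after
    reviving an OSD; not the real qa/tasks/ceph_manager.py code.
    """
    sock = "/var/run/ceph/ceph-osd.%d.asok" % osd_id
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Errno 111 (Connection refused) means the daemon has not yet
        # created/bound the socket, so keep retrying until the deadline.
        result = subprocess.run(
            ["ceph", "--cluster", "ceph", "--admin-daemon", sock,
             "dump_ops_in_flight"],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode == 0:
            return
        time.sleep(interval)
    raise Exception(
        "timed out waiting for admin_socket to appear "
        "after osd.%d restart" % osd_id)

# Example: after reviving osd.2, wait for its admin socket to come back.
# wait_for_admin_socket(2)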
