Bug #53768

timed out waiting for admin_socket to appear after osd.2 restart in thrasher/defaults workload/small-objects

Added by Joseph Sawaya about 1 year ago. Updated 6 months ago.

Status: New
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Error snippet:

2022-01-02T01:37:09.296 DEBUG:teuthology.orchestra.run.smithi086:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 0 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok dump_ops_in_flight
2022-01-02T01:37:09.410 INFO:teuthology.orchestra.run.smithi086.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2022-01-02T01:37:09.413 DEBUG:teuthology.orchestra.run:got remote process result: 22
2022-01-02T01:37:09.413 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 189, in wrapper
return func(self)
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 1412, in _do_thrash
self.choose_action()()
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 581, in revive_osd
skip_admin_check=skip_admin_check)
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 3021, in revive_osd
timeout=timeout, stdout=DEVNULL)
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 1963, in wait_run_admin_socket
id=service_id))
Exception: timed out waiting for admin_socket to appear after osd.2 restart

2022-01-02T01:37:09.414 ERROR:tasks.thrashosds.thrasher:exception:
Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 1280, in do_thrash
self._do_thrash()
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 189, in wrapper
return func(self)
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 1412, in _do_thrash
self.choose_action()()
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 581, in revive_osd
skip_admin_check=skip_admin_check)
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 3021, in revive_osd
timeout=timeout, stdout=DEVNULL)
File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 1963, in wait_run_admin_socket
id=service_id))
Exception: timed out waiting for admin_socket to appear after osd.2 restart
2022-01-02T01:37:09.743 INFO:tasks.ceph.osd.3.smithi086.stderr:INFO 2022-01-02 01:37:09,809 [shard 0] alienstore - stat
2022-01-02T01:37:09.814 INFO:tasks.ceph.osd.2.smithi086.stderr:INFO 2022-01-02 01:37:09,879 [shard 0] alienstore - stat
2022-01-02T01:37:09.901 INFO:tasks.ceph.osd.1.smithi074.stderr:INFO 2022-01-02 01:37:09,967 [shard 0] alienstore - stat
2022-01-02T01:37:10.194 INFO:tasks.ceph.osd.1.smithi074.stderr:ERROR 2022-01-02 01:37:10,259 [shard 0] ms - ms_dispatch unhandled message ping magic: 0 v1
2022-01-02T01:37:10.194 INFO:tasks.ceph.osd.3.smithi086.stderr:ERROR 2022-01-02 01:37:10,259 [shard 0] ms - ms_dispatch unhandled message ping magic: 0 v1
2022-01-02T01:37:10.459 DEBUG:teuthology.orchestra.run.smithi074:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd unset noscrub
2022-01-02T01:37:10.616 INFO:tasks.daemonwatchdog.daemon_watchdog:OSDThrasher failed
2022-01-02T01:37:10.617 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons


Full logs found here: http://qa-proxy.ceph.com/teuthology/teuthology-2022-01-02_01:01:03-crimson-rados-master-distro-default-smithi/6589721/teuthology.log
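
For context, the exception is raised by wait_run_admin_socket in qa/tasks/ceph_manager.py, which polls the revived OSD's admin socket until it answers. A minimal sketch of that retry pattern (the timeout, interval, and function name here are illustrative, not the exact teuthology code):

    import subprocess
    import time

    def wait_for_admin_socket(osd_id, timeout=300, interval=5):
        # Sketch of the polling loop in wait_run_admin_socket; the socket
        # path matches the command in the log above, while the timeout and
        # interval values are assumptions.
        sock = "/var/run/ceph/ceph-osd.%d.asok" % osd_id
        deadline = time.time() + timeout
        while time.time() < deadline:
            # Same probe as in the log: ask the daemon to dump in-flight
            # ops. A non-zero return (e.g. connection refused) means the
            # restarted OSD has not brought its admin socket up yet.
            result = subprocess.run(
                ["ceph", "--admin-daemon", sock, "dump_ops_in_flight"],
                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            if result.returncode == 0:
                return
            time.sleep(interval)
        raise Exception(
            "timed out waiting for admin_socket to appear "
            "after osd.%d restart" % osd_id)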

Looks like osd.2 is failing to restart correctly; there also seem to be some memory leaks later in the logs pertaining to PGs.
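
Worth noting: the probe fails with [Errno 111] Connection refused rather than "No such file or directory", which usually means the .asok file exists (possibly left over from the previous process) but no daemon is listening on it. A small hypothetical helper, not part of the qa suite, to tell the two states apart during triage:

    import os
    import socket

    def asok_state(path="/var/run/ceph/ceph-osd.2.asok"):
        # Distinguishes "socket file missing" from "file present but
        # nothing listening" (the ECONNREFUSED case seen in the log).
        if not os.path.exists(path):
            return "missing"
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            s.connect(path)
            return "listening"
        except ConnectionRefusedError:
            return "stale (no listener)"
        finally:
            s.close()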

History

#1 Updated by Kamoltat (Junior) Sirivadhna 8 months ago

/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943791/

#2 Updated by Kamoltat (Junior) Sirivadhna 8 months ago

Hey Joseph, what's the status on this?

#3 Updated by Kamoltat (Junior) Sirivadhna 8 months ago

/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6944338/

#4 Updated by Kamoltat (Junior) Sirivadhna 8 months ago

Job is dead after hitting the max timeout, but the traceback suggests:

Exception: timed out waiting for admin_socket to appear after osd.2 restart

/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943718
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943741

#5 Updated by Joseph Sawaya 8 months ago

  • Status changed from In Progress to New
  • Assignee deleted (Joseph Sawaya)

#6 Updated by Kamoltat (Junior) Sirivadhna 8 months ago

  • Assignee set to Samuel Just

#7 Updated by Kamoltat (Junior) Sirivadhna 6 months ago

yuriw-2022-09-27_23:37:28-rados-wip-yuri2-testing-2022-09-27-1455-distro-default-smithi/7046234

#8 Updated by Samuel Just 6 months ago

  • Project changed from crimson to Ceph
