Bug #53768

closed

timed out waiting for admin_socket to appear after osd.2 restart in thrasher/defaults workload/small-objects

Added by Joseph Sawaya over 2 years ago. Updated 1 day ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Error snippet:

2022-01-02T01:37:09.296 DEBUG:teuthology.orchestra.run.smithi086:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 0 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok dump_ops_in_flight
2022-01-02T01:37:09.410 INFO:teuthology.orchestra.run.smithi086.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2022-01-02T01:37:09.413 DEBUG:teuthology.orchestra.run:got remote process result: 22
2022-01-02T01:37:09.413 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 189, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 1412, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 581, in revive_osd
    skip_admin_check=skip_admin_check)
  File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 3021, in revive_osd
    timeout=timeout, stdout=DEVNULL)
  File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 1963, in wait_run_admin_socket
    id=service_id))
Exception: timed out waiting for admin_socket to appear after osd.2 restart

2022-01-02T01:37:09.414 ERROR:tasks.thrashosds.thrasher:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 1280, in do_thrash
    self._do_thrash()
  File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 189, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 1412, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 581, in revive_osd
    skip_admin_check=skip_admin_check)
  File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 3021, in revive_osd
    timeout=timeout, stdout=DEVNULL)
  File "/home/teuthworker/src/github.com_ceph_ceph_a2f5a3c1dbfa4dce41e25da4f029a8fdb8c8d864/qa/tasks/ceph_manager.py", line 1963, in wait_run_admin_socket
    id=service_id))
Exception: timed out waiting for admin_socket to appear after osd.2 restart
2022-01-02T01:37:09.743 INFO:tasks.ceph.osd.3.smithi086.stderr:INFO 2022-01-02 01:37:09,809 [shard 0] alienstore - stat
2022-01-02T01:37:09.814 INFO:tasks.ceph.osd.2.smithi086.stderr:INFO 2022-01-02 01:37:09,879 [shard 0] alienstore - stat
2022-01-02T01:37:09.901 INFO:tasks.ceph.osd.1.smithi074.stderr:INFO 2022-01-02 01:37:09,967 [shard 0] alienstore - stat
2022-01-02T01:37:10.194 INFO:tasks.ceph.osd.1.smithi074.stderr:ERROR 2022-01-02 01:37:10,259 [shard 0] ms - ms_dispatch unhandled message ping magic: 0 v1
2022-01-02T01:37:10.194 INFO:tasks.ceph.osd.3.smithi086.stderr:ERROR 2022-01-02 01:37:10,259 [shard 0] ms - ms_dispatch unhandled message ping magic: 0 v1
2022-01-02T01:37:10.459 DEBUG:teuthology.orchestra.run.smithi074:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd unset noscrub
2022-01-02T01:37:10.616 INFO:tasks.daemonwatchdog.daemon_watchdog:OSDThrasher failed
2022-01-02T01:37:10.617 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons


Full logs found here: http://qa-proxy.ceph.com/teuthology/teuthology-2022-01-02_01:01:03-crimson-rados-master-distro-default-smithi/6589721/teuthology.log

It looks like osd.2 is failing to restart correctly. There also seem to be some memory leaks pertaining to PGs later in the logs.
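For anyone trying to reproduce the symptom outside teuthology: the log shows the admin socket command failing with `[Errno 111] Connection refused` until the test gives up. The snippet below is a minimal sketch of that kind of wait loop, not the code in `qa/tasks/ceph_manager.py`. The function name `wait_for_unix_socket` and its `timeout`/`interval` parameters are hypothetical, chosen only for illustration.

```python
import socket
import time


def wait_for_unix_socket(path, timeout=5.0, interval=0.1):
    """Poll until something accepts connections on the Unix socket at `path`.

    Returns the number of connection attempts made. Raises TimeoutError if
    the socket file is missing, or keeps refusing connections, until
    `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            client.connect(path)
            return attempts
        except (FileNotFoundError, ConnectionRefusedError):
            # Either the socket file does not exist yet, or it exists but
            # no process is listening on it (the Errno 111 case).
            pass
        finally:
            client.close()
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out waiting for admin socket {path}")
        time.sleep(interval)
```

A stale `.asok` file left behind by a daemon that died would hit the `ConnectionRefusedError` branch on every attempt, which would match a restart that never completes. This is only one possible reading of the log above.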

Actions #1

Updated by Kamoltat (Junior) Sirivadhna over 1 year ago

/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943791/

Actions #2

Updated by Kamoltat (Junior) Sirivadhna over 1 year ago

Hey Joseph, what's the status on this?

Actions #3

Updated by Kamoltat (Junior) Sirivadhna over 1 year ago

/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6944338/

Actions #4

Updated by Kamoltat (Junior) Sirivadhna over 1 year ago

The job is dead (it hit the max timeout), but the traceback suggests:

Exception: timed out waiting for admin_socket to appear after osd.2 restart

/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943718
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943741

Actions #5

Updated by Joseph Sawaya over 1 year ago

  • Status changed from In Progress to New
  • Assignee deleted (Joseph Sawaya)

Actions #6

Updated by Kamoltat (Junior) Sirivadhna over 1 year ago

  • Assignee set to Samuel Just

Actions #7

Updated by Kamoltat (Junior) Sirivadhna over 1 year ago

yuriw-2022-09-27_23:37:28-rados-wip-yuri2-testing-2022-09-27-1455-distro-default-smithi/7046234

Actions #8

Updated by Samuel Just over 1 year ago

  • Project changed from crimson to Ceph

Actions #9

Updated by Laura Flores 8 months ago

  • Tags set to test-failure

/a/yuriw-2023-08-10_20:19:11-rados-wip-yuri2-testing-2023-08-08-0755-pacific-distro-default-smithi/7366113

This one happened with workloads/radosbench.

Actions #10

Updated by Laura Flores 8 months ago

  • Backport set to pacific

Actions #11

Updated by Laura Flores 8 months ago

  • Project changed from Ceph to RADOS

Actions #12

Updated by Laura Flores 8 days ago

/a/yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7652505

Actions #13

Updated by Samuel Just 7 days ago

@Laura Flores There's little chance that the above crash is related to what Joseph saw here, let's close this one and open a new bug.

Actions #14

Updated by Laura Flores 7 days ago

Laura Flores wrote in #note-12:

/a/yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7652505

Samuel Just wrote in #note-13:

@Laura Flores There's little chance that the above crash is related to what Joseph saw here, let's close this one and open a new bug.

I created a new tracker here: https://tracker.ceph.com/issues/65557

Actions #15

Updated by Radoslaw Zarzynski 1 day ago

  • Status changed from New to Closed