Bug #61787 (open)

Command "ceph --cluster ceph osd dump --format=json" times out when killing OSD

Added by Laura Flores 11 months ago. Updated 11 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Description: rados/verify/{centos_latest ceph clusters/{fixed-2 openstack} d-thrash/default/{default thrashosds-health} mon_election/connectivity msgr-failures/few msgr/async-v2only objectstore/bluestore-low-osd-mem-target rados tasks/rados_api_tests validater/valgrind}

/a/yuriw-2023-06-22_20:09:40-rados-wip-yuri6-testing-2023-06-22-1005-quincy-distro-default-smithi/7312692

2023-06-23T05:41:02.129 INFO:teuthology.orchestra.run.smithi078.stderr:2023-06-23T05:41:02.126+0000 7f7cf3ac7700  1 --2- 172.21.15.78:0/3607827062 >> [v2:172.21.15.150:3300/0,v1:172.21.15.150:6789/0] conn(0x7f7cec151b60 0x7f7cec14ddf0 secure :-1 s=READY pgs=7169 cs=0 l=1 rev1=1 crypto rx=0x7f7ce400cbc0 tx=0x7f7ce4001960 comp rx=0 tx=0).stop
2023-06-23T05:41:02.130 INFO:teuthology.orchestra.run.smithi078.stderr:2023-06-23T05:41:02.127+0000 7f7cf3ac7700  1 -- 172.21.15.78:0/3607827062 shutdown_connections
2023-06-23T05:41:02.130 INFO:teuthology.orchestra.run.smithi078.stderr:2023-06-23T05:41:02.127+0000 7f7cf3ac7700  1 --2- 172.21.15.78:0/3607827062 >> v2:172.21.15.78:6800/111184 conn(0x7f7cd0068820 0x7f7cd006acd0 unknown :-1 s=CLOSED pgs=1480 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).stop
2023-06-23T05:41:02.130 INFO:teuthology.orchestra.run.smithi078.stderr:2023-06-23T05:41:02.127+0000 7f7cf3ac7700  1 --2- 172.21.15.78:0/3607827062 >> [v2:172.21.15.150:3300/0,v1:172.21.15.150:6789/0] conn(0x7f7cec151b60 0x7f7cec14ddf0 unknown :-1 s=CLOSED pgs=7169 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).stop
2023-06-23T05:41:02.130 INFO:teuthology.orchestra.run.smithi078.stderr:2023-06-23T05:41:02.127+0000 7f7cf3ac7700  1 --2- 172.21.15.78:0/3607827062 >> [v2:172.21.15.78:3301/0,v1:172.21.15.78:6790/0] conn(0x7f7cec0a5720 0x7f7cec13c070 unknown :-1 s=CLOSED pgs=0 cs=0 l=1 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0).stop
2023-06-23T05:41:02.131 INFO:teuthology.orchestra.run.smithi078.stderr:2023-06-23T05:41:02.127+0000 7f7cf3ac7700  1 --2- 172.21.15.78:0/3607827062 >> [v2:172.21.15.78:3300/0,v1:172.21.15.78:6789/0] conn(0x7f7cec141640 0x7f7cec0ae260 unknown :-1 s=CLOSED pgs=0 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).stop
2023-06-23T05:41:02.131 INFO:teuthology.orchestra.run.smithi078.stderr:2023-06-23T05:41:02.127+0000 7f7cf3ac7700  1 -- 172.21.15.78:0/3607827062 >> 172.21.15.78:0/3607827062 conn(0x7f7cec06a2b0 msgr2=0x7f7cec0b5030 unknown :-1 s=STATE_NONE l=0).mark_down
2023-06-23T05:41:02.131 INFO:teuthology.orchestra.run.smithi078.stderr:2023-06-23T05:41:02.127+0000 7f7cf3ac7700  1 -- 172.21.15.78:0/3607827062 shutdown_connections
2023-06-23T05:41:02.131 INFO:teuthology.orchestra.run.smithi078.stderr:2023-06-23T05:41:02.127+0000 7f7cf3ac7700  1 -- 172.21.15.78:0/3607827062 wait complete.
2023-06-23T05:41:02.131 INFO:teuthology.orchestra.run.smithi078.stderr:2023-06-23T05:41:02.127+0000 7f7cf3ac7700  1 librados: shutdown
2023-06-23T05:41:02.131 INFO:teuthology.orchestra.run.smithi078.stderr:nodeep-scrub is unset
2023-06-23T05:41:05.863 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_e6c50e4e4b8a8d449b864060ef3b121b5791e41b/qa/tasks/ceph_manager.py", line 198, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_e6c50e4e4b8a8d449b864060ef3b121b5791e41b/qa/tasks/ceph_manager.py", line 1430, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_e6c50e4e4b8a8d449b864060ef3b121b5791e41b/qa/tasks/ceph_manager.py", line 356, in kill_osd
    self.ceph_manager.kill_osd(osd)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_e6c50e4e4b8a8d449b864060ef3b121b5791e41b/qa/tasks/ceph_manager.py", line 3018, in kill_osd
    self.ctx.daemons.get_daemon('osd', osd, self.cluster).stop()
  File "/home/teuthworker/src/git.ceph.com_teuthology_076bbebc42a14f7d568aaa78eabb0038327bcb23/teuthology/orchestra/daemon/state.py", line 139, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_teuthology_076bbebc42a14f7d568aaa78eabb0038327bcb23/teuthology/orchestra/run.py", line 473, in wait
    check_time()
  File "/home/teuthworker/src/git.ceph.com_teuthology_076bbebc42a14f7d568aaa78eabb0038327bcb23/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (51) after waiting for 300 seconds
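
The MaxWhileTries above is teuthology's bounded wait giving up: after telling the daemon to stop, the thrasher polls for the process to exit and raises once its retry budget is spent. A minimal sketch of that pattern in Python, not teuthology's actual implementation (wait_for_exit, its parameters, and the polling interval are illustrative):

import time

class MaxWhileTries(Exception):
    """Raised when a bounded wait exhausts its retry budget."""

def wait_for_exit(is_stopped, sleep=6, tries=51):
    # Poll `is_stopped` once per attempt, sleeping `sleep` seconds between
    # checks; 51 attempts roughly 6 s apart matches the "maximum tries (51)
    # after waiting for 300 seconds" in the traceback.
    waited = 0
    for _ in range(tries):
        if is_stopped():  # e.g. lambda: proc.poll() is not None
            return waited
        time.sleep(sleep)
        waited += sleep
    raise MaxWhileTries(
        "reached maximum tries (%d) after waiting for %d seconds"
        % (tries, waited))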

I didn't find any Valgrind memory leaks or test failures in the logs.
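
To chase this outside teuthology, the command from the title can be wrapped in a hard timeout so a hang surfaces as an exception rather than blocking the run. A rough sketch using only the Python standard library (this harness is an assumption, not part of the ticket; the 300 s budget mirrors the thrasher's wait above):

import subprocess

# Run the command from the ticket title with a hard timeout.
cmd = ["ceph", "--cluster", "ceph", "osd", "dump", "--format=json"]
try:
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)
    print(result.stdout)
except subprocess.TimeoutExpired:
    print("ceph osd dump did not return within 300 s")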

#1 - Updated by Radoslaw Zarzynski 11 months ago

In osd.6's log:

rzarzynski@teuthology:/a/yuriw-2023-06-22_20:09:40-rados-wip-yuri6-testing-2023-06-22-1005-quincy-distro-default-smithi/7312692$ less ./remote/smithi150/log/ceph-osd.6.log.gz
...
2023-06-23T05:41:19.814+0000 ee8a700  0 osd.6 0 Slow Shutdown duration:6.894987 seconds

We know the OSDs are run under Valgrind. The question is how much that contributes to the overall slowness.
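
One way to gauge that would be to pull every "Slow Shutdown duration" message out of the OSD logs and compare the numbers against an otherwise identical run without Valgrind. A rough sketch (the regex assumes the exact message format quoted above; the path is this run's osd.6 log):

import gzip
import re

PATTERN = re.compile(r"Slow Shutdown duration:([0-9.]+) seconds")

def slow_shutdowns(log_path):
    # Yield every slow-shutdown duration (in seconds) found in a
    # gzipped OSD log like the one quoted above.
    with gzip.open(log_path, "rt", errors="replace") as log:
        for line in log:
            match = PATTERN.search(line)
            if match:
                yield float(match.group(1))

for secs in slow_shutdowns("remote/smithi150/log/ceph-osd.6.log.gz"):
    print("slow shutdown: %.3f s" % secs)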
