Bug #52562

closed

Thrashosds read error injection failed with error ENXIO

Added by Sridhar Seshasayee over 2 years ago. Updated 3 months ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags: test-failure
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2021-09-08_15:10:21-rados-wip-yuri2-testing-2021-09-07-1258-distro-basic-smithi/6379886

As part of the thrashosds test, OSDs that are out or dead are revived. Here, osd.0 was revived from the dead state. After the daemon was restarted, the `tell osd.0 injectargs` command that adjusts read error injection failed with error ENXIO:

2021-09-08T15:47:42.406 INFO:tasks.thrashosds.thrasher:reviving osd
2021-09-08T15:47:42.406 INFO:tasks.thrashosds.thrasher:Reviving osd 0
2021-09-08T15:47:42.407 INFO:tasks.ceph.osd.0:Restarting daemon
2021-09-08T15:47:42.407 DEBUG:teuthology.orchestra.run.smithi064:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f --cluster ceph -i 0
2021-09-08T15:47:42.410 INFO:tasks.ceph.osd.0:Started
2021-09-08T15:47:42.410 DEBUG:teuthology.orchestra.run.smithi064:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 0 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight
2021-09-08T15:47:42.531 INFO:teuthology.orchestra.run.smithi064.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2021-09-08T15:47:42.533 DEBUG:teuthology.orchestra.run:got remote process result: 22
2021-09-08T15:47:42.533 INFO:tasks.ceph.ceph_manager.ceph:waiting on admin_socket for osd-0, ['dump_ops_in_flight']
2021-09-08T15:47:43.076 INFO:tasks.ceph.osd.0.smithi064.stderr:2021-09-08T15:47:43.074+0000 7f36982490c0 -1 Falling back to public interface
2021-09-08T15:47:46.993 INFO:tasks.ceph.osd.0.smithi064.stderr:2021-09-08T15:47:46.992+0000 7f36982490c0 -1 osd.0 244 log_to_monitors {default=true}
2021-09-08T15:47:47.344 INFO:tasks.ceph.osd.0.smithi064.stderr:2021-09-08T15:47:47.341+0000 7f368959e700 -1 osd.0 301 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
2021-09-08T15:47:47.534 DEBUG:teuthology.orchestra.run.smithi064:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 0 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight
2021-09-08T15:47:47.663 INFO:teuthology.orchestra.run.smithi064.stdout:{
2021-09-08T15:47:47.663 INFO:teuthology.orchestra.run.smithi064.stdout:    "ops": [],
2021-09-08T15:47:47.663 INFO:teuthology.orchestra.run.smithi064.stdout:    "num_ops": 0
2021-09-08T15:47:47.663 INFO:teuthology.orchestra.run.smithi064.stdout:}
2021-09-08T15:47:47.673 DEBUG:teuthology.orchestra.run.smithi064:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 0 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config set filestore_debug_random_read_err 0.33
2021-09-08T15:47:47.798 INFO:teuthology.orchestra.run.smithi064.stdout:{
2021-09-08T15:47:47.799 INFO:teuthology.orchestra.run.smithi064.stdout:    "success": "filestore_debug_random_read_err = '0.330000' (not observed, change may require restart) " 
2021-09-08T15:47:47.799 INFO:teuthology.orchestra.run.smithi064.stdout:}
2021-09-08T15:47:47.809 DEBUG:teuthology.orchestra.run.smithi064:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 0 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config set bluestore_debug_random_read_err 0.33
2021-09-08T15:47:47.935 INFO:teuthology.orchestra.run.smithi064.stdout:{
2021-09-08T15:47:47.936 INFO:teuthology.orchestra.run.smithi064.stdout:    "success": "bluestore_debug_random_read_err = '0.330000' (not observed, change may require restart) " 
2021-09-08T15:47:47.936 INFO:teuthology.orchestra.run.smithi064.stdout:}
2021-09-08T15:47:47.945 DEBUG:teuthology.orchestra.run.smithi064:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph -- tell osd.0 injectargs --filestore_debug_random_read_err=0.0
2021-09-08T15:47:48.067 INFO:teuthology.orchestra.run.smithi064.stderr:Error ENXIO: problem getting command descriptions from osd.0
2021-09-08T15:47:48.069 DEBUG:teuthology.orchestra.run:got remote process result: 6
2021-09-08T15:47:48.070 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_5091269d28a67ed0674391279081ff3cac6e91e1/qa/tasks/ceph_manager.py", line 188, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_5091269d28a67ed0674391279081ff3cac6e91e1/qa/tasks/ceph_manager.py", line 1416, in _do_thrash
    'filestore_debug_random_read_err', '0.0')
  File "/home/teuthworker/src/github.com_ceph_ceph-c_5091269d28a67ed0674391279081ff3cac6e91e1/qa/tasks/ceph_manager.py", line 1990, in inject_args
    self.raw_cluster_cmd('--', 'tell', whom, 'injectargs', opt_arg)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_5091269d28a67ed0674391279081ff3cac6e91e1/qa/tasks/ceph_manager.py", line 1596, in raw_cluster_cmd
    return self.run_cluster_cmd(**kwargs).stdout.getvalue()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_5091269d28a67ed0674391279081ff3cac6e91e1/qa/tasks/ceph_manager.py", line 1587, in run_cluster_cmd
    return self.controller.run(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b135a4f82b02b966053efba0501e6a47c9d2d793/teuthology/orchestra/remote.py", line 509, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b135a4f82b02b966053efba0501e6a47c9d2d793/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b135a4f82b02b966053efba0501e6a47c9d2d793/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b135a4f82b02b966053efba0501e6a47c9d2d793/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi064 with status 6: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph -- tell osd.0 injectargs --filestore_debug_random_read_err=0.0'
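The two non-zero exit statuses in the log (22 from the early admin-socket probe, 6 from `tell ... injectargs`) appear to be errno values. A minimal sketch for decoding them with Python's standard `errno` module, assuming the `ceph` CLI exits with the positive errno and the numbering is Linux's; `describe_status` is a hypothetical helper, not part of teuthology:

```python
import errno


def describe_status(status: int) -> str:
    """Map a process exit status to an errno symbol, if one exists.

    Assumes (unverified) that the ceph CLI exits with the positive errno
    value, and that the host uses Linux errno numbering.
    """
    return errno.errorcode.get(status, f"unknown({status})")
```

On Linux, `describe_status(6)` gives `'ENXIO'` (matching "Error ENXIO" above) and `describe_status(22)` gives `'EINVAL'`.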

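Fixes were made to the test as part of https://tracker.ceph.com/issues/21206 (quite a while ago), but this failure seems to be different and related to the time it took for osd.0 to become active. Note that osd.0's admin socket was already answering (`dump_ops_in_flight` succeeded at 15:47:47.663) when `tell` failed at 15:47:48.067, so a responsive admin socket alone does not guarantee that the OSD can service `tell` commands.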
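If the failure is only a startup race, one possible mitigation is to retry `tell` while it returns ENXIO. The following is a minimal sketch under that assumption, not the fix that was applied for this issue and not teuthology's API; `run_with_retry`, its parameters, and the retry policy are hypothetical:

```python
import time
from typing import Callable, FrozenSet

# Exit status observed in the log for "Error ENXIO" (assumed to be errno 6).
ENXIO_STATUS = 6


def run_with_retry(
    run: Callable[[], int],
    retry_statuses: FrozenSet[int] = frozenset({ENXIO_STATUS}),
    attempts: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run() until its exit status is not retryable or attempts run out.

    Hypothetical helper: run() is any callable returning a process exit
    status. Returns the status of the last call.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    status = run()
    for _ in range(attempts - 1):
        if status not in retry_statuses:
            break
        # Give the revived OSD time to become reachable before retrying.
        sleep(delay)
        status = run()
    return status
```

In a test run, `run` would wrap the `ceph --cluster ceph -- tell osd.0 injectargs ...` command from the log (for example via `subprocess.run(cmd).returncode`). Any other non-zero status is returned immediately so that genuine failures still surface.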


Related issues: 1 (1 open, 0 closed)

Related to RADOS - Bug #56097: Timeout on `sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell osd.1 flush_pg_stats` (Status: Fix Under Review; Assignee: Nitzan Mordechai)
