Bug #56024: cephadm: removes ceph.conf during qa run causing command failure - Orchestrator - Ceph

Bug #56024

closed

cephadm: removes ceph.conf during qa run causing command failure

Added by Patrick Donnelly almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: High
Assignee: Dhairya Parmar
Category:
Target version: Ceph - v18.0.0
% Done:
Source: Q/A
Tags:
Backport: quincy,pacific
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 46676
Crash signature (v1):
Crash signature (v2):

Description

2022-06-12T06:33:37.163 INFO:journalctl@ceph.mon.a.smithi008.stdout:Jun 12 06:33:36 smithi008 ceph-mon[31232]: Removing smithi008:/etc/ceph/ceph.conf
2022-06-12T06:33:37.164 INFO:journalctl@ceph.mon.a.smithi008.stdout:Jun 12 06:33:36 smithi008 ceph-mon[31232]: pgmap v516: 129 pgs: 129 active+clean; 13 GiB data, 28 GiB used, 1.0 TiB / 1.0 TiB avail; 174 KiB/s rd, 100 MiB/s wr, 4.93k op/s
2022-06-12T06:33:37.166 INFO:journalctl@ceph.mon.c.smithi185.stdout:Jun 12 06:33:36 smithi185 ceph-mon[38700]: Removing smithi008:/etc/ceph/ceph.conf
2022-06-12T06:33:37.166 INFO:journalctl@ceph.mon.c.smithi185.stdout:Jun 12 06:33:36 smithi185 ceph-mon[38700]: pgmap v516: 129 pgs: 129 active+clean; 13 GiB data, 28 GiB used, 1.0 TiB / 1.0 TiB avail; 174 KiB/s rd, 100 MiB/s wr, 4.93k op/s
2022-06-12T06:33:37.279 INFO:journalctl@ceph.mon.b.smithi157.stdout:Jun 12 06:33:36 smithi157 ceph-mon[37057]: Removing smithi008:/etc/ceph/ceph.conf
2022-06-12T06:33:37.280 INFO:journalctl@ceph.mon.b.smithi157.stdout:Jun 12 06:33:36 smithi157 ceph-mon[37057]: pgmap v516: 129 pgs: 129 active+clean; 13 GiB data, 28 GiB used, 1.0 TiB / 1.0 TiB avail; 174 KiB/s rd, 100 MiB/s wr, 4.93k op/s
2022-06-12T06:33:39.077 DEBUG:teuthology.orchestra.run.smithi008:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub status
2022-06-12T06:33:39.165 INFO:journalctl@ceph.mon.a.smithi008.stdout:Jun 12 06:33:39 smithi008 ceph-mon[31232]: pgmap v517: 129 pgs: 129 active+clean; 13 GiB data, 28 GiB used, 1.0 TiB / 1.0 TiB avail; 279 KiB/s rd, 96 MiB/s wr, 4.66k op/s
2022-06-12T06:33:39.416 INFO:journalctl@ceph.mon.c.smithi185.stdout:Jun 12 06:33:39 smithi185 ceph-mon[38700]: pgmap v517: 129 pgs: 129 active+clean; 13 GiB data, 28 GiB used, 1.0 TiB / 1.0 TiB avail; 279 KiB/s rd, 96 MiB/s wr, 4.66k op/s
2022-06-12T06:33:39.527 INFO:journalctl@ceph.mon.b.smithi157.stdout:Jun 12 06:33:39 smithi157 ceph-mon[37057]: pgmap v517: 129 pgs: 129 active+clean; 13 GiB data, 28 GiB used, 1.0 TiB / 1.0 TiB avail; 279 KiB/s rd, 96 MiB/s wr, 4.66k op/s
2022-06-12T06:33:39.801 INFO:teuthology.orchestra.run.smithi008.stderr:Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)
2022-06-12T06:33:39.815 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-06-12T06:33:39.816 ERROR:tasks.fwd_scrub.fs.[cephfs]:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36d24a7f39b7e565955f208f512d14b9d7e923ee/qa/tasks/fwd_scrub.py", line 38, in _run
    self.do_scrub()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36d24a7f39b7e565955f208f512d14b9d7e923ee/qa/tasks/fwd_scrub.py", line 55, in do_scrub
    self._scrub()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36d24a7f39b7e565955f208f512d14b9d7e923ee/qa/tasks/fwd_scrub.py", line 77, in _scrub
    timeout=self.scrub_timeout)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36d24a7f39b7e565955f208f512d14b9d7e923ee/qa/tasks/cephfs/filesystem.py", line 1583, in wait_until_scrub_complete
    out_json = self.rank_tell(["scrub", "status"], rank=rank)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36d24a7f39b7e565955f208f512d14b9d7e923ee/qa/tasks/cephfs/filesystem.py", line 1161, in rank_tell
    out = self.mon_manager.raw_cluster_cmd("tell", f"mds.{self.id}:{rank}", *command)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36d24a7f39b7e565955f208f512d14b9d7e923ee/qa/tasks/ceph_manager.py", line 1597, in raw_cluster_cmd
    return self.run_cluster_cmd(**kwargs).stdout.getvalue()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36d24a7f39b7e565955f208f512d14b9d7e923ee/qa/tasks/ceph_manager.py", line 1588, in run_cluster_cmd
    return self.controller.run(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_2290146eac7577b8500f128a53856c3ea4a00e3c/teuthology/orchestra/remote.py", line 510, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_2290146eac7577b8500f128a53856c3ea4a00e3c/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_2290146eac7577b8500f128a53856c3ea4a00e3c/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_2290146eac7577b8500f128a53856c3ea4a00e3c/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi008 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub status'

From: /ceph/teuthology-archive/pdonnell-2022-06-12_05:08:12-fs:workload-wip-pdonnell-testing-20220612.004943-distro-default-smithi/6875276/teuthology.log

See also: /ceph/teuthology-archive/pdonnell-2022-06-12_05:08:12-fs:workload-wip-pdonnell-testing-20220612.004943-distro-default-smithi/6875321/teuthology.log
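
To check other runs for the same pattern, the excerpt above suggests looking for a "Removing <host>:/etc/ceph/ceph.conf" message followed later by a conf_read_file error from a command run on that host. The following is a minimal triage sketch, not part of the fix in the pull request. The function name and both regular expressions are assumptions derived only from the log lines quoted in this report, and it assumes the host name in the removal message matches the short host name teuthology uses.

```python
import re
import sys
from typing import Dict, Iterable, List, Tuple

# Patterns inferred from the log lines quoted in this report (assumptions,
# not an official teuthology or cephadm log format).
REMOVAL_RE = re.compile(r"Removing (?P<host>[\w.-]+):/etc/ceph/ceph\.conf")
CONF_ERROR_RE = re.compile(
    r"teuthology\.orchestra\.run\.(?P<host>[\w-]+)\.stderr:"
    r".*error calling conf_read_file"
)


def find_conf_removal_failures(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (host, removal_line, error_line) for every conf_read_file
    error seen on a host after its ceph.conf removal was logged.

    Line numbers are 1-based positions within `lines`.
    """
    first_removal: Dict[str, int] = {}  # host -> line of first removal message
    hits: List[Tuple[str, int, int]] = []
    for number, line in enumerate(lines, start=1):
        removal = REMOVAL_RE.search(line)
        if removal:
            # Each mon logs the same removal; keep only the first occurrence.
            first_removal.setdefault(removal.group("host"), number)
            continue
        error = CONF_ERROR_RE.search(line)
        if error and error.group("host") in first_removal:
            host = error.group("host")
            hits.append((host, first_removal[host], number))
    return hits


if __name__ == "__main__":
    # Usage: python find_conf_removal.py /path/to/teuthology.log
    with open(sys.argv[1], errors="replace") as log:
        for host, removed_at, failed_at in find_conf_removal_failures(log):
            print(f"{host}: ceph.conf removed at line {removed_at}, "
                  f"conf_read_file error at line {failed_at}")
```

A log that shows only the error without a preceding removal message on the same host produces no output, so an empty result does not rule out this bug.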

Related issues 3 (0 open — 3 closed)

Updated by Dhairya Parmar almost 2 years ago

Assignee set to Dhairya Parmar

Updated by Dhairya Parmar almost 2 years ago

Pull request ID set to 46676

Updated by Dhairya Parmar almost 2 years ago

Status changed from New to In Progress

Updated by Dhairya Parmar almost 2 years ago

Status changed from In Progress to Fix Under Review

Updated by Venky Shankar almost 2 years ago

Similar failure - https://pulpito.ceph.com/vshankar-2022-06-19_08:22:46-fs-wip-vshankar-testing1-20220619-102531-testing-default-smithi/6886664/

2022-06-19T09:17:40.901 INFO:journalctl@ceph.mon.b.smithi145.stdout:Jun 19 09:17:40 smithi145 ceph-mon[121680]: from='mgr.14150 172.21.15.19:0/132793252' entity='mgr.x' cmd=[{"prefix": "mon metadata", "id": "c"}]: dispatch
2022-06-19T09:17:40.907 DEBUG:teuthology.orchestra.run.smithi019:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub status
2022-06-19T09:17:41.419 INFO:journalctl@ceph.mon.a.smithi019.stdout:Jun 19 09:17:41 smithi019 ceph-mon[84572]: pgmap v800: 129 pgs: 129 active+clean; 704 MiB data, 4.5 GiB used, 1.0 TiB / 1.0 TiB avail; 2.8 MiB/s rd, 6.4 MiB/s wr, 436 op/s
2022-06-19T09:17:41.420 INFO:journalctl@ceph.mon.a.smithi019.stdout:Jun 19 09:17:41 smithi019 ceph-mon[84572]: from='mgr.14150 172.21.15.19:0/132793252' entity='mgr.x' cmd=[{"prefix": "mon metadata", "id": "b"}]: dispatch
2022-06-19T09:17:41.439 INFO:teuthology.orchestra.run.smithi019.stderr:Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)
