
Bug #56024

cephadm: removes ceph.conf during qa run causing command failure

Added by Patrick Donnelly 6 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
High
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
quincy,pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2022-06-12T06:33:37.163 INFO:journalctl@ceph.mon.a.smithi008.stdout:Jun 12 06:33:36 smithi008 ceph-mon[31232]: Removing smithi008:/etc/ceph/ceph.conf
2022-06-12T06:33:37.164 INFO:journalctl@ceph.mon.a.smithi008.stdout:Jun 12 06:33:36 smithi008 ceph-mon[31232]: pgmap v516: 129 pgs: 129 active+clean; 13 GiB data, 28 GiB used, 1.0 TiB / 1.0 TiB avail; 174 KiB/s rd, 100 MiB/s wr, 4.93k op/s
2022-06-12T06:33:37.166 INFO:journalctl@ceph.mon.c.smithi185.stdout:Jun 12 06:33:36 smithi185 ceph-mon[38700]: Removing smithi008:/etc/ceph/ceph.conf
2022-06-12T06:33:37.166 INFO:journalctl@ceph.mon.c.smithi185.stdout:Jun 12 06:33:36 smithi185 ceph-mon[38700]: pgmap v516: 129 pgs: 129 active+clean; 13 GiB data, 28 GiB used, 1.0 TiB / 1.0 TiB avail; 174 KiB/s rd, 100 MiB/s wr, 4.93k op/s
2022-06-12T06:33:37.279 INFO:journalctl@ceph.mon.b.smithi157.stdout:Jun 12 06:33:36 smithi157 ceph-mon[37057]: Removing smithi008:/etc/ceph/ceph.conf
2022-06-12T06:33:37.280 INFO:journalctl@ceph.mon.b.smithi157.stdout:Jun 12 06:33:36 smithi157 ceph-mon[37057]: pgmap v516: 129 pgs: 129 active+clean; 13 GiB data, 28 GiB used, 1.0 TiB / 1.0 TiB avail; 174 KiB/s rd, 100 MiB/s wr, 4.93k op/s
2022-06-12T06:33:39.077 DEBUG:teuthology.orchestra.run.smithi008:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub status
2022-06-12T06:33:39.165 INFO:journalctl@ceph.mon.a.smithi008.stdout:Jun 12 06:33:39 smithi008 ceph-mon[31232]: pgmap v517: 129 pgs: 129 active+clean; 13 GiB data, 28 GiB used, 1.0 TiB / 1.0 TiB avail; 279 KiB/s rd, 96 MiB/s wr, 4.66k op/s
2022-06-12T06:33:39.416 INFO:journalctl@ceph.mon.c.smithi185.stdout:Jun 12 06:33:39 smithi185 ceph-mon[38700]: pgmap v517: 129 pgs: 129 active+clean; 13 GiB data, 28 GiB used, 1.0 TiB / 1.0 TiB avail; 279 KiB/s rd, 96 MiB/s wr, 4.66k op/s
2022-06-12T06:33:39.527 INFO:journalctl@ceph.mon.b.smithi157.stdout:Jun 12 06:33:39 smithi157 ceph-mon[37057]: pgmap v517: 129 pgs: 129 active+clean; 13 GiB data, 28 GiB used, 1.0 TiB / 1.0 TiB avail; 279 KiB/s rd, 96 MiB/s wr, 4.66k op/s
2022-06-12T06:33:39.801 INFO:teuthology.orchestra.run.smithi008.stderr:Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)
2022-06-12T06:33:39.815 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-06-12T06:33:39.816 ERROR:tasks.fwd_scrub.fs.[cephfs]:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36d24a7f39b7e565955f208f512d14b9d7e923ee/qa/tasks/fwd_scrub.py", line 38, in _run
    self.do_scrub()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36d24a7f39b7e565955f208f512d14b9d7e923ee/qa/tasks/fwd_scrub.py", line 55, in do_scrub
    self._scrub()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36d24a7f39b7e565955f208f512d14b9d7e923ee/qa/tasks/fwd_scrub.py", line 77, in _scrub
    timeout=self.scrub_timeout)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36d24a7f39b7e565955f208f512d14b9d7e923ee/qa/tasks/cephfs/filesystem.py", line 1583, in wait_until_scrub_complete
    out_json = self.rank_tell(["scrub", "status"], rank=rank)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36d24a7f39b7e565955f208f512d14b9d7e923ee/qa/tasks/cephfs/filesystem.py", line 1161, in rank_tell
    out = self.mon_manager.raw_cluster_cmd("tell", f"mds.{self.id}:{rank}", *command)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36d24a7f39b7e565955f208f512d14b9d7e923ee/qa/tasks/ceph_manager.py", line 1597, in raw_cluster_cmd
    return self.run_cluster_cmd(**kwargs).stdout.getvalue()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_36d24a7f39b7e565955f208f512d14b9d7e923ee/qa/tasks/ceph_manager.py", line 1588, in run_cluster_cmd
    return self.controller.run(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_2290146eac7577b8500f128a53856c3ea4a00e3c/teuthology/orchestra/remote.py", line 510, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_2290146eac7577b8500f128a53856c3ea4a00e3c/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_2290146eac7577b8500f128a53856c3ea4a00e3c/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_2290146eac7577b8500f128a53856c3ea4a00e3c/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi008 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub status'

From: /ceph/teuthology-archive/pdonnell-2022-06-12_05:08:12-fs:workload-wip-pdonnell-testing-20220612.004943-distro-default-smithi/6875276/teuthology.log

See also: /ceph/teuthology-archive/pdonnell-2022-06-12_05:08:12-fs:workload-wip-pdonnell-testing-20220612.004943-distro-default-smithi/6875321/teuthology.log
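The failure mode is visible in the logs above: cephadm removes smithi008:/etc/ceph/ceph.conf, and roughly two seconds later the `ceph tell` command on that host fails because the CLI cannot initialize a cluster client without a conf file. A minimal sketch of the mechanism, in pure Python (the exception class and helper below simulate the real python-rados `conf_read_file` behavior rather than calling it, so they are illustrative stand-ins, not the actual bindings):

```python
import os

class ObjectNotFound(Exception):
    """Stand-in for rados.ObjectNotFound, raised when conf_read_file fails."""

def init_cluster_client(conffile="/etc/ceph/ceph.conf"):
    # The ceph CLI reads the conf file while constructing its cluster
    # handle. If cephadm has removed the file, initialization fails
    # before any command (e.g. "tell mds.1:0 scrub status") is sent.
    if not os.path.exists(conffile):
        raise ObjectNotFound(
            "RADOS object not found (error calling conf_read_file)")
    return "connected"

# Simulate the QA run: the conf file is gone, so client init fails
# with the same message seen on teuthology.orchestra.run stderr.
try:
    init_cluster_client(conffile="/nonexistent/ceph.conf")
except ObjectNotFound as e:
    print(f"Error initializing cluster client: {e}")
```

This is why the traceback surfaces as a CommandFailedError in teuthology: the remote `ceph` process exits non-zero during client setup, not during the scrub itself.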


Related issues

Related to Orchestrator - Bug #57449: qa: removal of host during QA New
Copied to Orchestrator - Backport #56473: pacific: cephadm: removes ceph.conf during qa run causing command failure Resolved
Copied to Orchestrator - Backport #56474: quincy: cephadm: removes ceph.conf during qa run causing command failure Resolved

History

#1 Updated by Dhairya Parmar 6 months ago

  • Assignee set to Dhairya Parmar

#2 Updated by Dhairya Parmar 6 months ago

  • Pull request ID set to 46676

#3 Updated by Dhairya Parmar 6 months ago

  • Status changed from New to In Progress

#4 Updated by Dhairya Parmar 6 months ago

  • Status changed from In Progress to Fix Under Review

#5 Updated by Venky Shankar 6 months ago

Similar failure - https://pulpito.ceph.com/vshankar-2022-06-19_08:22:46-fs-wip-vshankar-testing1-20220619-102531-testing-default-smithi/6886664/

2022-06-19T09:17:40.901 INFO:journalctl@ceph.mon.b.smithi145.stdout:Jun 19 09:17:40 smithi145 ceph-mon[121680]: from='mgr.14150 172.21.15.19:0/132793252' entity='mgr.x' cmd=[{"prefix": "mon metadata", "id": "c"}]: dispatch
2022-06-19T09:17:40.907 DEBUG:teuthology.orchestra.run.smithi019:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub status
2022-06-19T09:17:41.419 INFO:journalctl@ceph.mon.a.smithi019.stdout:Jun 19 09:17:41 smithi019 ceph-mon[84572]: pgmap v800: 129 pgs: 129 active+clean; 704 MiB data, 4.5 GiB used, 1.0 TiB / 1.0 TiB avail; 2.8 MiB/s rd, 6.4 MiB/s wr, 436 op/s
2022-06-19T09:17:41.420 INFO:journalctl@ceph.mon.a.smithi019.stdout:Jun 19 09:17:41 smithi019 ceph-mon[84572]: from='mgr.14150 172.21.15.19:0/132793252' entity='mgr.x' cmd=[{"prefix": "mon metadata", "id": "b"}]: dispatch
2022-06-19T09:17:41.439 INFO:teuthology.orchestra.run.smithi019.stderr:Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)

#6 Updated by Adam King 5 months ago

  • Status changed from Fix Under Review to Pending Backport

#7 Updated by Backport Bot 5 months ago

  • Copied to Backport #56473: pacific: cephadm: removes ceph.conf during qa run causing command failure added

#8 Updated by Backport Bot 5 months ago

  • Copied to Backport #56474: quincy: cephadm: removes ceph.conf during qa run causing command failure added

#9 Updated by Adam King 5 months ago

  • Status changed from Pending Backport to Resolved

#10 Updated by Patrick Donnelly 3 months ago

  • Related to Bug #57449: qa: removal of host during QA added
