Bug #56696

admin keyring disappears during qa run

Added by Patrick Donnelly over 1 year ago. Updated 12 months ago.

Status: Resolved
Priority: High
Assignee: Adam King
Category: cephadm
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 48074
Crash signature (v1):
Crash signature (v2):

Description

2022-07-22T21:02:39.723 INFO:journalctl@ceph.mon.b.smithi192.stdout:Jul 22 21:02:39 smithi192 ceph-mon[122238]: pgmap v1133: 129 pgs: 129 active+clean; 959 MiB data, 7.7 GiB used, 1.0 TiB / 1.0 TiB avail; 4.7 KiB/s rd, 682 B/s wr, 8 op/s
2022-07-22T21:02:40.130 WARNING:tasks.check_counter:Counter 'mds.exported' not found on daemon mds.d
2022-07-22T21:02:40.130 WARNING:tasks.check_counter:Counter 'mds.imported' not found on daemon mds.d
2022-07-22T21:02:40.131 DEBUG:tasks.check_counter:Getting stats from g
2022-07-22T21:02:40.131 DEBUG:teuthology.orchestra.run.smithi153:> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:89768db311950607682ea2bb29f56edc324f86ac shell --fsid d77283b6-09fb-11ed-842f-001a4aab830c -- ceph daemon mds.g perf dump
2022-07-22T21:02:40.992 INFO:journalctl@ceph.mon.a.smithi153.stdout:Jul 22 21:02:40 smithi153 ceph-mon[85483]: from='mgr.14154 172.21.15.153:0/3969930003' entity='mgr.x'
2022-07-22T21:02:40.993 INFO:journalctl@ceph.mon.a.smithi153.stdout:Jul 22 21:02:40 smithi153 ceph-mon[85483]: from='mgr.14154 172.21.15.153:0/3969930003' entity='mgr.x'
2022-07-22T21:02:40.994 INFO:journalctl@ceph.mon.a.smithi153.stdout:Jul 22 21:02:40 smithi153 ceph-mon[85483]: pgmap v1134: 129 pgs: 129 active+clean; 959 MiB data, 7.7 GiB used, 1.0 TiB / 1.0 TiB avail; 5.2 KiB/s rd, 767 B/s wr, 9 op/s
2022-07-22T21:02:40.994 INFO:journalctl@ceph.mon.a.smithi153.stdout:Jul 22 21:02:40 smithi153 ceph-mon[85483]: from='mgr.14154 172.21.15.153:0/3969930003' entity='mgr.x' cmd=[{"prefix": "mon metadata", "id": "c"}]: dispatch
2022-07-22T21:02:40.995 INFO:journalctl@ceph.mon.a.smithi153.stdout:Jul 22 21:02:40 smithi153 ceph-mon[85483]: from='mgr.14154 172.21.15.153:0/3969930003' entity='mgr.x' cmd=[{"prefix": "mon metadata", "id": "b"}]: dispatch
2022-07-22T21:02:41.046 INFO:journalctl@ceph.mon.c.smithi201.stdout:Jul 22 21:02:40 smithi201 ceph-mon[122768]: from='mgr.14154 172.21.15.153:0/3969930003' entity='mgr.x'
2022-07-22T21:02:41.047 INFO:journalctl@ceph.mon.c.smithi201.stdout:Jul 22 21:02:40 smithi201 ceph-mon[122768]: from='mgr.14154 172.21.15.153:0/3969930003' entity='mgr.x'
2022-07-22T21:02:41.047 INFO:journalctl@ceph.mon.c.smithi201.stdout:Jul 22 21:02:40 smithi201 ceph-mon[122768]: pgmap v1134: 129 pgs: 129 active+clean; 959 MiB data, 7.7 GiB used, 1.0 TiB / 1.0 TiB avail; 5.2 KiB/s rd, 767 B/s wr, 9 op/s
2022-07-22T21:02:41.047 INFO:journalctl@ceph.mon.c.smithi201.stdout:Jul 22 21:02:40 smithi201 ceph-mon[122768]: from='mgr.14154 172.21.15.153:0/3969930003' entity='mgr.x' cmd=[{"prefix": "mon metadata", "id": "c"}]: dispatch
2022-07-22T21:02:41.047 INFO:journalctl@ceph.mon.c.smithi201.stdout:Jul 22 21:02:40 smithi201 ceph-mon[122768]: from='mgr.14154 172.21.15.153:0/3969930003' entity='mgr.x' cmd=[{"prefix": "mon metadata", "id": "b"}]: dispatch
2022-07-22T21:02:41.051 INFO:teuthology.orchestra.run.smithi153.stderr:Inferring config /var/lib/ceph/d77283b6-09fb-11ed-842f-001a4aab830c/mon.a/config
2022-07-22T21:02:41.184 INFO:journalctl@ceph.mon.b.smithi192.stdout:Jul 22 21:02:40 smithi192 ceph-mon[122238]: from='mgr.14154 172.21.15.153:0/3969930003' entity='mgr.x'
2022-07-22T21:02:41.185 INFO:journalctl@ceph.mon.b.smithi192.stdout:Jul 22 21:02:40 smithi192 ceph-mon[122238]: from='mgr.14154 172.21.15.153:0/3969930003' entity='mgr.x'
2022-07-22T21:02:41.185 INFO:journalctl@ceph.mon.b.smithi192.stdout:Jul 22 21:02:40 smithi192 ceph-mon[122238]: pgmap v1134: 129 pgs: 129 active+clean; 959 MiB data, 7.7 GiB used, 1.0 TiB / 1.0 TiB avail; 5.2 KiB/s rd, 767 B/s wr, 9 op/s
2022-07-22T21:02:41.185 INFO:journalctl@ceph.mon.b.smithi192.stdout:Jul 22 21:02:40 smithi192 ceph-mon[122238]: from='mgr.14154 172.21.15.153:0/3969930003' entity='mgr.x' cmd=[{"prefix": "mon metadata", "id": "c"}]: dispatch
2022-07-22T21:02:41.185 INFO:journalctl@ceph.mon.b.smithi192.stdout:Jul 22 21:02:40 smithi192 ceph-mon[122238]: from='mgr.14154 172.21.15.153:0/3969930003' entity='mgr.x' cmd=[{"prefix": "mon metadata", "id": "b"}]: dispatch
2022-07-22T21:02:42.109 INFO:teuthology.orchestra.run.smithi153.stderr:Error: statfs /var/lib/ceph/d77283b6-09fb-11ed-842f-001a4aab830c/config/ceph.client.admin.keyring: no such file or directory
2022-07-22T21:02:42.157 DEBUG:teuthology.orchestra.run:got remote process result: 125
2022-07-22T21:02:42.158 ERROR:teuthology.run_tasks:Manager failed: check-counter
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_8d598431210977f8caccec83230b4bfec7bd5d3f/teuthology/run_tasks.py", line 188, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_8d598431210977f8caccec83230b4bfec7bd5d3f/teuthology/task/__init__.py", line 132, in __exit__
    self.end()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_89768db311950607682ea2bb29f56edc324f86ac/qa/tasks/check_counter.py", line 71, in end
    proc = manager.admin_socket(daemon_type, daemon_id, ["perf", "dump"])
  File "/home/teuthworker/src/git.ceph.com_ceph-c_89768db311950607682ea2bb29f56edc324f86ac/qa/tasks/ceph_manager.py", line 1849, in admin_socket
    check_status=check_status,
  File "/home/teuthworker/src/git.ceph.com_ceph-c_89768db311950607682ea2bb29f56edc324f86ac/qa/tasks/ceph_manager.py", line 52, in shell
    **kwargs
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_8d598431210977f8caccec83230b4bfec7bd5d3f/teuthology/orchestra/remote.py", line 510, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_8d598431210977f8caccec83230b4bfec7bd5d3f/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_8d598431210977f8caccec83230b4bfec7bd5d3f/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_8d598431210977f8caccec83230b4bfec7bd5d3f/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi153 with status 125: 'sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:89768db311950607682ea2bb29f56edc324f86ac shell --fsid d77283b6-09fb-11ed-842f-001a4aab830c -- ceph daemon mds.g perf dump'

From: /ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945820/teuthology.log
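
For context, the failing command is the same cephadm shell invocation the check-counter task uses to reach the daemon admin socket during teardown. A minimal sketch of that call path in Python, reconstructed from the traceback and the command line above (the wrapper function is illustrative, not the actual qa code):

    import subprocess

    def admin_socket_perf_dump(fsid: str, image: str, daemon: str):
        # cephadm shell mounts the cluster config and admin keyring from
        # /var/lib/ceph/<fsid>/config/; if the keyring has been removed
        # mid-run, the container cannot start and the command exits 125.
        cmd = [
            "sudo", "/home/ubuntu/cephtest/cephadm",
            "--image", image,
            "shell", "--fsid", fsid,
            "--", "ceph", "daemon", daemon, "perf", "dump",
        ]
        return subprocess.run(cmd, check=True, capture_output=True)

The statfs error above shows exactly that: ceph.client.admin.keyring vanished from the cluster's config directory while the run was still in flight.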

See also:

Failure: Command failed on smithi153 with status 125: 'sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:89768db311950607682ea2bb29f56edc324f86ac shell --fsid d77283b6-09fb-11ed-842f-001a4aab830c -- ceph daemon mds.g perf dump'
3 jobs: ['6945820', '6945835', '6945830']
suites intersection: ['1-cephadm', '2-logrotate}', 'begin/{0-install', 'clusters/1a11s-mds-1c-client-3node', 'conf/{client', 'fs/workload/{0-rhel_8', 'ignorelist_health', 'ignorelist_wrongly_marked_down', 'mds', 'mon', 'mount', 'mount/kclient/{base/{mount-syntax/{v2}', 'ms-die-on-skipped}}', 'osd-asserts', 'osd}', 'overrides/{frag', 'session_timeout}', 'standby-replay', 'tasks/{0-check-counter']
suites union: ['1-cephadm', '2-logrotate}', 'begin/{0-install', 'clusters/1a11s-mds-1c-client-3node', 'conf/{client', 'fs/workload/{0-rhel_8', 'ignorelist_health', 'ignorelist_wrongly_marked_down', 'mds', 'mon', 'mount', 'mount/kclient/{base/{mount-syntax/{v2}', 'ms-die-on-skipped}}', 'ms_mode/legacy', 'ms_mode/secure', 'n/5', 'objectstore-ec/bluestore-bitmap', 'objectstore-ec/bluestore-ec-root', 'omap_limit/10', 'omap_limit/10000', 'osd-asserts', 'osd}', 'overrides/{distro/stock/{k-stock', 'overrides/{distro/testing/k-testing', 'overrides/{frag', 'ranks/1', 'ranks/multi/{export-check', 'replication/default}', 'rhel_8}', 'scrub/no', 'scrub/yes', 'session_timeout}', 'standby-replay', 'subvolume/{no-subvolume}', 'subvolume/{with-namespace-isolated}', 'tasks/{0-check-counter', 'workunit/fs/misc}}', 'workunit/suites/dbench}}', 'workunit/suites/fsync-tester}}', 'wsync/no}', 'wsync/yes}']

Related issues

Related to Orchestrator - Bug #57462: cephadm removes config & keyring files in mid flight Resolved
Related to Orchestrator - Bug #57449: qa: removal of host during QA Resolved

History

#1 Updated by Adam King over 1 year ago

  • Assignee set to Adam King

#2 Updated by Adam King over 1 year ago

  • Related to Bug #57462: cephadm removes config & keyring files in mid flight added

#3 Updated by Adam King over 1 year ago

  • Related to Bug #57449: qa: removal of host during QA added

#4 Updated by Adam King over 1 year ago

  • Pull request ID set to 48074

#5 Updated by Adam King over 1 year ago

  • Status changed from New to Resolved

Marking this resolved despite it still needing backports. Backports of this will be tracked through https://tracker.ceph.com/issues/57462.

#6 Updated by Voja Molani 12 months ago

The PR has a comment:

Keep in mind so far this issue has only been seen in long running tests

But I have seen this several times (maybe 5-6?) on two not-yet-in-production clusters that have existed for about 6 months, initially installed as EL8 and later reinstalled as EL9. One cluster is VMs and the second is physical machines.
It will be interesting to see if 17.2.6 finally fixes this.

The latest incident happened just now when a third node was being processed with Ansible in exactly the same way as the two hosts before it:
  1. ceph orch host maintenance enter x
  2. reboot server
  3. ceph orch host maintenance exit x

The third server threw an error at host maintenance exit that the keyring was not found, and indeed it was missing.
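
A sketch of that cycle with a post-exit check for the keyring (Python; the helper is hypothetical, and the path check is shown locally for brevity, in practice it would run on the affected host, e.g. over ssh):

    import os
    import subprocess

    def maintenance_cycle(host: str, fsid: str) -> None:
        # 1. Put the host into maintenance mode.
        subprocess.run(["ceph", "orch", "host", "maintenance", "enter", host],
                       check=True)
        # 2. Reboot the server out of band (Ansible in the report above).
        # 3. Bring the host back out of maintenance.
        subprocess.run(["ceph", "orch", "host", "maintenance", "exit", host],
                       check=True)
        # Verify the admin keyring survived the cycle; in the failures
        # described here it was gone at this point.
        keyring = f"/var/lib/ceph/{fsid}/config/ceph.client.admin.keyring"
        if not os.path.exists(keyring):
            raise RuntimeError(f"admin keyring missing after maintenance exit: {keyring}")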
