Bug #57449

closed

qa: removal of host during QA

Added by Patrick Donnelly over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: cephadm
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2022-08-31T17:11:51.057 INFO:journalctl@ceph.mon.c.smithi174.stdout:Aug 31 17:11:50 smithi174 ceph-mon[103875]: executing refresh((['smithi073', 'smithi153', 'smithi174'],)) failed.
2022-08-31T17:11:51.057 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                            Traceback (most recent call last):
2022-08-31T17:11:51.058 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                              File "/usr/share/ceph/mgr/cephadm/ssh.py", line 143, in _execute_command
2022-08-31T17:11:51.058 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                                r = await conn.run('sudo true', check=True, timeout=5)
2022-08-31T17:11:51.058 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                              File "/lib/python3.6/site-packages/asyncssh/connection.py", line 3637, in run
2022-08-31T17:11:51.058 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                                return await process.wait(check, timeout)
2022-08-31T17:11:51.059 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                              File "/lib/python3.6/site-packages/asyncssh/process.py", line 1252, in wait
2022-08-31T17:11:51.059 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                                stderr_data) from None
2022-08-31T17:11:51.059 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                            asyncssh.process.TimeoutError

/ceph/teuthology-archive/pdonnell-2022-08-31_03:04:35-fs:workload-wip-pdonnell-testing-20220831.010330-distro-default-smithi/7002272/teuthology.log

and

2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping daemon refresh
2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping gather facts refresh
2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping network refresh
2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping device refresh
2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping osdspec preview refresh
2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping autotune
2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm INFO cephadm.serve] Removing smithi073:/var/lib/ceph/4313329e-294d-11ed-8431-001a4aab830c/config/ceph.conf
2022-08-31T17:12:49.762+0000 7f75dc618700  0 log_channel(cephadm) log [INF] : Removing smithi073:/var/lib/ceph/4313329e-294d-11ed-8431-001a4aab830c/config/ceph.conf

/ceph/teuthology-archive/pdonnell-2022-08-31_03:04:35-fs:workload-wip-pdonnell-testing-20220831.010330-distro-default-smithi/7002272/remote/smithi073/log/4313329e-294d-11ed-8431-001a4aab830c/ceph-mgr.x.log.gz

One interesting thing here is that the ceph-mgr is removing the host it's on! The removed host, smithi073, is the localhost.
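
The pattern noted above (a host marked offline in the same pass in which cephadm removes a file on it) can be looked for mechanically in a ceph-mgr log. The following is a minimal sketch, not part of cephadm or teuthology. The names `summarize` and `offline_with_removals` are hypothetical, and the regular expressions assume only the two message formats quoted in this report; other cephadm versions may log differently.

```python
import re
from collections import defaultdict

# Assumption: these patterns mirror the two message shapes quoted in this
# report; they are not taken from the cephadm source.
OFFLINE_RE = re.compile(r'Host "(?P<host>[^"]+)" marked as offline')
# Match only the cephadm.serve line so the duplicate log_channel(cephadm)
# echo of the same removal is not counted twice.
REMOVE_RE = re.compile(
    r'\[cephadm INFO cephadm\.serve\] Removing (?P<host>[^:\s]+):(?P<path>\S+)'
)

# The sample lines are copied from the ceph-mgr.x.log excerpt in this report.
SAMPLE_LINES = [
    '2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm DEBUG cephadm.inventory] '
    'Host "smithi073" marked as offline. Skipping daemon refresh',
    '2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm INFO cephadm.serve] '
    'Removing smithi073:/var/lib/ceph/4313329e-294d-11ed-8431-001a4aab830c/config/ceph.conf',
    '2022-08-31T17:12:49.762+0000 7f75dc618700  0 log_channel(cephadm) log [INF] : '
    'Removing smithi073:/var/lib/ceph/4313329e-294d-11ed-8431-001a4aab830c/config/ceph.conf',
]


def summarize(lines):
    """Return {host: {"offline": bool, "removed": [paths]}} for the log lines."""
    summary = defaultdict(lambda: {"offline": False, "removed": []})
    for line in lines:
        match = OFFLINE_RE.search(line)
        if match:
            summary[match.group("host")]["offline"] = True
            continue
        match = REMOVE_RE.search(line)
        if match:
            summary[match.group("host")]["removed"].append(match.group("path"))
    return dict(summary)


def offline_with_removals(summary):
    """Return the sorted hosts that were marked offline and also had files removed."""
    return sorted(
        host
        for host, info in summary.items()
        if info["offline"] and info["removed"]
    )


if __name__ == "__main__":
    for host in offline_with_removals(summarize(SAMPLE_LINES)):
        print(host)
```

To run this against a real log, decompress the `ceph-mgr.x.log.gz` file and pass its lines to `summarize` in place of `SAMPLE_LINES`. The sketch records removed paths rather than removed hosts, so it distinguishes a file removal on a host from removal of the host itself.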

See also a few more in

https://pulpito.ceph.com/pdonnell-2022-08-31_03:04:35-fs:workload-wip-pdonnell-testing-20220831.010330-distro-default-smithi/

All with the "ffsb" workunit.
See also

/ceph/teuthology-archive/pdonnell-2022-09-02_01:35:33-fs:workload-wip-pdonnell-testing-20220901.192929-distro-default-smithi/7006546/teuthology.log


Related issues 2 (0 open, 2 closed)

Related to Orchestrator - Bug #56024: cephadm: removes ceph.conf during qa run causing command failure (Resolved, Dhairya Parmar)
Related to Orchestrator - Bug #56696: admin keyring disappears during qa run (Resolved, Adam King)

#1

Updated by Patrick Donnelly over 1 year ago

  • Description updated (diff)

#2

Updated by Patrick Donnelly over 1 year ago

  • Related to Bug #56024: cephadm: removes ceph.conf during qa run causing command failure added

#3

Updated by Adam King over 1 year ago

  • Related to Bug #56696: admin keyring disappears during qa run added

#4

Updated by Adam King over 1 year ago

  • Tags set to test-failure
  • Assignee set to Adam King

To be clear, it isn't removing the host itself here, just the ceph.conf on the host. If anything, it appears that https://tracker.ceph.com/issues/56024 isn't actually resolved. I'm currently looking at https://tracker.ceph.com/issues/56696 and https://tracker.ceph.com/issues/57462, which I presume (but don't yet know for certain) are, along with this tracker, the same bug: client keyrings and files being removed when they shouldn't be. I'll update this one as well if I figure something out there.

#5

Updated by Adam King over 1 year ago

  • Pull request ID set to 48074

#6

Updated by Adam King over 1 year ago

  • Status changed from New to Resolved

Marking this resolved even though it still needs backports. Backports will be tracked through https://tracker.ceph.com/issues/57462.
