Bug #57449

qa: removal of host during QA

Added by Patrick Donnelly 3 months ago. Updated 3 months ago.

Status: New
Priority: High
Assignee:
Category: cephadm
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2022-08-31T17:11:51.057 INFO:journalctl@ceph.mon.c.smithi174.stdout:Aug 31 17:11:50 smithi174 ceph-mon[103875]: executing refresh((['smithi073', 'smithi153', 'smithi174'],)) failed.
2022-08-31T17:11:51.057 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                            Traceback (most recent call last):
2022-08-31T17:11:51.058 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                              File "/usr/share/ceph/mgr/cephadm/ssh.py", line 143, in _execute_command
2022-08-31T17:11:51.058 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                                r = await conn.run('sudo true', check=True, timeout=5)
2022-08-31T17:11:51.058 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                              File "/lib/python3.6/site-packages/asyncssh/connection.py", line 3637, in run
2022-08-31T17:11:51.058 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                                return await process.wait(check, timeout)
2022-08-31T17:11:51.059 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                              File "/lib/python3.6/site-packages/asyncssh/process.py", line 1252, in wait
2022-08-31T17:11:51.059 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                                stderr_data) from None
2022-08-31T17:11:51.059 INFO:journalctl@ceph.mon.c.smithi174.stdout:                                            asyncssh.process.TimeoutError

/ceph/teuthology-archive/pdonnell-2022-08-31_03:04:35-fs:workload-wip-pdonnell-testing-20220831.010330-distro-default-smithi/7002272/teuthology.log

and

2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping daemon refresh
2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping gather facts refresh
2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping network refresh
2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping device refresh
2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping osdspec preview refresh
2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping autotune
2022-08-31T17:12:49.762+0000 7f75dc618700  0 [cephadm INFO cephadm.serve] Removing smithi073:/var/lib/ceph/4313329e-294d-11ed-8431-001a4aab830c/config/ceph.conf
2022-08-31T17:12:49.762+0000 7f75dc618700  0 log_channel(cephadm) log [INF] : Removing smithi073:/var/lib/ceph/4313329e-294d-11ed-8431-001a4aab830c/config/ceph.conf

/ceph/teuthology-archive/pdonnell-2022-08-31_03:04:35-fs:workload-wip-pdonnell-testing-20220831.010330-distro-default-smithi/7002272/remote/smithi073/log/4313329e-294d-11ed-8431-001a4aab830c/ceph-mgr.x.log.gz

One interesting thing here is that the ceph-mgr is removing the host it's running on! The host being removed, smithi073, is the localhost.
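When triaging logs like the mgr excerpt above, one quick check is which files cephadm removes from a host after marking it offline. The helper below is a hypothetical triage sketch, not part of cephadm; its regexes assume the exact message formats quoted in this report ("Host "…" marked as offline" and "[cephadm INFO cephadm.serve] Removing host:path") and may need adjusting for other releases.

```python
import re
from typing import Dict, Iterable, Set

# Assumption: these patterns match the mgr log messages quoted in this report.
OFFLINE_RE = re.compile(r'Host "(?P<host>[^"]+)" marked as offline')
REMOVE_RE = re.compile(
    r"\[cephadm INFO cephadm\.serve\] Removing (?P<host>[^:\s]+):(?P<path>\S+)"
)


def removals_on_offline_hosts(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each host marked offline to the files cephadm later removes from it."""
    offline: Set[str] = set()
    removals: Dict[str, Set[str]] = {}
    for line in lines:
        m = OFFLINE_RE.search(line)
        if m:
            offline.add(m.group("host"))
            continue
        m = REMOVE_RE.search(line)
        # Only count removals that target a host already seen as offline.
        if m and m.group("host") in offline:
            removals.setdefault(m.group("host"), set()).add(m.group("path"))
    return removals
```

The gzipped ceph-mgr.x.log.gz referenced above could be fed to this function with the standard library's `gzip.open(path, "rt")`. A non-empty result distinguishes "files removed from an offline host" from "host removed from the cluster".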

See also a few more failures in

https://pulpito.ceph.com/pdonnell-2022-08-31_03:04:35-fs:workload-wip-pdonnell-testing-20220831.010330-distro-default-smithi/

All of them occur with the "ffsb" workunit.
See also

/ceph/teuthology-archive/pdonnell-2022-09-02_01:35:33-fs:workload-wip-pdonnell-testing-20220901.192929-distro-default-smithi/7006546/teuthology.log
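The first traceback shows cephadm's `_execute_command` (in `ssh.py`) running `sudo true` over SSH with `check=True, timeout=5`, and asyncssh raising `TimeoutError` because the command did not finish within that window. The sketch below is only a local analogue of that probe and assumes nothing about cephadm beyond the quoted traceback: it swaps the asyncssh session for a plain `asyncio` subprocess, and the `probe` helper and its result strings are hypothetical.

```python
import asyncio
import sys
from typing import List


async def probe(cmd: List[str], timeout: float) -> str:
    """Run cmd with a deadline and classify the result.

    Hypothetical local analogue of the `conn.run('sudo true', check=True,
    timeout=5)` probe in the traceback; a local subprocess stands in for
    the SSH session.
    """
    proc = await asyncio.create_subprocess_exec(*cmd)
    try:
        rc = await asyncio.wait_for(proc.wait(), timeout)
    except asyncio.TimeoutError:
        # Deadline passed: kill the child and reap it so it does not linger.
        proc.kill()
        await proc.wait()
        return "timeout"
    return "ok" if rc == 0 else f"failed rc={rc}"


if __name__ == "__main__":
    # Exits immediately, well within the deadline.
    print(asyncio.run(probe([sys.executable, "-c", "pass"], timeout=5)))
    # Sleeps past the deadline, so the probe reports a timeout.
    print(asyncio.run(probe([sys.executable, "-c", "import time; time.sleep(3)"], timeout=0.5)))
```

The deadline is a single fixed window, so in this sketch a command that is merely slow is reported exactly like one whose target never answers.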


Related issues

Related to Orchestrator - Bug #56024: cephadm: removes ceph.conf during qa run causing command failure (Resolved)
Related to Orchestrator - Bug #56696: admin keyring disappears during qa run (New)

History

#1 Updated by Patrick Donnelly 3 months ago

  • Description updated

#2 Updated by Patrick Donnelly 3 months ago

  • Related to Bug #56024: cephadm: removes ceph.conf during qa run causing command failure added

#3 Updated by Adam King 3 months ago

  • Related to Bug #56696: admin keyring disappears during qa run added

#4 Updated by Adam King 3 months ago

  • Tags set to test-failure
  • Assignee set to Adam King

To be clear, it isn't removing the host itself here, just the ceph.conf on that host. If anything, it appears that https://tracker.ceph.com/issues/56024 isn't actually resolved. I'm currently looking at https://tracker.ceph.com/issues/56696 and https://tracker.ceph.com/issues/57462. I presume (but don't yet know for sure) that those two and this tracker are the same underlying bug, which causes client keyrings and files to be removed when they shouldn't be. If I figure something out there, I'll update this one as well.
