Bug #57449
qa: removal of host during QA
Description
2022-08-31T17:11:51.057 INFO:journalctl@ceph.mon.c.smithi174.stdout:Aug 31 17:11:50 smithi174 ceph-mon[103875]: executing refresh((['smithi073', 'smithi153', 'smithi174'],)) failed.
2022-08-31T17:11:51.057 INFO:journalctl@ceph.mon.c.smithi174.stdout: Traceback (most recent call last):
2022-08-31T17:11:51.058 INFO:journalctl@ceph.mon.c.smithi174.stdout: File "/usr/share/ceph/mgr/cephadm/ssh.py", line 143, in _execute_command
2022-08-31T17:11:51.058 INFO:journalctl@ceph.mon.c.smithi174.stdout: r = await conn.run('sudo true', check=True, timeout=5)
2022-08-31T17:11:51.058 INFO:journalctl@ceph.mon.c.smithi174.stdout: File "/lib/python3.6/site-packages/asyncssh/connection.py", line 3637, in run
2022-08-31T17:11:51.058 INFO:journalctl@ceph.mon.c.smithi174.stdout: return await process.wait(check, timeout)
2022-08-31T17:11:51.059 INFO:journalctl@ceph.mon.c.smithi174.stdout: File "/lib/python3.6/site-packages/asyncssh/process.py", line 1252, in wait
2022-08-31T17:11:51.059 INFO:journalctl@ceph.mon.c.smithi174.stdout: stderr_data) from None
2022-08-31T17:11:51.059 INFO:journalctl@ceph.mon.c.smithi174.stdout: asyncssh.process.TimeoutError
/ceph/teuthology-archive/pdonnell-2022-08-31_03:04:35-fs:workload-wip-pdonnell-testing-20220831.010330-distro-default-smithi/7002272/teuthology.log
and
2022-08-31T17:12:49.762+0000 7f75dc618700 0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping daemon refresh
2022-08-31T17:12:49.762+0000 7f75dc618700 0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping gather facts refresh
2022-08-31T17:12:49.762+0000 7f75dc618700 0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping network refresh
2022-08-31T17:12:49.762+0000 7f75dc618700 0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping device refresh
2022-08-31T17:12:49.762+0000 7f75dc618700 0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping osdspec preview refresh
2022-08-31T17:12:49.762+0000 7f75dc618700 0 [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping autotune
2022-08-31T17:12:49.762+0000 7f75dc618700 0 [cephadm INFO cephadm.serve] Removing smithi073:/var/lib/ceph/4313329e-294d-11ed-8431-001a4aab830c/config/ceph.conf
2022-08-31T17:12:49.762+0000 7f75dc618700 0 log_channel(cephadm) log [INF] : Removing smithi073:/var/lib/ceph/4313329e-294d-11ed-8431-001a4aab830c/config/ceph.conf
/ceph/teuthology-archive/pdonnell-2022-08-31_03:04:35-fs:workload-wip-pdonnell-testing-20220831.010330-distro-default-smithi/7002272/remote/smithi073/log/4313329e-294d-11ed-8431-001a4aab830c/ceph-mgr.x.log.gz
One interesting thing here is that the ceph-mgr is removing the host it's on! The host being removed, smithi073, is the local host.
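To look for the same pattern in other runs, a small script can pair each "marked as offline" message in a ceph-mgr log with any later "Removing host:path" message for the same host. This is a hypothetical triage helper, not part of cephadm; the function names are illustrative, and the regular expressions assume the exact message text quoted above, so they may need adjusting for other Ceph releases. A match only shows that a file removal was logged for a host already marked offline; it does not by itself prove the cause.

```python
import gzip
import re
from typing import Dict, Iterable, List, Set

# Patterns modelled only on the two cephadm mgr messages quoted above (assumption):
#   [cephadm DEBUG cephadm.inventory] Host "smithi073" marked as offline. Skipping ...
#   [cephadm INFO cephadm.serve] Removing smithi073:/var/lib/ceph/<fsid>/config/ceph.conf
OFFLINE_RE = re.compile(r'Host "(?P<host>[^"]+)" marked as offline')
# Anchored on the cephadm.serve prefix so the duplicate log_channel(cephadm) line is not counted twice.
REMOVING_RE = re.compile(
    r"\[cephadm INFO cephadm\.serve\] Removing (?P<host>[^:\s]+):(?P<path>\S+)"
)


def removals_on_offline_hosts(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Return {host: [paths]} for file removals logged after that host was marked offline."""
    offline: Set[str] = set()
    flagged: Dict[str, List[str]] = {}
    for line in lines:
        m = OFFLINE_RE.search(line)
        if m:
            offline.add(m.group("host"))
            continue
        m = REMOVING_RE.search(line)
        if m and m.group("host") in offline:
            flagged.setdefault(m.group("host"), []).append(m.group("path"))
    return flagged


def scan_log(path: str) -> Dict[str, List[str]]:
    """Scan a plain or gzip-compressed mgr log (e.g. ceph-mgr.x.log.gz) line by line."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as fh:
        return removals_on_offline_hosts(fh)
```

Run against the excerpt above, the helper reports only the cephadm.serve removal of `/var/lib/ceph/4313329e-294d-11ed-8431-001a4aab830c/config/ceph.conf` on smithi073, i.e. a config file on the host rather than the host itself.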
See also a few more failures in the run below, all with the "ffsb" workunit:
https://pulpito.ceph.com/pdonnell-2022-08-31_03:04:35-fs:workload-wip-pdonnell-testing-20220831.010330-distro-default-smithi/
See also:
/ceph/teuthology-archive/pdonnell-2022-09-02_01:35:33-fs:workload-wip-pdonnell-testing-20220901.192929-distro-default-smithi/7006546/teuthology.log
Updated by Patrick Donnelly over 1 year ago
- Related to Bug #56024: cephadm: removes ceph.conf during qa run causing command failure added
Updated by Adam King over 1 year ago
- Related to Bug #56696: admin keyring disappears during qa run added
Updated by Adam King over 1 year ago
- Tags set to test-failure
- Assignee set to Adam King
To be clear, it isn't removing the host itself here, just the ceph.conf on the host. If anything, it appears https://tracker.ceph.com/issues/56024 isn't actually resolved. I'm currently looking at https://tracker.ceph.com/issues/56696 and https://tracker.ceph.com/issues/57462, which I suspect (but haven't confirmed yet) are, along with this tracker, the same bug: client keyrings/files being removed when they shouldn't be. I'll update this one as well if I figure something out there.
Updated by Adam King over 1 year ago
- Status changed from New to Resolved
Marking this resolved even though it still needs backports. Backports will be tracked through https://tracker.ceph.com/issues/57462.