Bug #48223

host key might not exist during host refresh

Added by Michael Fritch over 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

SSH communication failures during a cephadm host add/remove can break the serve loop, because save_host raises a KeyError for a host whose key is missing from the cache:

2020-11-12T17:26:38.804-0700 7f0bafdd4640  0 log_channel(audit) log [DBG] : from='client.4943 -' entity='client.admin' cmd=[{"prefix": "orch host rm", "hostname": "host1", "target": ["mon-mgr", ""]}]: dispatch
2020-11-12T17:26:39.788-0700 7f0bad58f640  0 [cephadm ERROR cephadm.utils] executing refresh((['host1'],)) failed.
Traceback (most recent call last):
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/utils.py", line 62, in do_work
    return f(*arg)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 119, in refresh
    r = self._refresh_host_daemons(host)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 246, in _refresh_host_daemons
    self.mgr.cache.save_host(host)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/inventory.py", line 308, in save_host
    for d in self.devices[host]:
KeyError: 'host1'
2020-11-12T17:26:39.788-0700 7f0bad58f640 -1 log_channel(cephadm) log [ERR] : executing refresh((['host1'],)) failed.
Traceback (most recent call last):
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/utils.py", line 62, in do_work
    return f(*arg)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 119, in refresh
    r = self._refresh_host_daemons(host)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 246, in _refresh_host_daemons
    self.mgr.cache.save_host(host)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/inventory.py", line 308, in save_host
    for d in self.devices[host]:
KeyError: 'host1'
2020-11-12T17:26:39.788-0700 7f0ba6a42640 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'cephadm' while running on mgr.x: 'host1'
2020-11-12T17:26:39.788-0700 7f0ba6a42640 -1 cephadm.serve:
2020-11-12T17:26:39.788-0700 7f0ba6a42640 -1 Traceback (most recent call last):
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/module.py", line 441, in serve
    serve.serve()
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 59, in serve
    self._refresh_hosts_and_daemons()
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 148, in _refresh_hosts_and_daemons
    refresh(self.mgr.cache.get_hosts())
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/utils.py", line 68, in forall_hosts_wrapper
    return CephadmOrchestrator.instance._worker_pool.map(do_work, vals)
  File "/usr/lib64/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib64/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/usr/lib64/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib64/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/utils.py", line 62, in do_work
    return f(*arg)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 119, in refresh
    r = self._refresh_host_daemons(host)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 246, in _refresh_host_daemons
    self.mgr.cache.save_host(host)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/inventory.py", line 308, in save_host
    for d in self.devices[host]:
KeyError: 'host1'
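
The tracebacks above suggest that save_host indexes per-host dictionaries (self.devices, self.daemons) with a hostname that is no longer present. Below is a minimal sketch of that failure mode and one possible guard. HostCacheSketch and all of its methods are hypothetical stand-ins written for illustration; this is not the cephadm inventory code and not necessarily the change made in the actual fix.

```python
from typing import Dict, List, Optional


class HostCacheSketch:
    """Hypothetical, simplified stand-in for a per-host cache."""

    def __init__(self) -> None:
        self.daemons: Dict[str, Dict[str, str]] = {}
        self.devices: Dict[str, List[str]] = {}

    def add_host(self, host: str) -> None:
        self.daemons[host] = {}
        self.devices[host] = []

    def rm_host(self, host: str) -> None:
        # Drop every per-host key; a refresh that is still running
        # for this host will no longer find them.
        self.daemons.pop(host, None)
        self.devices.pop(host, None)

    def save_host_unguarded(self, host: str) -> dict:
        # Direct indexing raises KeyError if the host was removed meanwhile.
        return {
            'daemons': dict(self.daemons[host]),
            'devices': list(self.devices[host]),
        }

    def save_host_guarded(self, host: str) -> Optional[dict]:
        # Assumption: skipping a missing host is acceptable for a refresh.
        if host not in self.daemons or host not in self.devices:
            return None
        return {
            'daemons': dict(self.daemons[host]),
            'devices': list(self.devices[host]),
        }


if __name__ == '__main__':
    cache = HostCacheSketch()
    cache.add_host('host1')
    cache.rm_host('host1')  # e.g. "orch host rm" arrives during a refresh
    try:
        cache.save_host_unguarded('host1')
    except KeyError as e:
        print('unguarded save failed with KeyError:', e)
    print('guarded save returned:', cache.save_host_guarded('host1'))
```

Returning None for a missing host is only one option; whether a real cache should skip, recreate, or report the missing entry is a design decision this sketch does not settle.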

History

#1 Updated by Michael Fritch over 3 years ago

2020-11-11T20:35:14.906474907+01:00 stderr F debug 2020-11-11T19:35:14.901+0000 7f522a285700 -1 cephadm.serve:
2020-11-11T20:35:14.964212236+01:00 stderr P debug
2020-11-11T20:35:14.964329890+01:00 stderr F 2020-11-11T19:35:14.901+0000 7f522a285700 -1 Traceback (most recent call last):
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/module.py", line 441, in serve
2020-11-11T20:35:14.964329890+01:00 stderr F     serve.serve()
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/serve.py", line 59, in serve
2020-11-11T20:35:14.964329890+01:00 stderr F     self._refresh_hosts_and_daemons()
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/serve.py", line 148, in _refresh_hosts_and_daemons
2020-11-11T20:35:14.964329890+01:00 stderr F     refresh(self.mgr.cache.get_hosts())
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/utils.py", line 68, in forall_hosts_wrapper
2020-11-11T20:35:14.964329890+01:00 stderr F     return CephadmOrchestrator.instance._worker_pool.map(do_work, vals)
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/lib64/python3.6/multiprocessing/pool.py", line 266, in map
2020-11-11T20:35:14.964329890+01:00 stderr F     return self._map_async(func, iterable, mapstar, chunksize).get()
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/lib64/python3.6/multiprocessing/pool.py", line 644, in get
2020-11-11T20:35:14.964329890+01:00 stderr F     raise self._value
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
2020-11-11T20:35:14.964329890+01:00 stderr F     result = (True, func(*args, **kwds))
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
2020-11-11T20:35:14.964329890+01:00 stderr F     return list(map(*args))
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/utils.py", line 62, in do_work
2020-11-11T20:35:14.964329890+01:00 stderr F     return f(*arg)
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/serve.py", line 138, in refresh
2020-11-11T20:35:14.964329890+01:00 stderr F     r = self._refresh_host_osdspec_previews(host)
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/serve.py", line 283, in _refresh_host_osdspec_previews
2020-11-11T20:35:14.964329890+01:00 stderr F     self.mgr.cache.save_host(host)
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/inventory.py", line 306, in save_host
2020-11-11T20:35:14.964329890+01:00 stderr F     for name, dd in self.daemons[host].items():
2020-11-11T20:35:14.964329890+01:00 stderr F KeyError: 'node2'

#2 Updated by Michael Fritch over 3 years ago

This eventually breaks the CLI, because the failed cephadm module can no longer handle commands:

$ ceph cephadm get-ssh-config > ssh_config
Error EIO: Module 'cephadm' has experienced an error and cannot handle commands: 'host1'
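
The tracebacks in this report show the KeyError being re-raised by the worker pool's map() call inside the serve loop, after which the module is reported as failed. The sketch below reproduces that propagation with Python's standard ThreadPool and shows one way to keep a single host's failure from escaping. The names refresh_host, map_isolated and KNOWN_DEVICES are hypothetical and chosen for illustration; this is not how cephadm's forall_hosts wrapper is implemented.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, List, Optional, Tuple

# Hypothetical state: 'host1' has already been removed.
KNOWN_DEVICES = {'host2': ['sda']}


def refresh_host(host: str) -> List[str]:
    # Raises KeyError for a host that is no longer known.
    return KNOWN_DEVICES[host]


def map_isolated(
    pool: ThreadPool,
    func: Callable[[str], List[str]],
    hosts: List[str],
) -> List[Tuple[str, Optional[List[str]], Optional[Exception]]]:
    """Run func per host and return (host, result, error) tuples."""

    def do_work(host: str):
        try:
            return host, func(host), None
        except Exception as e:
            # Capture the failure instead of letting map() re-raise it.
            return host, None, e

    return pool.map(do_work, hosts)


if __name__ == '__main__':
    hosts = ['host1', 'host2']
    with ThreadPool(2) as pool:
        try:
            # Plain map() re-raises the worker's exception in the caller.
            pool.map(refresh_host, hosts)
        except KeyError as e:
            print('map() re-raised KeyError:', e)

        for host, result, error in map_isolated(pool, refresh_host, hosts):
            if error is not None:
                print(host, 'failed:', repr(error))
            else:
                print(host, 'ok:', result)
```

With the isolated variant, the caller decides per host whether to log, retry, or skip, rather than losing the whole refresh pass to one exception.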

#3 Updated by Michael Fritch over 3 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 38064

#4 Updated by Sebastian Wagner about 3 years ago

  • Status changed from Fix Under Review to Resolved
