Bug #48223
host key might not exist during host refresh
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Description
SSH communication failures during a cephadm host add/remove can break the serve loop:
2020-11-12T17:26:38.804-0700 7f0bafdd4640 0 log_channel(audit) log [DBG] : from='client.4943 -' entity='client.admin' cmd=[{"prefix": "orch host rm", "hostname": "host1", "target": ["mon-mgr", ""]}]: dispatch
2020-11-12T17:26:39.788-0700 7f0bad58f640 0 [cephadm ERROR cephadm.utils] executing refresh((['host1'],)) failed.
Traceback (most recent call last):
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/utils.py", line 62, in do_work
    return f(*arg)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 119, in refresh
    r = self._refresh_host_daemons(host)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 246, in _refresh_host_daemons
    self.mgr.cache.save_host(host)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/inventory.py", line 308, in save_host
    for d in self.devices[host]:
KeyError: 'host1'
2020-11-12T17:26:39.788-0700 7f0bad58f640 -1 log_channel(cephadm) log [ERR] : executing refresh((['host1'],)) failed.
Traceback (most recent call last):
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/utils.py", line 62, in do_work
    return f(*arg)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 119, in refresh
    r = self._refresh_host_daemons(host)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 246, in _refresh_host_daemons
    self.mgr.cache.save_host(host)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/inventory.py", line 308, in save_host
    for d in self.devices[host]:
KeyError: 'host1'
2020-11-12T17:26:39.788-0700 7f0ba6a42640 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'cephadm' while running on mgr.x: 'host1'
2020-11-12T17:26:39.788-0700 7f0ba6a42640 -1 cephadm.serve:
2020-11-12T17:26:39.788-0700 7f0ba6a42640 -1 Traceback (most recent call last):
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/module.py", line 441, in serve
    serve.serve()
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 59, in serve
    self._refresh_hosts_and_daemons()
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 148, in _refresh_hosts_and_daemons
    refresh(self.mgr.cache.get_hosts())
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/utils.py", line 68, in forall_hosts_wrapper
    return CephadmOrchestrator.instance._worker_pool.map(do_work, vals)
  File "/usr/lib64/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib64/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/usr/lib64/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib64/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/utils.py", line 62, in do_work
    return f(*arg)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 119, in refresh
    r = self._refresh_host_daemons(host)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/serve.py", line 246, in _refresh_host_daemons
    self.mgr.cache.save_host(host)
  File "/root/src/github.com/mgfritch/ceph/src/pybind/mgr/cephadm/inventory.py", line 308, in save_host
    for d in self.devices[host]:
KeyError: 'host1'
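To illustrate the failure mode in the traceback above: save_host indexes per-host dictionaries with a host name that "orch host rm" has already dropped. The following is a minimal sketch of one way to tolerate that, not the actual cephadm code or the fix that was merged. HostCacheSketch, its store attribute, and the boolean return value are hypothetical names chosen for illustration; only the daemons and devices dictionaries and the save_host name come from the traceback.

```python
import json
from typing import Dict, List


class HostCacheSketch:
    """Hypothetical, simplified stand-in for the cephadm host cache."""

    def __init__(self) -> None:
        self.daemons: Dict[str, Dict[str, dict]] = {}
        self.devices: Dict[str, List[dict]] = {}
        self.store: Dict[str, str] = {}  # stand-in for persistent storage

    def add_host(self, host: str) -> None:
        self.daemons[host] = {}
        self.devices[host] = []

    def rm_host(self, host: str) -> None:
        # Removal may happen while a refresh for the same host is in flight.
        self.daemons.pop(host, None)
        self.devices.pop(host, None)
        self.store.pop(host, None)

    def save_host(self, host: str) -> bool:
        """Persist cached state; return False if the host is already gone."""
        # .get() takes one snapshot per dict, so a concurrent removal cannot
        # raise KeyError between a membership check and the lookup.
        daemons = self.daemons.get(host)
        devices = self.devices.get(host)
        if daemons is None or devices is None:
            return False
        self.store[host] = json.dumps({
            'daemons': dict(daemons),
            'devices': list(devices),
        })
        return True
```

With this shape, a refresh that finishes after the host was removed becomes a no-op instead of raising KeyError: 'host1'.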
History
#1 Updated by Michael Fritch over 3 years ago
2020-11-11T20:35:14.906474907+01:00 stderr F debug 2020-11-11T19:35:14.901+0000 7f522a285700 -1 cephadm.serve:
2020-11-11T20:35:14.964212236+01:00 stderr P debug
2020-11-11T20:35:14.964329890+01:00 stderr F 2020-11-11T19:35:14.901+0000 7f522a285700 -1 Traceback (most recent call last):
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/module.py", line 441, in serve
2020-11-11T20:35:14.964329890+01:00 stderr F     serve.serve()
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/serve.py", line 59, in serve
2020-11-11T20:35:14.964329890+01:00 stderr F     self._refresh_hosts_and_daemons()
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/serve.py", line 148, in _refresh_hosts_and_daemons
2020-11-11T20:35:14.964329890+01:00 stderr F     refresh(self.mgr.cache.get_hosts())
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/utils.py", line 68, in forall_hosts_wrapper
2020-11-11T20:35:14.964329890+01:00 stderr F     return CephadmOrchestrator.instance._worker_pool.map(do_work, vals)
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/lib64/python3.6/multiprocessing/pool.py", line 266, in map
2020-11-11T20:35:14.964329890+01:00 stderr F     return self._map_async(func, iterable, mapstar, chunksize).get()
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/lib64/python3.6/multiprocessing/pool.py", line 644, in get
2020-11-11T20:35:14.964329890+01:00 stderr F     raise self._value
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
2020-11-11T20:35:14.964329890+01:00 stderr F     result = (True, func(*args, **kwds))
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
2020-11-11T20:35:14.964329890+01:00 stderr F     return list(map(*args))
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/utils.py", line 62, in do_work
2020-11-11T20:35:14.964329890+01:00 stderr F     return f(*arg)
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/serve.py", line 138, in refresh
2020-11-11T20:35:14.964329890+01:00 stderr F     r = self._refresh_host_osdspec_previews(host)
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/serve.py", line 283, in _refresh_host_osdspec_previews
2020-11-11T20:35:14.964329890+01:00 stderr F     self.mgr.cache.save_host(host)
2020-11-11T20:35:14.964329890+01:00 stderr F   File "/usr/share/ceph/mgr/cephadm/inventory.py", line 306, in save_host
2020-11-11T20:35:14.964329890+01:00 stderr F     for name, dd in self.daemons[host].items():
2020-11-11T20:35:14.964329890+01:00 stderr F KeyError: 'node2'
#2 Updated by Michael Fritch over 3 years ago
This eventually leaves the CLI unable to handle any cephadm command:
$ ceph cephadm get-ssh-config > ssh_config
Error EIO: Module 'cephadm' has experienced an error and cannot handle commands: 'host1'
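
The bare 'host1' in that error is what str(KeyError('host1')) produces. The tracebacks show why one host's failure takes down the whole module: Pool.map re-raises a worker's exception in the caller, so it escapes the serve loop. The sketch below reproduces that behavior with a plain ThreadPool and shows one possible way to contain it per host. This is an assumption-laden illustration, not the fix from the pull request; KNOWN_HOSTS, refresh, guarded, and refresh_all are hypothetical names.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, List, Optional, Tuple

# (host, result, error) for each host that was processed
Result = Tuple[str, Optional[str], Optional[str]]

# Hypothetical cache state: 'host1' was removed while a refresh was queued.
KNOWN_HOSTS = {'host2': 'ok'}


def refresh(host: str) -> str:
    """Hypothetical per-host refresh; raises KeyError for a removed host."""
    return KNOWN_HOSTS[host]


def guarded(f: Callable[[str], str]) -> Callable[[str], Result]:
    """Wrap f so a failure on one host is reported instead of raised."""
    def do_work(host: str) -> Result:
        try:
            return host, f(host), None
        except Exception as e:
            return host, None, repr(e)
    return do_work


def refresh_all(hosts: List[str]) -> List[Result]:
    """Refresh all hosts; one failing host does not abort the others."""
    with ThreadPool(2) as pool:
        return pool.map(guarded(refresh), hosts)
```

Calling pool.map(refresh, ['host1', 'host2']) without the wrapper raises KeyError in the caller, which matches the "raise self._value" frame in the tracebacks. With refresh_all, the error for host1 comes back as data and host2 is still refreshed.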
#3 Updated by Michael Fritch over 3 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 38064
#4 Updated by Sebastian Wagner about 3 years ago
- Status changed from Fix Under Review to Resolved