Bug #58537

closed

cephadm: impossible to remove offline host if ingress daemons were present when it went offline

Added by Adam King over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The ingress daemons, keepalived and haproxy, are special in that their daemon type does not match their service type (ingress). We therefore need to convert the daemon type to the service type whenever we perform service-type actions on arbitrary daemons. One example is the pre-remove and post-remove steps run for every daemon when removing an offline host. We weren't doing this conversion correctly there, so attempting to remove a host that had ingress daemons on it when it went offline would fail:

[ceph: root@vm-00 /]# ceph orch ps
NAME                             HOST   PORTS        STATUS           REFRESHED  AGE  MEM USE  MEM LIM  VERSION                IMAGE ID      CONTAINER ID  
crash.vm-00                      vm-00               running (12m)       2m ago  12m    7444k        -  18.0.0-1630-g461ac2c3  6e1c5e56d77b  beba4e104f19  
crash.vm-01                      vm-01               running (11m)      40s ago  11m    7528k        -  18.0.0-1630-g461ac2c3  6e1c5e56d77b  22f4b0cca867  
crash.vm-02                      vm-02               host is offline     4m ago  10m    7549k        -  18.0.0-1630-g461ac2c3  6e1c5e56d77b  c0ab5c061778  
haproxy.nfs.foo.vm-02.evtbrf     vm-02  *:2049,9049  host is offline     4m ago   5m    3766k        -  2.3.17-d1c9119         e85424b0d443  b98a4650b91d  
keepalived.nfs.foo.vm-02.dhckjk  vm-02               host is offline     4m ago   5m    2391k        -  2.1.5                  9f7bdb4a87fd  b5ff6771a481  
mgr.vm-00.anuyrl                 vm-00  *:9283,8765  running (14m)       2m ago  14m     483M        -  18.0.0-1630-g461ac2c3  6e1c5e56d77b  26a847d56922  
mgr.vm-01.ihvpug                 vm-01  *:8443,8765  running (11m)      40s ago  11m     430M        -  18.0.0-1630-g461ac2c3  6e1c5e56d77b  c88141f00488  
mon.vm-00                        vm-00               running (14m)       2m ago  14m    56.5M    2048M  18.0.0-1630-g461ac2c3  6e1c5e56d77b  ced3dbaffd06  
mon.vm-01                        vm-01               running (11m)      40s ago  11m    45.2M    2048M  18.0.0-1630-g461ac2c3  6e1c5e56d77b  7cba2506fe01  
mon.vm-02                        vm-02               host is offline     4m ago  10m    32.9M    2048M  18.0.0-1630-g461ac2c3  6e1c5e56d77b  4c8f34376c7b  
nfs.foo.0.0.vm-00.ogmzed         vm-00  *:12049      running (5m)        2m ago   5m    72.5M        -  4.2.2                  6e1c5e56d77b  e55a9bf84ad1  
nfs.foo.1.0.vm-02.fyoxoa         vm-02  *:12049      host is offline     4m ago   5m    72.4M        -  4.2.2                  6e1c5e56d77b  c9d7644ec202  
osd.0                            vm-00               running (11m)       2m ago  11m    84.9M    11.1G  18.0.0-1630-g461ac2c3  6e1c5e56d77b  39c71c24be9d  
osd.1                            vm-01               running (11m)      40s ago  11m    78.8M    11.1G  18.0.0-1630-g461ac2c3  6e1c5e56d77b  08d4ee4c503a  
osd.2                            vm-00               running (10m)       2m ago  10m    75.0M    11.1G  18.0.0-1630-g461ac2c3  6e1c5e56d77b  d735d590b7c7  
osd.3                            vm-01               running (10m)      40s ago  10m    81.2M    11.1G  18.0.0-1630-g461ac2c3  6e1c5e56d77b  0dad63d66ab5  
osd.4                            vm-02               host is offline     4m ago   9m    72.7M    13.1G  18.0.0-1630-g461ac2c3  6e1c5e56d77b  024dce173ec2  
osd.5                            vm-02               host is offline     4m ago   9m    75.5M    13.1G  18.0.0-1630-g461ac2c3  6e1c5e56d77b  cb7e945e32c5  
[ceph: root@vm-00 /]# ceph orch host rm vm-02
Error EINVAL: vm-02 is offline, please use --offline and --force to remove this host. This can potentially cause data loss
[ceph: root@vm-00 /]# ceph orch host rm vm-02 --offline --force
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1761, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 459, in _remove_host
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 228, in raise_if_exception
    raise e
KeyError: 'haproxy'
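
The failure mode above can be illustrated with a minimal Python sketch. This is not the cephadm code: `DAEMON_TO_SERVICE`, `SERVICE_HANDLERS`, `to_service_type` and `handler_for` are hypothetical names made up for illustration, and the handler values are placeholder strings. It only shows why indexing a table keyed by service type with a raw daemon type raises `KeyError: 'haproxy'`, and how converting first avoids it.

```python
from typing import Dict

# Hypothetical mapping for daemon types whose service type differs.
# Assumption: every other daemon type maps to itself.
DAEMON_TO_SERVICE: Dict[str, str] = {
    "haproxy": "ingress",
    "keepalived": "ingress",
}

# Hypothetical registry keyed by SERVICE type (placeholder values).
SERVICE_HANDLERS: Dict[str, str] = {
    "ingress": "ingress-handler",
    "mon": "mon-handler",
    "osd": "osd-handler",
}


def to_service_type(daemon_type: str) -> str:
    """Return the service type for a daemon type (identity by default)."""
    return DAEMON_TO_SERVICE.get(daemon_type, daemon_type)


def handler_for(daemon_type: str) -> str:
    """Look up the handler by service type, converting the daemon type first."""
    return SERVICE_HANDLERS[to_service_type(daemon_type)]


if __name__ == "__main__":
    # Buggy pattern: indexing the service registry with a daemon type.
    try:
        SERVICE_HANDLERS["haproxy"]
    except KeyError as e:
        print("lookup without conversion failed:", repr(e))

    # Fixed pattern: convert the daemon type before the lookup.
    print(handler_for("haproxy"))
    print(handler_for("keepalived"))
```

In this sketch the conversion lives inside the lookup helper, so callers that iterate over arbitrary daemons on a host cannot skip it by accident.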

Related issues 2 (0 open, 2 closed)

Copied to Orchestrator - Backport #58556: pacific: cephadm: impossible to remove offline host if ingress daemons were present when it went offline (Resolved, Adam King)
Copied to Orchestrator - Backport #58557: quincy: cephadm: impossible to remove offline host if ingress daemons were present when it went offline (Resolved, Adam King)

#1

Updated by Adam King over 1 year ago

  • Pull request ID set to 49827

#2

Updated by Adam King over 1 year ago

  • Status changed from In Progress to Pending Backport

#3

Updated by Backport Bot over 1 year ago

  • Copied to Backport #58556: pacific: cephadm: impossible to remove offline host if ingress daemons were present when it went offline added

#4

Updated by Backport Bot over 1 year ago

  • Copied to Backport #58557: quincy: cephadm: impossible to remove offline host if ingress daemons were present when it went offline added

#5

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed

#6

Updated by Adam King about 1 year ago

  • Status changed from Pending Backport to Resolved