Bug #51571

closed

cephadm: remove iscsi service fails due to incorrect gateway name

Added by Dimitri Savineau almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If the iscsi service is removed while the dashboard is deployed (dashboard mgr module enabled), the cluster status goes to HEALTH_ERR and the removal gets stuck in the <deleting> state.

Steps to reproduce:

1. Bootstrap a cluster with the dashboard: cephadm bootstrap --mon-ip x.x.x.x
2. Add some OSDs.
3. Deploy the iscsi service.
4. Remove iscsi with: ceph orch rm iscsi.iscsi

Results:

# ceph orch ls --service_type iscsi
NAME         PORTS  RUNNING  REFRESHED   AGE  PLACEMENT  
iscsi.iscsi             0/1  <deleting>  15h  cephaio
# ceph health detail
HEALTH_ERR Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'cephaio' does not exist retval: -2; 1 mgr modules have recently crashed
[ERR] MGR_MODULE_ERROR: Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'cephaio' does not exist retval: -2
    Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'cephaio' does not exist retval: -2
[WRN] RECENT_MGR_MODULE_CRASH: 1 mgr modules have recently crashed
    mgr module cephadm crashed in daemon mgr.cephaio.cfrgve on host cephaio at 2021-07-06T22:39:45.838001Z
# ceph crash info 2021-07-06T22:39:45.838001Z_32d73635-f3e8-42dc-b1dd-d88b2aa2cd02
{
    "backtrace": [
        "  File \"/usr/share/ceph/mgr/cephadm/module.py\", line 501, in serve\n    serve.serve()",
        "  File \"/usr/share/ceph/mgr/cephadm/serve.py\", line 92, in serve\n    self._check_daemons()",
        "  File \"/usr/share/ceph/mgr/cephadm/serve.py\", line 871, in _check_daemons\n    self._remove_daemon(dd.name(), dd.hostname)",
        "  File \"/usr/share/ceph/mgr/cephadm/serve.py\", line 1111, in _remove_daemon\n    self.mgr.cephadm_services[daemon_type_to_service(daemon_type)].post_remove(daemon)",
        "  File \"/usr/share/ceph/mgr/cephadm/services/iscsi.py\", line 168, in post_remove\n    'name': daemon.hostname,",
        "  File \"/usr/share/ceph/mgr/mgr_module.py\", line 1475, in check_mon_command\n    raise MonCommandFailed(f'{cmd_dict[\"prefix\"]} failed: {r.stderr} retval: {r.retval}')",
        "mgr_module.MonCommandFailed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'cephaio' does not exist retval: -2" 
    ],
    "ceph_version": "17.0.0-5816-g0b361349",
    "crash_id": "2021-07-06T22:39:45.838001Z_32d73635-f3e8-42dc-b1dd-d88b2aa2cd02",
    "entity_name": "mgr.cephaio.cfrgve",
    "mgr_module": "cephadm",
    "mgr_module_caller": "PyModuleRunner::serve",
    "mgr_python_exception": "MonCommandFailed",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "726b6d6107ae948f6a37f87f1ff5ea4d0211aefaa5ac9ee66fda18fe203fd8ac",
    "timestamp": "2021-07-06T22:39:45.838001Z",
    "utsname_hostname": "cephaio",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-305.7.1.el8_4.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Tue Jun 29 21:55:12 UTC 2021" 
}

The removal targets a gateway named `cephaio`, which is the hostname of the node running the iSCSI service. The gateway was not registered under that name, so the removal fails.
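
The failure path in the backtrace can be modeled with a small stand-in. This is only a sketch, not the mgr code: `FakeDashboard`, the `check_mon_command` function as written here, and the placeholder URL are hypothetical. Only the error message format is taken from the backtrace above.

```python
import errno


class MonCommandFailed(RuntimeError):
    """Stand-in for mgr_module.MonCommandFailed (hypothetical re-creation)."""


class FakeDashboard:
    """Toy registry of iSCSI gateways keyed by name (an assumption, not the real dashboard)."""

    def __init__(self, gateways):
        self.gateways = dict(gateways)

    def iscsi_gateway_rm(self, name):
        # Return (retval, stderr), roughly like a mon command result.
        if name not in self.gateways:
            return -errno.ENOENT, f"iSCSI gateway '{name}' does not exist"
        del self.gateways[name]
        return 0, ""


def check_mon_command(dashboard, prefix, name):
    # Raise on a non-zero retval, using the message format seen in the backtrace.
    retval, stderr = dashboard.iscsi_gateway_rm(name)
    if retval != 0:
        raise MonCommandFailed(f"{prefix} failed: {stderr} retval: {retval}")


registered = "ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh"
dashboard = FakeDashboard({registered: "http://gateway.example:5000"})

try:
    # Removal by hostname fails: the gateway is registered under another name.
    check_mon_command(dashboard, "dashboard iscsi-gateway-rm", "cephaio")
except MonCommandFailed as exc:
    error_message = str(exc)
    print(error_message)

# Removal by the registered name succeeds and empties the registry.
check_mon_command(dashboard, "dashboard iscsi-gateway-rm", registered)
```

Because the exception is not caught in the removal path, it propagates up to the module's serve loop, which matches the MGR_MODULE_ERROR reported by `ceph health detail`.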

# ceph dashboard iscsi-gateway-list
{"gateways": {"ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh": {"service_url": "http://admin:+xFRe+RES@7vg24n@192.168.100.13:5000"}}}

The gateway name appears to follow the pattern <cluster name>-<cluster fsid>-<daemon name>.
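
To illustrate the mismatch, the observed pattern can be compared against the gateway listing. This is a minimal sketch: `expected_gateway_name` and `registered_gateways` are hypothetical helpers, the default cluster name `ceph` is an assumption, and the service URL is a placeholder.

```python
import json


def expected_gateway_name(fsid, daemon_name, cluster="ceph"):
    # Pattern observed in this report: <cluster name>-<cluster fsid>-<daemon name>.
    return f"{cluster}-{fsid}-{daemon_name}"


def registered_gateways(listing_json):
    # Parse output shaped like `ceph dashboard iscsi-gateway-list`.
    return set(json.loads(listing_json)["gateways"])


# Sample listing with the same shape as the output above; the URL is a placeholder.
listing = json.dumps({
    "gateways": {
        "ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh": {
            "service_url": "http://gateway.example:5000"
        }
    }
})

name = expected_gateway_name(
    "9c104f06-dea5-11eb-b311-fa163e8cb5c1", "iscsi.iscsi.cephaio.muhijh"
)
names = registered_gateways(listing)
print(name in names)       # is the derived name registered?
print("cephaio" in names)  # is the bare hostname registered?
```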


Related issues 1 (0 open, 1 closed)

Is duplicate of Orchestrator - Bug #51590: cephadm: iscsi: The first gateway defined must be the local machine (Resolved, Sebastian Wagner)

Actions #1

Updated by Dimitri Savineau almost 3 years ago

<cluster name>-<cluster fsid>-<daemon name>

That is in fact the container name:

# podman exec ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh python3 -c 'import socket; print(socket.getfqdn())'
ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh
# podman exec ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh python3 -c 'import socket; print(socket.gethostname())'
cephaio
# podman exec ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.13    cephaio.novalocal cephaio
127.0.1.1 cephaio cephaio ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh
Actions #2

Updated by Sebastian Wagner almost 3 years ago

  • Is duplicate of Bug #51590: cephadm: iscsi: The first gateway defined must be the local machine added
Actions #3

Updated by Sebastian Wagner almost 3 years ago

This seems to be a duplicate of Bug #51590. Python's socket.getfqdn() unfortunately picks up the container name as the host's FQDN if that name contains dots.
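
The selection behaviour described here can be sketched as follows. `pick_fqdn` is a hypothetical helper that models the rule CPython's socket.getfqdn() applies to the resolver result (first name containing a dot, otherwise the canonical name). It assumes the resolver returns the names from the `127.0.1.1` line of the container's /etc/hosts shown above, which may differ on other systems.

```python
def pick_fqdn(hostname, aliases):
    # Scan the canonical name, then the aliases, and return the first
    # candidate that contains a dot.
    for candidate in [hostname, *aliases]:
        if "." in candidate:
            return candidate
    # No dotted name found: fall back to the canonical name.
    return hostname


container = "ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh"

# Names taken from the `127.0.1.1 cephaio cephaio <container name>` line.
with_container_alias = pick_fqdn("cephaio", ["cephaio", container])
without_container_alias = pick_fqdn("cephaio", ["cephaio"])

print(with_container_alias)     # the dotted container name is chosen
print(without_container_alias)  # no dotted name, so the hostname is kept
```

The container name qualifies only because the daemon name inside it (`iscsi.iscsi.cephaio.muhijh`) contains dots, so it is the first dotted candidate.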

Actions #4

Updated by Ken Dreyer over 2 years ago

  • Status changed from New to Resolved