Bug #51571
closed
cephadm: remove iscsi service fails due to incorrect gateway name
Description
If the iscsi service is removed while the dashboard is deployed (dashboard mgr module enabled), the cluster status goes to HEALTH_ERR and the removal gets stuck in the <deleting> state.
Steps to reproduce:
1. Bootstrap a cluster with the dashboard: cephadm bootstrap --mon-ip x.x.x.x
2. Add some OSDs
3. Deploy the iscsi service
4. Remove iscsi with: ceph orch rm iscsi.iscsi
Results:
# ceph orch ls --service_type iscsi
NAME         PORTS  RUNNING  REFRESHED   AGE  PLACEMENT
iscsi.iscsi             0/1  <deleting>  15h  cephaio

# ceph health detail
HEALTH_ERR Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'cephaio' does not exist retval: -2; 1 mgr modules have recently crashed
[ERR] MGR_MODULE_ERROR: Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'cephaio' does not exist retval: -2
    Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'cephaio' does not exist retval: -2
[WRN] RECENT_MGR_MODULE_CRASH: 1 mgr modules have recently crashed
    mgr module cephadm crashed in daemon mgr.cephaio.cfrgve on host cephaio at 2021-07-06T22:39:45.838001Z

# ceph crash info 2021-07-06T22:39:45.838001Z_32d73635-f3e8-42dc-b1dd-d88b2aa2cd02
{
    "backtrace": [
        " File \"/usr/share/ceph/mgr/cephadm/module.py\", line 501, in serve\n serve.serve()",
        " File \"/usr/share/ceph/mgr/cephadm/serve.py\", line 92, in serve\n self._check_daemons()",
        " File \"/usr/share/ceph/mgr/cephadm/serve.py\", line 871, in _check_daemons\n self._remove_daemon(dd.name(), dd.hostname)",
        " File \"/usr/share/ceph/mgr/cephadm/serve.py\", line 1111, in _remove_daemon\n self.mgr.cephadm_services[daemon_type_to_service(daemon_type)].post_remove(daemon)",
        " File \"/usr/share/ceph/mgr/cephadm/services/iscsi.py\", line 168, in post_remove\n 'name': daemon.hostname,",
        " File \"/usr/share/ceph/mgr/mgr_module.py\", line 1475, in check_mon_command\n raise MonCommandFailed(f'{cmd_dict[\"prefix\"]} failed: {r.stderr} retval: {r.retval}')",
        "mgr_module.MonCommandFailed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'cephaio' does not exist retval: -2"
    ],
    "ceph_version": "17.0.0-5816-g0b361349",
    "crash_id": "2021-07-06T22:39:45.838001Z_32d73635-f3e8-42dc-b1dd-d88b2aa2cd02",
    "entity_name": "mgr.cephaio.cfrgve",
    "mgr_module": "cephadm",
    "mgr_module_caller": "PyModuleRunner::serve",
    "mgr_python_exception": "MonCommandFailed",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "726b6d6107ae948f6a37f87f1ff5ea4d0211aefaa5ac9ee66fda18fe203fd8ac",
    "timestamp": "2021-07-06T22:39:45.838001Z",
    "utsname_hostname": "cephaio",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-305.7.1.el8_4.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Tue Jun 29 21:55:12 UTC 2021"
}
cephadm tries to remove a gateway named `cephaio`, which is the hostname of the node running the iSCSI service. The gateway was not registered under that name, so the removal fails.

# ceph dashboard iscsi-gateway-list
{"gateways": {"ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh": {"service_url": "http://admin:+xFRe+RES@7vg24n@192.168.100.13:5000"}}}

It looks like the gateway name follows the pattern: <cluster name>-<cluster fsid>-<daemon name>
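To make the mismatch concrete, here is a minimal Python sketch that builds a name from the pattern above and checks it against the `iscsi-gateway-list` output quoted in this report. The helper `container_style_gateway_name` is hypothetical and not part of cephadm; the fsid, daemon name, and JSON are copied from the output above.

```python
import json


def container_style_gateway_name(fsid: str, daemon_name: str, cluster: str = "ceph") -> str:
    """Hypothetical helper: build <cluster name>-<cluster fsid>-<daemon name>."""
    return f"{cluster}-{fsid}-{daemon_name}"


# Output of `ceph dashboard iscsi-gateway-list`, as quoted in this report.
GATEWAY_LIST = json.loads(
    '{"gateways": {"ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh": '
    '{"service_url": "http://admin:+xFRe+RES@7vg24n@192.168.100.13:5000"}}}'
)

registered = set(GATEWAY_LIST["gateways"])

# Name that post_remove passes to iscsi-gateway-rm (daemon.hostname).
hostname = "cephaio"

# Name derived from the observed pattern.
candidate = container_style_gateway_name(
    fsid="9c104f06-dea5-11eb-b311-fa163e8cb5c1",
    daemon_name="iscsi.iscsi.cephaio.muhijh",
)

print(hostname in registered)   # False: the name used for removal is not registered
print(candidate in registered)  # True: the pattern-based name is the registered one
```

This only illustrates why the lookup by hostname returns "does not exist"; it is not a proposed fix.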
Updated by Dimitri Savineau almost 3 years ago
<cluster name>-<cluster fsid>-<daemon name>
That's in fact the container name
# podman exec ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh python3 -c 'import socket; print(socket.getfqdn())'
ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh

# podman exec ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh python3 -c 'import socket; print(socket.gethostname())'
cephaio

# podman exec ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.13 cephaio.novalocal cephaio
127.0.1.1 cephaio cephaio ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh
Updated by Sebastian Wagner almost 3 years ago
- Is duplicate of Bug #51590: cephadm: iscsi: The first gateway defined must be the local machine added
Updated by Sebastian Wagner almost 3 years ago
This seems to be a duplicate of the other issue. Python's socket.getfqdn() unfortunately picks up the container name as the host's FQDN, if it contains dots.
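The selection behaviour can be sketched in a few lines of Python. As far as I know, CPython's `socket.getfqdn()` takes the primary hostname and aliases returned by `gethostbyaddr()` and returns the first one that contains a dot, falling back to the hostname. The function `pick_fqdn` below is a hypothetical stand-in that reproduces only that selection step, with no name resolution. The inputs are an assumption: they presume the resolver answered from the `127.0.1.1` line of the `/etc/hosts` shown earlier in this thread.

```python
from typing import List


def pick_fqdn(hostname: str, aliases: List[str]) -> str:
    """Hypothetical stand-in for the name selection in socket.getfqdn().

    Returns the first name containing a dot, checking the primary
    hostname first and then the aliases in order; otherwise returns
    the primary hostname unchanged.
    """
    for name in [hostname] + list(aliases):
        if "." in name:
            return name
    return hostname


# Assumed resolver result for the line:
# 127.0.1.1 cephaio cephaio ceph-9c104f06-...-iscsi.iscsi.cephaio.muhijh
container_name = "ceph-9c104f06-dea5-11eb-b311-fa163e8cb5c1-iscsi.iscsi.cephaio.muhijh"
result = pick_fqdn("cephaio", ["cephaio", container_name])

# The container name is the only entry with dots, so it is selected.
print(result)
```

Under that assumption the dotted container name wins over the short hostname `cephaio`, which matches the `socket.getfqdn()` output observed inside the container.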