Bug #49654: iSCSI stops working after Upgrade 15.2.4 -> 15.2.9 - Orchestrator - Ceph

Bug #49654

closed

iSCSI stops working after Upgrade 15.2.4 -> 15.2.9

Added by Frank Holtz about 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: High
Backport: pacific
Severity: 1 - critical
Pull request ID: 41483

Description

We have four iSCSI gateways deployed with "ceph orch". The gateways are grouped in pairs; each pair forms an IQN and shares one /23 network.

During the upgrade process, I restarted the gateways s103 and s104 by changing the included hosts in "ceph orch deploy", and then redeployed both gateways. At first, I redeployed only one iSCSI gateway. This gateway stopped providing any LUNs. Restarting the second gateway had the same result.

If I stop the iscsi-service, gwcli reports the stopped node is down.

Mapping a new LUN with gwcli shows the following message:

Failed - ceph-UUID-iscsi.iscsi.s103.ljhzwr cannot be used to perform this operation because it is not defined within the gateways configuration

The rbd-target-api.log shows this relevant message:

2021-03-08 14:36:51,744     INFO [gateway.py:236:define()] - Configuration does not have an entry for this host(ceph-UUID-iscsi.iscsi.s103.ljhzwr) - nothing to define to LIO
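
The two messages above suggest that the gateway looks up its own name in the stored configuration and finds no match. A minimal sketch of that kind of check follows. It is not the ceph-iscsi implementation: the dictionary layout, the `host_is_defined` helper, and the `example.com` host names are assumptions made up for illustration.

```python
# Hypothetical, heavily simplified stand-in for the gateway.conf object.
# The real object has more fields; the host names used as keys are made up.
SAMPLE_CONFIG = {
    "gateways": {
        "s103.example.com": {},
        "s104.example.com": {},
    }
}


def host_is_defined(config, local_name):
    """Return True if local_name has an entry under the 'gateways' key."""
    return local_name in config.get("gateways", {})


# Name the gateway process reports for itself after the redeploy.
container_name = "ceph-UUID-iscsi.iscsi.s103.ljhzwr"

print(host_is_defined(SAMPLE_CONFIG, "s103.example.com"))
print(host_is_defined(SAMPLE_CONFIG, container_name))
```

Under these assumptions, any change in the name the process reports for itself makes the lookup fail even though the stored configuration is unchanged.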

Files

gateway.conf (19.2 KB): gateway.conf object (Frank Holtz, 03/08/2021 02:22 PM)
ceph-s.txt (1.16 KB): ceph status (Frank Holtz, 03/08/2021 02:33 PM)
rbd-target-api.log (5.16 KB): rbd-target-api.log (Frank Holtz, 03/08/2021 02:41 PM)
gwcli-ls.txt (12.5 KB): gwcli ls (Frank Holtz, 03/08/2021 02:53 PM)
iscsi-gateway.cfg (400 Bytes): iscsi-gateway.cfg s103 (Frank Holtz, 03/08/2021 03:03 PM)

Related issues 1 (0 open — 1 closed)

Updated by Frank Holtz about 3 years ago

Inside the iSCSI container, the wrong hostname is reported (CentOS 8 + podman):

# python3
Python 3.6.8 (default, Aug 24 2020, 17:57:11)
[GCC 8.3.1 20191121 (Red Hat 8.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.getfqdn()
'ceph-UUID-iscsi.iscsi.s101.kkbxij'

The reason is in /etc/hosts inside the container:
cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
#...other entries including own ip (not iscsi-ip)....
127.0.1.1 s101 s101 ceph-6ba5bfc8-009c-11eb-9f56-0025b5ff7900-iscsi.iscsi.s101.kkbxij

To fix this, the FQDN has been set as the hostname. After rebooting the node, everything is fine.
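
The effect of the 127.0.1.1 line can be reproduced without a container. CPython's socket.getfqdn() takes the canonical name and aliases returned by gethostbyaddr() and returns the first one that contains a dot. The sketch below mimics only that selection step over /etc/hosts-style text. The `parse_hosts` and `pick_fqdn` helpers are hypothetical, and the assumption that all names on a hosts line come back in order is a simplification of the real resolver.

```python
HOSTS_TEXT = """\
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.1.1 s101 s101 ceph-UUID-iscsi.iscsi.s101.kkbxij
"""


def parse_hosts(text):
    """Return a list of (address, [names, ...]) from /etc/hosts-style text."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1:]))
    return entries


def pick_fqdn(hostname, entries):
    """Mimic the name selection of socket.getfqdn() (simplified).

    Find the first hosts line that mentions hostname, then return the
    first name on that line containing a dot; fall back to hostname.
    """
    for _address, names in entries:
        if hostname in names:
            for name in names:
                if "." in name:
                    return name
            return hostname
    return hostname


print(pick_fqdn("s101", parse_hosts(HOSTS_TEXT)))
```

Because "s101" has no dot, the only dotted name on that line is the container name, which matches the value socket.getfqdn() reported above.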

Updated by Sebastian Wagner about 3 years ago

Project changed from Ceph to Orchestrator
Category deleted (~~common~~)

Updated by Sebastian Wagner about 3 years ago

Priority changed from Normal to High

Updated by Sebastian Wagner almost 3 years ago

Related to Bug #50306: /etc/hosts is not passed to ceph containers. clusters that were relying on /etc/hosts for name resolution will have strange behavior added
