Bug #58066

config key for cephadm hosts doesn't have values for all network interfaces present in the host.

Added by Prajwal Kabbinale over 1 year ago. Updated about 1 year ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
v16.2.10
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

With cephadm, a config-key entry is stored for each host that is part of the cluster.

For example:
ceph config-key ls | grep mgr/cephadm/host
"mgr/cephadm/host.test-01",
"mgr/cephadm/host.test-02",
"mgr/cephadm/host.test-03",
"mgr/cephadm/host.test-04",
"mgr/cephadm/host.test-05",

These keys store the details of all daemons running on that host, along with networks_and_interfaces. For some reason, the stored value does not include subnet details for interfaces other than the docker bridge and the mon subnet.

ceph config-key get mgr/cephadm/host.test-01 | jq .networks_and_interfaces
{
"X.X.X.X/X": {
"eno5": [
"X.X.X.X"
]
},
"X.X.X.X/X": {
"docker0": [
"X.X.X.X"
]
},
"****::/**": {
"eno6": [
"XXXX::XXXX:XXXX:XXXX:XXXX"
],
"eno5": [
"XXXX::XXXX:XXXX:XXXX:XXXX"
],
"eno6.vlanX": [
"XXXX::XXXX:XXXX:XXXX:XXXX"
],
"eno5.vlanY": [
"XXXX::XXXX:XXXX:XXXX:XXXX"
]
}
}

We have different IPs configured on eno5, eno5.vlanY, and eno6.vlanX, but these addresses are not discovered and stored in the mgr/cephadm/host.test-XX key.

Because of this, when we try to deploy RGWs with a spec file that defines a different network (one that could be part of eno5.vlanY or eno6.vlanX), cephadm fails to find an IP address to bind to.
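
To make the gap concrete, here is a rough Python sketch that compares the addresses the host actually has with what the key holds. It assumes the host-side input is the parsed JSON from `ip -j addr show` (a list of objects with `ifname` and `addr_info` entries carrying `local` and `prefixlen`) and that the stored value has the subnet -> interface -> IPs shape shown above. The function name `missing_subnets` and the sample addresses (documentation ranges) are hypothetical and not part of cephadm.

```python
import ipaddress
from typing import Dict, List, Tuple


def missing_subnets(ip_addr_json: List[dict],
                    stored: Dict[str, Dict[str, List[str]]]) -> List[Tuple[str, str]]:
    """Return (interface, subnet) pairs present on the host but absent from the stored value."""
    # Collect the (interface, subnet) pairs already recorded in networks_and_interfaces.
    known = set()
    for subnet, ifaces in stored.items():
        net = ipaddress.ip_network(subnet, strict=False)
        for ifname in ifaces:
            known.add((ifname, net))

    # Derive the subnet of every configured address and keep the ones not recorded.
    missing = set()
    for link in ip_addr_json:
        for info in link.get("addr_info", []):
            net = ipaddress.ip_interface(f"{info['local']}/{info['prefixlen']}").network
            if (link["ifname"], net) not in known:
                missing.add((link["ifname"], str(net)))
    return sorted(missing)


# Hypothetical sample data: a VLAN interface whose subnet is not in the stored key.
ip_addr = [
    {"ifname": "eno5",
     "addr_info": [{"family": "inet", "local": "192.0.2.10", "prefixlen": 24}]},
    {"ifname": "eno5.100",
     "addr_info": [{"family": "inet", "local": "198.51.100.10", "prefixlen": 24}]},
]
stored = {"192.0.2.0/24": {"eno5": ["192.0.2.10"]}}

result = missing_subnets(ip_addr, stored)
print(result)
```

On a real host, the two inputs could be loaded with `json.loads` from the output of `ip -j addr show` and from the `.networks_and_interfaces` field of the config-key value.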

Reference cephadm log -
cephadm [DBG] Skipping test-01 with no IP in network(s) ['X.X.X.X/X']

This is because the cephadm scheduler reads the subnet values (iface and iface_ips) from this key in the following code: https://github.com/ceph/ceph/blob/pacific/src/pybind/mgr/cephadm/schedule.py#L362
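
The effect can be illustrated with a simplified sketch. This is not the cephadm code; it only assumes that the scheduler looks up each requested subnet as a key in the stored networks_and_interfaces value and skips the host when no IP is found. The names `find_ip` and `filter_hosts` and the sample addresses are hypothetical.

```python
from typing import Dict, List, Optional

# Shape of the networks_and_interfaces value: subnet -> interface -> list of IPs.
Networks = Dict[str, Dict[str, List[str]]]


def find_ip(networks: Networks, subnets: List[str]) -> Optional[str]:
    """Return the lowest-sorting stored IP from the first matching subnet, or None."""
    for subnet in subnets:
        ips: List[str] = []
        for iface_ips in networks.get(subnet, {}).values():
            ips.extend(iface_ips)
        if ips:
            return sorted(ips)[0]
    return None


def filter_hosts(hosts: Dict[str, Networks], subnets: List[str]) -> List[str]:
    """Keep only hosts that have a stored IP in one of the requested subnets."""
    kept = []
    for host in sorted(hosts):
        if find_ip(hosts[host], subnets) is None:
            # Mirrors the debug message quoted in the log above.
            print(f"Skipping {host} with no IP in network(s) {subnets}")
            continue
        kept.append(host)
    return kept


# Hypothetical data: test-01 has the VLAN address on the host, but the key lacks that subnet.
hosts = {
    "test-01": {"192.0.2.0/24": {"eno5": ["192.0.2.11"]}},
    "test-02": {"192.0.2.0/24": {"eno5": ["192.0.2.12"]},
                "198.51.100.0/24": {"eno5.100": ["198.51.100.12"]}},
}
selected = filter_hosts(hosts, ["198.51.100.0/24"])
print(selected)
```

Under these assumptions, a host whose VLAN subnet never reached the key is dropped from placement even though the address is configured on the host.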

History

#1 Updated by Prajwal Kabbinale over 1 year ago

Affected version: v16.2.10

#2 Updated by Prajwal Kabbinale over 1 year ago

The same problem is observed in v17.2.5 as well.

#4 Updated by Casey Bodley over 1 year ago

  • Project changed from rgw to Orchestrator
  • Status changed from New to Fix Under Review
  • Pull request ID set to 49043

#5 Updated by Ilya Dryomov about 1 year ago

  • Target version deleted (v16.2.11)
