Bug #47127
osd_id_claims uses shortlabel instead of the FQDN and cannot be fulfilled.
Description
Hello!
I was testing replacing disks on my 9-node Ceph 15.2.4 cluster, provisioned by cephadm and using the orchestrator.
We give our hosts an FQDN instead of a shortlabel in /etc/hostname.
The cluster:
ceph orch host ls
HOST                     ADDR                     LABELS  STATUS
mon1.ceph2.example.net   mon1.ceph2.example.net
mon2.ceph2.example.net   mon2.ceph2.example.net
mon3.ceph2.example.net   mon3.ceph2.example.net
node1.ceph2.example.net  node1.ceph2.example.net
node2.ceph2.example.net  node2.ceph2.example.net
node3.ceph2.example.net  node3.ceph2.example.net
node4.ceph2.example.net  node4.ceph2.example.net
node5.ceph2.example.net  node5.ceph2.example.net
node6.ceph2.example.net  node6.ceph2.example.net
I have told the orchestrator to remove osd.59 with the --replace flag so that the OSD ID gets reserved for this host, like so:
ceph orch osd rm 59 --replace
We have 60 OSDs, 0 through 59. We replaced the disk that was osd.59 on node6.ceph2.example.net and the orchestrator gave it osd.60.
59 is still claimed and reserved for 'node6'.
What I think goes wrong is that node6 does not match node6.ceph2.example.net and therefore the orchestrator hands out a new ID.
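To illustrate what I mean, here is a minimal sketch of the lookup mismatch (plain Python of my own, not actual cephadm code; the function name is hypothetical):

# Hypothetical illustration of the mismatch, not cephadm code.
# The claims mapping is keyed by the short hostname, as in my spec below.
osd_id_claims = {'node6': ['59']}

def claims_for_host(host):
    # The orchestrator looks hosts up by the name they were added with,
    # which on my cluster is the FQDN.
    return osd_id_claims.get(host, [])

print(claims_for_host('node6'))                    # ['59']
print(claims_for_host('node6.ceph2.example.net'))  # [] -> a new ID gets handed out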
The spec:
ceph orch ls osd --export
block_db_size: null
block_wal_size: null
data_devices: null
data_directories: null
db_devices: null
db_slots: null
encrypted: false
journal_devices: null
journal_size: null
objectstore: bluestore
osd_id_claims: {}
osds_per_device: null
placement:
  hosts:
  - hostname: node1.ceph2.example.net
    name: ''
    network: ''
service_id: '1'
service_name: osd.1
service_type: osd
unmanaged: true
wal_devices: null
wal_slots: null
---
block_db_size: null
block_wal_size: null
data_devices:
  all: false
  limit: null
  model: null
  paths: []
  rotational: 0
  size: null
  vendor: null
data_directories: null
db_devices: null
db_slots: null
encrypted: false
journal_devices: null
journal_size: null
objectstore: bluestore
osd_id_claims:
  node6:
  - '59'
osds_per_device: null
placement:
  host_pattern: node*
service_id: nvme_drive_group
service_name: osd.nvme_drive_group
service_type: osd
unmanaged: false
wal_devices: null
wal_slots: null
The cephadm log:
8/25/20 9:54:17 AM [INF] Applying nvme_drive_group on host node6.ceph2.example.net...
8/25/20 9:54:17 AM [INF] Applying nvme_drive_group on host node5.ceph2.example.net...
8/25/20 9:54:17 AM [INF] Applying nvme_drive_group on host node4.ceph2.example.net...
8/25/20 9:54:17 AM [INF] Applying nvme_drive_group on host node3.ceph2.example.net...
8/25/20 9:54:17 AM [INF] Applying nvme_drive_group on host node2.ceph2.example.net...
8/25/20 9:54:17 AM [INF] Applying nvme_drive_group on host node1.ceph2.example.net...
8/25/20 9:54:17 AM [INF] Found osd claims for drivegroup nvme_drive_group -> {'node6': ['59']}
8/25/20 9:54:17 AM [INF] Found osd claims -> {'node6': ['59']}
8/25/20 9:54:17 AM [INF] Applying nvme_drive_group on host node6.ceph2.example.net...
8/25/20 9:54:17 AM [INF] Applying nvme_drive_group on host node5.ceph2.example.net...
8/25/20 9:54:17 AM [INF] Applying nvme_drive_group on host node4.ceph2.example.net...
8/25/20 9:54:17 AM [INF] Applying nvme_drive_group on host node3.ceph2.example.net...
8/25/20 9:54:17 AM [INF] Applying nvme_drive_group on host node2.ceph2.example.net...
8/25/20 9:54:17 AM [INF] Applying nvme_drive_group on host node1.ceph2.example.net...
8/25/20 9:54:17 AM [INF] Found osd claims for drivegroup nvme_drive_group -> {'node6': ['59']}
8/25/20 9:54:17 AM [INF] Found osd claims -> {'node6': ['59']}
My Ceph health status is now WARNING:
root @ node1.ceph2 # ceph health detail
HEALTH_WARN 1 stray daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
    stray daemon osd.59 on host node6.ceph2.example.net not managed by cephadm
There is no (stray) daemon for osd.59 running on node6.ceph2.example.net. I assume this error is incorrect and should say that there are unclaimed OSD IDs on node6.ceph2.example.net.
root @ node6.ceph2 # docker ps
CONTAINER ID  IMAGE              COMMAND                  CREATED       STATUS       PORTS  NAMES
398897ed1450  ceph/ceph:v15.2.4  "/usr/bin/ceph-osd -…"   18 hours ago  Up 18 hours         ceph-d77f7c4a-d656-11ea-95cb-531234b0f844-osd.60
6db03a587263  ceph/ceph:v15.2.4  "/usr/bin/ceph-osd -…"   7 days ago    Up 7 days           ceph-d77f7c4a-d656-11ea-95cb-531234b0f844-osd.57
df367319fe04  ceph/ceph:v15.2.4  "/usr/bin/ceph-osd -…"   7 days ago    Up 7 days           ceph-d77f7c4a-d656-11ea-95cb-531234b0f844-osd.53
b6a5e5978ba9  ceph/ceph:v15.2.4  "/usr/bin/ceph-osd -…"   7 days ago    Up 7 days           ceph-d77f7c4a-d656-11ea-95cb-531234b0f844-osd.50
c3628ca8f50e  ceph/ceph:v15.2.4  "/usr/bin/ceph-osd -…"   7 days ago    Up 7 days           ceph-d77f7c4a-d656-11ea-95cb-531234b0f844-osd.54
89787a68d57e  ceph/ceph:v15.2.4  "/usr/bin/ceph-osd -…"   7 days ago    Up 7 days           ceph-d77f7c4a-d656-11ea-95cb-531234b0f844-osd.52
cf2e266c0394  ceph/ceph:v15.2.4  "/usr/bin/ceph-osd -…"   7 days ago    Up 7 days           ceph-d77f7c4a-d656-11ea-95cb-531234b0f844-osd.56
c97f093ca2b4  ceph/ceph:v15.2.4  "/usr/bin/ceph-osd -…"   7 days ago    Up 7 days           ceph-d77f7c4a-d656-11ea-95cb-531234b0f844-osd.58
e6c8b4b417e3  ceph/ceph:v15.2.4  "/usr/bin/ceph-osd -…"   7 days ago    Up 7 days           ceph-d77f7c4a-d656-11ea-95cb-531234b0f844-osd.55
42571862a4a1  ceph/ceph:v15.2.4  "/usr/bin/ceph-osd -…"   7 days ago    Up 7 days           ceph-d77f7c4a-d656-11ea-95cb-531234b0f844-osd.51
3a9f2cba488b  ceph/ceph:v15.2.4  "/usr/bin/ceph-crash…"  7 days ago    Up 7 days           ceph-d77f7c4a-d656-11ea-95cb-531234b0f844-crash.node6
As per the instructions on https://docs.ceph.com/docs/master/cephadm/concepts/, my hosts comply with the second valid way to resolve hostnames:
~ root @ node6.ceph2 # hostname
node6.ceph2.example.net
~ root @ node6.ceph2 # hostname -s
node6
If there is any information you need to debug this, let me know and I will be happy to provide it.
For now I am stuck with a health state of WARNING and a claim on osd.59 that I cannot fulfill. How can I remove an OSD claim so that my cluster goes back to healthy?
I have tried removing the osd.nvme_drive_group spec and re-applying my spec.yml, but that did not do the trick.
Updated by Daniël Vos over 3 years ago
I have tracked it down to the find_destroyed_osds() function in mgr/cephadm/services/osd.py.
The data is extracted from ceph osd tree, but the tree doesn't take FQDNs into consideration and only shows shortlabels, like so:
root @ mon1.ceph2 # ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME           STATUS  REWEIGHT  PRI-AFF
 -1         54.57889  root default
-15         27.28793      room 1d12
 -9          9.09698          host node4
 30    ssd   0.90970              osd.30      up   1.00000  1.00000
 31    ssd   0.90970              osd.31      up   1.00000  1.00000
 32    ssd   0.90970              osd.32      up   1.00000  1.00000
This is because Ceph automatically sets a ceph-osd daemon's location to be root=default host=HOSTNAME (based on the output of hostname -s). (Source: https://docs.ceph.com/docs/master/rados/operations/crush-map/)
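So the host bucket name in the CRUSH map is effectively the FQDN truncated at the first dot. A quick sketch of that derivation (my own illustration, not the actual ceph-osd code):

# Illustration only: the default CRUSH location is
# root=default host=$(hostname -s), i.e. the first label of the FQDN.
fqdn = 'node6.ceph2.example.net'   # contents of /etc/hostname on my nodes
short = fqdn.split('.', 1)[0]      # equivalent of `hostname -s`
print(short)                       # node6 -> the key that ends up in osd_id_claims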
If I am not mistaken the following happens:
The call to osd_id_claims.get(host, []) here: https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/services/osd.py#L40 will result in an empty list, because it looks for the FQDN of my host (node6.ceph2.example.net) in a mapping that only contains an entry for node6.
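If that is right, one possible fix (just a sketch on my part, someone who knows the code should judge; the helper name is my own) would be to fall back to the short hostname when the FQDN lookup comes up empty:

# Hypothetical fix sketch, not a tested patch: try the FQDN first,
# then fall back to the short hostname.
def get_claims(osd_id_claims, host):
    claims = osd_id_claims.get(host)
    if claims is None:
        claims = osd_id_claims.get(host.split('.', 1)[0], [])
    return claims

print(get_claims({'node6': ['59']}, 'node6.ceph2.example.net'))  # ['59']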
Of course, someone who has experience with this part of the code would need to confirm my findings (and hopefully think of a fix?).
Updated by Juan Miguel Olmo Martínez about 3 years ago
- Assignee set to Juan Miguel Olmo Martínez
Updated by Sebastian Wagner almost 3 years ago
- Assignee deleted (Juan Miguel Olmo Martínez)
- Priority changed from Normal to High
Updated by Sebastian Wagner over 2 years ago
- Priority changed from High to Normal
Updated by Sebastian Wagner over 2 years ago
- Related to Bug #50776: cephadm: CRUSH uses bare host names added