Bug #63581
mgr/zabbix: Incorrect OSD IDs in Zabbix discovery for non-default CRUSH hierarchies
Description
The current implementation of the Zabbix discovery returns the correct OSD IDs only if the CRUSH hierarchy is the default root -> host -> osd, because OSDs are assumed to sit two layers below the root bucket. If further layers are added to the CRUSH hierarchy (e.g., root -> rack -> host -> osd), this assumption no longer holds, and incorrect IDs are returned.
Example: For the following hierarchy,
[root@test-sh1 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         8.18729  root default
-2         2.72910      host test-sh1
 1    hdd  0.90970          osd.1          up   1.00000  1.00000
 2    hdd  0.90970          osd.2          up   1.00000  1.00000
 4    hdd  0.90970          osd.4          up   1.00000  1.00000
-3         5.45819      host test-sh2
 0    hdd  1.81940          osd.0          up   1.00000  1.00000
 3    hdd  1.81940          osd.3          up   1.00000  1.00000
 5    hdd  1.81940          osd.5          up   1.00000  1.00000
the discovery sends the expected result,
{
    "data": [
        {
            "{#OSD}": 1,
            "{#CRUSH_RULE}": "default"
        },
        {
            "{#OSD}": 2,
            "{#CRUSH_RULE}": "default"
        },
        {
            "{#OSD}": 4,
            "{#CRUSH_RULE}": "default"
        },
        {
            "{#OSD}": 0,
            "{#CRUSH_RULE}": "default"
        },
        {
            "{#OSD}": 3,
            "{#CRUSH_RULE}": "default"
        },
        {
            "{#OSD}": 5,
            "{#CRUSH_RULE}": "default"
        }
    ]
}
However, if I add a rack to the hierarchy,
[root@test-sh1 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME              STATUS  REWEIGHT  PRI-AFF
-1         8.18729  root default
-3         8.18729      rack rack1
-2         2.72910          host test-sh1
 1    hdd  0.90970              osd.1          up   1.00000  1.00000
 2    hdd  0.90970              osd.2          up   1.00000  1.00000
 4    hdd  0.90970              osd.4          up   1.00000  1.00000
-4         5.45819          host test-sh2
 0    hdd  1.81940              osd.0          up   1.00000  1.00000
 3    hdd  1.81940              osd.3          up   1.00000  1.00000
 5    hdd  1.81940              osd.5          up   1.00000  1.00000
the IDs of the hosts are returned, i.e., the following data is sent to the Zabbix server:
{
    "data": [
        {
            "{#OSD}": -2,
            "{#CRUSH_RULE}": "default"
        },
        {
            "{#OSD}": -4,
            "{#CRUSH_RULE}": "default"
        }
    ]
}
This means that subsequent sending of OSD statistics (where the correct IDs are used) fails because the OSDs were never registered on the Zabbix server.
I have implemented and tested a fix which searches the buckets recursively, for which I will open a PR shortly.
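To illustrate the difference, here is a minimal sketch that contrasts a fixed-depth lookup with a recursive one. It is not the module's code and not the code from the PR: the flat dictionary RACK_TREE and the helper names osds_fixed_depth and osds_recursive are hypothetical simplifications, and the only property it relies on is that CRUSH buckets carry negative IDs while OSDs carry non-negative ones.

```python
from typing import Dict, List

# Hypothetical, simplified view of the rack hierarchy from this report:
# bucket ID -> IDs of its direct children. Buckets (root, rack, host) have
# negative IDs; OSDs have non-negative IDs and never appear as keys.
RACK_TREE: Dict[int, List[int]] = {
    -1: [-3],        # root default -> rack1
    -3: [-2, -4],    # rack1 -> test-sh1, test-sh2
    -2: [1, 2, 4],   # host test-sh1
    -4: [0, 3, 5],   # host test-sh2
}


def osds_fixed_depth(tree: Dict[int, List[int]], root: int) -> List[int]:
    """Assume root -> host -> osd and return the grandchildren of the root."""
    return [item for child in tree[root] for item in tree.get(child, [])]


def osds_recursive(tree: Dict[int, List[int]], bucket: int) -> List[int]:
    """Descend through every nested bucket and collect the OSD leaves."""
    osds: List[int] = []
    for item in tree[bucket]:
        if item in tree:
            # The child is itself a bucket, so keep descending.
            osds.extend(osds_recursive(tree, item))
        else:
            # The child is a leaf, i.e. an OSD.
            osds.append(item)
    return osds


print(osds_fixed_depth(RACK_TREE, -1))  # [-2, -4]  (host IDs, as in the bug)
print(osds_recursive(RACK_TREE, -1))    # [1, 2, 4, 0, 3, 5]
```

With the extra rack layer, the fixed-depth variant stops at the host buckets, which matches the faulty discovery data shown above, while the recursive variant reaches the OSDs regardless of how many layers sit in between.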
P.S.: I'm not sure if this belongs here, but is there a reason the PR to fix #56671 was never merged? I know it's possible to set the module options via ceph config set mgr mgr/zabbix/<option>, but it would be nice if the CLI tool worked as intended.
Updated by Yannick Lemke 6 months ago
Created Pull Request: https://github.com/ceph/ceph/pull/54562