Bug #63581 (open)

mgr/zabbix: Incorrect OSD IDs in Zabbix discovery for non-default CRUSH hierarchies

Added by Yannick Lemke 5 months ago. Updated 5 months ago.

Status: New
Priority: Normal
Assignee: -
Category: zabbix module
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The current implementation of the Zabbix discovery only provides correct OSD IDs when the CRUSH hierarchy is the default root -> host -> osd, because OSDs are assumed to sit exactly two layers below the root bucket. If further layers are added to the hierarchy (e.g., root -> rack -> host -> osd), this assumption no longer holds and incorrect IDs are returned.
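
To make the failure mode concrete, a fixed-depth lookup over a CRUSH dump (roughly the bucket format returned by ceph osd crush dump) behaves like the sketch below. This is an illustration only, not the module's actual code, and the function name is made up; it just shows how an extra layer turns host bucket IDs into reported "OSD" IDs.

def find_osds_fixed_depth(root_id, buckets_by_id):
    # Illustration only: assumes every item two levels below the root is an
    # OSD. With root -> rack -> host -> osd, the items found at that depth
    # are host buckets, so their negative bucket IDs end up in the discovery
    # data as "{#OSD}" values.
    osds = []
    for child in buckets_by_id[root_id]['items']:         # hosts (or racks)
        for item in buckets_by_id[child['id']]['items']:  # OSDs (or hosts)
            osds.append(item['id'])
    return osds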

Example: For the following hierarchy,

[root@test-sh1 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         8.18729  root default
-2         2.72910      host test-sh1
 1    hdd  0.90970          osd.1          up   1.00000  1.00000
 2    hdd  0.90970          osd.2          up   1.00000  1.00000
 4    hdd  0.90970          osd.4          up   1.00000  1.00000
-3         5.45819      host test-sh2
 0    hdd  1.81940          osd.0          up   1.00000  1.00000
 3    hdd  1.81940          osd.3          up   1.00000  1.00000
 5    hdd  1.81940          osd.5          up   1.00000  1.00000

the discovery sends the expected result:
{
    "data": [
        {
            "{#OSD}": 1,
            "{#CRUSH_RULE}": "default" 
        },
        {
            "{#OSD}": 2,
            "{#CRUSH_RULE}": "default" 
        },
        {
            "{#OSD}": 4,
            "{#CRUSH_RULE}": "default" 
        },
        {
            "{#OSD}": 0,
            "{#CRUSH_RULE}": "default" 
        },
        {
            "{#OSD}": 3,
            "{#CRUSH_RULE}": "default" 
        },
        {
            "{#OSD}": 5,
            "{#CRUSH_RULE}": "default" 
        }
    ]
}

However, if I add a rack to the hierarchy,

[root@test-sh1 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME              STATUS  REWEIGHT  PRI-AFF
-1         8.18729  root default
-3         8.18729      rack rack1
-2         2.72910          host test-sh1
 1    hdd  0.90970              osd.1          up   1.00000  1.00000
 2    hdd  0.90970              osd.2          up   1.00000  1.00000
 4    hdd  0.90970              osd.4          up   1.00000  1.00000
-4         5.45819          host test-sh2
 0    hdd  1.81940              osd.0          up   1.00000  1.00000
 3    hdd  1.81940              osd.3          up   1.00000  1.00000
 5    hdd  1.81940              osd.5          up   1.00000  1.00000

the IDs of the hosts are returned, i.e., the following data is sent to the Zabbix server:
{
    "data": [
        {
            "{#OSD}": -2,
            "{#CRUSH_RULE}": "default" 
        },
        {
            "{#OSD}": -4,
            "{#CRUSH_RULE}": "default" 
        }
    ]
}

This means that subsequent sending of OSD statistics (where the correct IDs are used) fails because the OSDs were never registered on the Zabbix server.

I have implemented and tested a fix which searches the buckets recursively, for which I will open a PR shortly.
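
For reference, a rough sketch of that recursive approach (not the actual patch; the helper name and the simplified bucket entries are just for illustration, modeled on the output of ceph osd crush dump, where item IDs >= 0 are devices and negative IDs are buckets):

def find_osds(bucket_id, buckets_by_id):
    # Recursively walk a CRUSH bucket and collect only OSD (device) IDs,
    # so any number of intermediate layers (rack, row, host, ...) works.
    osds = []
    for item in buckets_by_id[bucket_id].get('items', []):
        if item['id'] >= 0:
            osds.append(item['id'])
        else:
            osds.extend(find_osds(item['id'], buckets_by_id))
    return osds

# Example with the rack hierarchy from above (bucket entries simplified):
buckets = [
    {'id': -1, 'name': 'default',  'items': [{'id': -3}]},
    {'id': -3, 'name': 'rack1',    'items': [{'id': -2}, {'id': -4}]},
    {'id': -2, 'name': 'test-sh1', 'items': [{'id': 1}, {'id': 2}, {'id': 4}]},
    {'id': -4, 'name': 'test-sh2', 'items': [{'id': 0}, {'id': 3}, {'id': 5}]},
]
buckets_by_id = {b['id']: b for b in buckets}
print(find_osds(-1, buckets_by_id))  # -> [1, 2, 4, 0, 3, 5]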

P.S.: I'm not sure if this belongs here, but is there a reason the PR to fix #56671 was never merged? I know it's possible to set the module options via ceph config set mgr mgr/zabbix/<option>, but it would be nice if the CLI tool worked as intended.

#1 Updated by Yannick Lemke 5 months ago
