Bug #22226
ceph zabbix plugin sends incorrect monitoring info to zabbix server
Description
The zabbix plugin sends an incorrect cluster status to the Zabbix server.
ceph -s shows the cluster in HEALTH_WARN state:
root@ocata-ceph-node4:~# ceph -s
  cluster:
    id:     c11fa860-98ee-4dd2-9243-1788f7ee2364
    health: HEALTH_WARN
            1 osds down
            1 host (1 osds) down
            Degraded data redundancy: 830/2490 objects degraded (33.333%), 44 pgs unclean, 44 pgs degraded, 44 pgs undersized
            too few PGs per OSD (22 < min 30)
  services:
    mon: 3 daemons, quorum ocata-ceph-node2,ocata-ceph-node1,ocata-ceph-node3
    mgr: ocata-ceph-node1(active), standbys: ocata-ceph-node3, ocata-ceph-node2
    mds: cephfs-1/1/1 up {0=ocata-ceph-node1=up:active}, 2 up:standby
    osd: 4 osds: 3 up, 4 in
    rgw: 1 daemon active
  data:
    pools:   7 pools, 44 pgs
    objects: 830 objects, 978 MB
    usage:   13036 MB used, 69503 MB / 82539 MB avail
    pgs:     830/2490 objects degraded (33.333%)
             44 active+undersized+degraded
However, a tcpdump capture of the traffic between the Zabbix client and server shows the plugin reporting HEALTH_OK: {"host":"ocata-ceph-node1","key":"ceph.overall_status","value":"HEALTH_OK"}
The issue disappears when the active mgr is restarted.
Running ceph 12.2.1-1~bpo90+1 on Debian stretch.
Traces attached.
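For anyone trying to confirm the same mismatch, the comparison can be scripted. The following is only a rough sketch: it assumes you have the text of `ceph -s` and one captured item from the sender traffic as strings. The helper names (`health_from_status`, `value_from_payload`, `is_stale`) are made up for this example and are not part of Ceph or the zabbix module.

```python
import json
import re

def health_from_status(status_text):
    """Return the HEALTH_* token from `ceph -s` text, or None if absent."""
    match = re.search(r"health:\s*(HEALTH_[A-Z]+)", status_text)
    return match.group(1) if match else None

def value_from_payload(payload_text, key="ceph.overall_status"):
    """Return the value of one captured item if it carries the given key."""
    item = json.loads(payload_text)
    return item["value"] if item.get("key") == key else None

def is_stale(status_text, payload_text):
    """True when the value sent to Zabbix differs from what `ceph -s` reports."""
    actual = health_from_status(status_text)
    sent = value_from_payload(payload_text)
    return actual is not None and sent is not None and actual != sent

# Sample inputs copied from this report (status output shortened).
STATUS = """cluster:
id: c11fa860-98ee-4dd2-9243-1788f7ee2364
health: HEALTH_WARN
1 osds down
"""
PAYLOAD = '{"host":"ocata-ceph-node1","key":"ceph.overall_status","value":"HEALTH_OK"}'

if __name__ == "__main__":
    print("stale:", is_stale(STATUS, PAYLOAD))
```

With the values from this report the check flags a mismatch, since the cluster reports HEALTH_WARN while the captured item carries HEALTH_OK.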
zabbix plugin configuration:
root@ocata-ceph-node1:~# ceph zabbix config-show
{"zabbix_port": 10051, "zabbix_host": "192.168.10.9", "identifier": "ocata-ceph-node1", "zabbix_sender": "/usr/bin/zabbix_sender", "interval": 60}
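To rule out the Zabbix side, it may help to push a known value by hand using the same settings. The sketch below builds a `zabbix_sender` command line from the `config-show` output above, using the standard `-z` (server), `-p` (port), `-s` (host), `-k` (key) and `-o` (value) options. It is not necessarily how the mgr module itself invokes the sender, and `sender_command` is a hypothetical helper written for this example.

```python
import json
import shlex

def sender_command(config_text, key, value):
    """Build a zabbix_sender argument list from `ceph zabbix config-show` JSON."""
    cfg = json.loads(config_text)
    return [
        cfg["zabbix_sender"],
        "-z", cfg["zabbix_host"],       # Zabbix server address
        "-p", str(cfg["zabbix_port"]),  # Zabbix trapper port
        "-s", cfg["identifier"],        # host name as registered in Zabbix
        "-k", key,                      # item key
        "-o", value,                    # item value
    ]

# Configuration copied from this report.
CONFIG = (
    '{"zabbix_port": 10051, "zabbix_host": "192.168.10.9", '
    '"identifier": "ocata-ceph-node1", '
    '"zabbix_sender": "/usr/bin/zabbix_sender", "interval": 60}'
)

if __name__ == "__main__":
    argv = sender_command(CONFIG, "ceph.overall_status", "HEALTH_WARN")
    # Print a copy-pasteable command rather than running it.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

If a manually sent HEALTH_WARN shows up correctly in Zabbix, the stale value most likely originates in the mgr rather than in the sender or the server.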
History
#1 Updated by John Spray over 6 years ago
- Project changed from Ceph to mgr
- Category set to zabbix module
#2 Updated by John Spray over 6 years ago
If you enable the dashboard module, is it showing the same bad state as the zabbix plugin was sending?
#3 Updated by Peter Hardon over 6 years ago
with dashboard module enabled it sends correct state.
#4 Updated by Hans van den Bogert about 6 years ago
I also see this. I only sporadically see HEALTH_WARN in the dashboard. AFAICS this is not isolated to the zabbix and/or dashboard modules; my own plugin also does not get proper health info.
#5 Updated by Hans van den Bogert about 6 years ago
I can reproduce this by bringing down a monitor. Afterwards the health status does not get updated in the manager until the manager is restarted.
#6 Updated by John Spray about 6 years ago
If it's correlated with a mon going down then I suspect this is the same underlying cause as http://tracker.ceph.com/issues/22142
That fix will be in 12.2.3 so let's see if this issue goes away after that is released.
#7 Updated by Wido den Hollander about 6 years ago
- Status changed from New to Rejected
Is this one still active? Otherwise we can close it I think.
Setting it to Rejected for now as I think it is resolved. If not, please re-open this one! :)