Bug #22226

ceph zabbix plugin sends incorrect monitoring info to zabbix server

Added by Peter Hardon over 6 years ago. Updated about 6 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
zabbix module
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

zabbix plugin sends incorrect cluster status to zabbix server.

ceph -s shows cluster in WARNING state:

root@ocata-ceph-node4:~# ceph -s
  cluster:
    id:     c11fa860-98ee-4dd2-9243-1788f7ee2364
    health: HEALTH_WARN
            1 osds down
            1 host (1 osds) down
            Degraded data redundancy: 830/2490 objects degraded (33.333%), 44 pgs unclean, 44 pgs degraded, 44 pgs undersized
            too few PGs per OSD (22 < min 30)

  services:
    mon: 3 daemons, quorum ocata-ceph-node2,ocata-ceph-node1,ocata-ceph-node3
    mgr: ocata-ceph-node1(active), standbys: ocata-ceph-node3, ocata-ceph-node2
    mds: cephfs-1/1/1 up {0=ocata-ceph-node1=up:active}, 2 up:standby
    osd: 4 osds: 3 up, 4 in
    rgw: 1 daemon active

  data:
    pools:   7 pools, 44 pgs
    objects: 830 objects, 978 MB
    usage:   13036 MB used, 69503 MB / 82539 MB avail
    pgs:     830/2490 objects degraded (33.333%)
             44 active+undersized+degraded

from tcpdump between zabbix client and server: {"host":"ocata-ceph-node1","key":"ceph.overall_status","value":"HEALTH_OK"}
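
For anyone who wants to check for the same mismatch, here is a minimal Python sketch that compares a captured sender item (like the one above) against the health reported by the cluster. It assumes the output of `ceph status --format json` has already been parsed into a dict, and that the health string lives under `health.status` (with `health.overall_status` as a fallback); both field names are assumptions about the JSON layout and may differ between releases. The function names are hypothetical and not part of the zabbix module.

```python
import json


def health_from_status(status: dict) -> str:
    """Return the health string from a parsed `ceph status --format json` dict.

    Assumption: the value is under health.status, falling back to
    health.overall_status when the former is absent.
    """
    health = status.get("health", {})
    return health.get("status") or health.get("overall_status") or "UNKNOWN"


def payload_mismatch(payload_line: str, status: dict,
                     key: str = "ceph.overall_status"):
    """Compare one captured sender item with the cluster's own health.

    Returns None when the item is for a different key or the values agree,
    otherwise a (sent_value, actual_value) tuple.
    """
    item = json.loads(payload_line)
    if item.get("key") != key:
        return None
    actual = health_from_status(status)
    if item.get("value") == actual:
        return None
    return (item.get("value"), actual)


if __name__ == "__main__":
    # The item seen in the capture, next to a status dict that mirrors the
    # HEALTH_WARN shown by `ceph -s`.
    captured = ('{"host":"ocata-ceph-node1",'
                '"key":"ceph.overall_status","value":"HEALTH_OK"}')
    cluster_status = {"health": {"status": "HEALTH_WARN"}}
    print(payload_mismatch(captured, cluster_status))
```

A non-None result means the value sent to the zabbix server is out of step with what the cluster reports at that moment.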

issue disappears when active mgr is restarted.
running ceph 12.2.1-1~bpo90+1 on debian stretch

traces attached.

zabbix plugin configuration:
root@ocata-ceph-node1:~# ceph zabbix config-show
{"zabbix_port": 10051, "zabbix_host": "192.168.10.9", "identifier": "ocata-ceph-node1", "zabbix_sender": "/usr/bin/zabbix_sender", "interval": 60}

zabbix_ceph.pcap (3.05 KB) Peter Hardon, 11/22/2017 03:17 PM

History

#1 Updated by John Spray over 6 years ago

  • Project changed from Ceph to mgr
  • Category set to zabbix module

#2 Updated by John Spray over 6 years ago

If you enable the dashboard module, is it showing the same bad state as the zabbix plugin was sending?

#3 Updated by Peter Hardon over 6 years ago

with dashboard module enabled it sends correct state.

#4 Updated by Hans van den Bogert about 6 years ago

I also see this. Only sporadically do I see the HEALTH_WARN in dashboard. AFAICS this is not isolated to the zabbix and/or dashboard modules - my own plugin also does not get proper health info.

#5 Updated by Hans van den Bogert about 6 years ago

I can reproduce this by bringing down a monitor. Afterwards, the health status does not get updated in the manager until the manager is restarted.

#6 Updated by John Spray about 6 years ago

If it's correlated with a mon going down then I suspect this is the same underlying cause as http://tracker.ceph.com/issues/22142

That fix will be in 12.2.3 so let's see if this issue goes away after that is released.

#7 Updated by Wido den Hollander about 6 years ago

  • Status changed from New to Rejected

Is this one still active? Otherwise we can close it I think.

Setting it to Rejected for now as I think it is resolved. If not, please re-open this one! :)
