Bug #51165

mgr/telegraf: telegraf plugin not starting and causing mgr process to crash

Added by Bastian Mäuser 7 months ago. Updated 24 days ago.

Status: New
Priority: Normal
Category: ceph-mgr
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: octopus, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Description of problem

The cluster goes to HEALTH_ERR after the telegraf plugin is enabled.

Environment

  • ceph version string: 15.2.13
  • Platform (OS/distro/release): Proxmox 6.4-8
  • Cluster details (nodes, monitors, OSDs): 5 nodes, 5 mons, 20 OSDs
  • Browser used: Firefox 89

How reproducible

Run Octopus 15.2.13 (the issue also occurs in earlier versions), then enable the telegraf plugin with:

ceph telegraf config-set address udp://:8094
ceph telegraf config-set interval 10
ceph mgr module enable telegraf

Actual results

Cluster goes HEALTH_ERR, message:

[ERR] [ERR] MGR_MODULE_ERROR: Module 'telegraf' has failed: str, bytes or bytearray expected, not NoneType

Expected results

The module starts broadcasting Ceph telemetry via UDP to listening Telegraf instances.

History

#1 Updated by Neha Ojha 7 months ago

  • Assignee set to Wido den Hollander
  • Backport set to octopus, pacific

Hi Wido, could you please take a look at this?

#2 Updated by Wido den Hollander 5 months ago

Does this only happen when using UDP broadcasting? It seems that the internal URL parsing inside the Telegraf module breaks when the address has no host part.

See: https://github.com/ceph/ceph/blob/master/src/pybind/mgr/telegraf/basesocket.py#L27

        self.sock = socket.socket(family=socket_family, type=socket_type)
        if self.sock.family == socket.AF_UNIX:
            self.address: Union[str, Tuple[str, int]] = self.url.path
        else:
            assert self.url.hostname
            assert self.url.port
            self.address = (self.url.hostname, self.url.port)
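A minimal sketch of what probably goes wrong, assuming the module parses the configured address with `urllib.parse.urlparse` (the `self.url.hostname` and `self.url.port` attributes above suggest this, but it is not confirmed here). The helper names `parse_target` and `try_send` are hypothetical, and the sketch uses `sendto`, which may differ from the call the module actually makes. For `udp://:8094` the parsed hostname is `None`, and passing `None` as the host of an `AF_INET` address makes the socket layer raise a `TypeError`:

```python
import socket
from urllib.parse import urlparse


def parse_target(address):
    """Return (hostname, port) as urllib.parse reports them for the address."""
    url = urlparse(address)
    return url.hostname, url.port


def try_send(address, payload=b"ping"):
    """Try to send one UDP datagram (hypothetical helper).

    Returns the TypeError raised by the socket layer, or None on success.
    """
    target = parse_target(address)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.sendto(payload, target)
        except TypeError as exc:
            return exc
    return None
```

With these helpers, `parse_target("udp://:8094")` yields `(None, 8094)`, whereas `parse_target("udp://127.0.0.1:8094")` yields `("127.0.0.1", 8094)`. The `assert self.url.hostname` line quoted above would catch the first case on master, but it would still stop the module instead of starting it.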

#3 Updated by Neha Ojha 3 months ago

Wido den Hollander wrote:

This only happens when using UDP broadcasting? It seems that the internal URL parsing inside the Telegraf module breaks

See: https://github.com/ceph/ceph/blob/master/src/pybind/mgr/telegraf/basesocket.py#L27

[...]

We've seen another similar report, which is also using UDP broadcasting.

#4 Updated by Scott Hubbard 24 days ago

Neha Ojha wrote:

We've seen another similar report, which is also using UDP broadcasting.

It works if you pass in a localhost IP address.

ceph telegraf config-set address udp://127.0.0.1:8094
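To check an address before setting it, something like the following could help. This is only a sketch: `check_address` is a hypothetical helper, not part of Ceph, and it assumes that `udp://` and `tcp://` addresses need both an explicit host and a port.

```python
from urllib.parse import urlparse


def check_address(address):
    """Return (host, port) for a udp:// or tcp:// address.

    Hypothetical pre-flight check: raises ValueError when the host or
    port is missing, which is the case for 'udp://:8094'.
    """
    url = urlparse(address)
    if url.scheme not in ("udp", "tcp"):
        raise ValueError(f"unsupported scheme in this sketch: {url.scheme!r}")
    if not url.hostname:
        raise ValueError(f"address {address!r} has no host part")
    if url.port is None:
        raise ValueError(f"address {address!r} has no port")
    return url.hostname, url.port
```

Calling `check_address("udp://127.0.0.1:8094")` returns the host and port, while `check_address("udp://:8094")` raises a `ValueError` naming the missing host.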
