Bug #43551

closed

Trying to enable the CEPH Telegraf module errors 'No such file or directory'

Added by Scott Hubbard over 4 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
nautilus, octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I followed the steps at https://docs.ceph.com/docs/master/mgr/telegraf/ and enabled the telegraf module.
I then configured the Telegraf daemon to open a local listener port and ran the second command from the docs to set the address, which returned this error:

$ sudo ceph telegraf config-set address udp://:8094
Error EIO: Module 'telegraf' has experienced an error and cannot handle commands: (2, 'No such file or directory')

I also noticed that the cluster health degraded:

2020-01-09 22:02:25.317507 mon.ceph-mon0 [ERR] Health check failed: Module 'telegraf' has failed: (2, 'No such file or directory') (MGR_MODULE_ERROR)

Every telegraf-related command I ran afterwards returned the same error:

$ sudo ceph telegraf config-show
Error EIO: Module 'telegraf' has experienced an error and cannot handle commands: (2, 'No such file or directory')

The ceph-mgr log shows that the modules load, but the telegraf module eventually raises an exception:

2020-01-09 22:02:13.021 7fb3c44e7700 1 mgr load Constructed class from module: rbd_support
2020-01-09 22:02:13.021 7fb3c44e7700 1 mgr load Constructed class from module: restful
2020-01-09 22:02:13.021 7fb3c44e7700 1 mgr load Constructed class from module: status
2020-01-09 22:02:13.021 7fb3c44e7700 1 mgr load Constructed class from module: telegraf
2020-01-09 22:02:13.025 7fb3b94d1700 1 mgr[restful] server not running: no certificate configured
2020-01-09 22:02:13.025 7fb3c44e7700 1 mgr load Constructed class from module: volumes
2020-01-09 22:02:23.077 7fb3b84cf700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'telegraf' while running on mgr.ceph-mon0: (2, 'No such file or directory')
2020-01-09 22:02:23.077 7fb3b84cf700 -1 telegraf.serve:
2020-01-09 22:02:23.077 7fb3b84cf700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/telegraf/module.py", line 295, in serve
    self.send_to_telegraf()
  File "/usr/share/ceph/mgr/telegraf/module.py", line 243, in send_to_telegraf
    with sock as s:
  File "/usr/share/ceph/mgr/telegraf/basesocket.py", line 41, in __enter__
    self.connect()
  File "/usr/share/ceph/mgr/telegraf/basesocket.py", line 29, in connect
    return self.sock.connect(self.address)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: (2, 'No such file or directory')

I also tried creating the Telegraf Unix socket at the default location, /tmp/telegraf.sock, but the error persisted.
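
For anyone trying to reproduce the errno outside of ceph-mgr: the traceback ends in a plain socket connect(), and connecting a Unix-domain socket to a path that the calling process cannot see fails with errno 2 (ENOENT). The snippet below is only a minimal Python 3 sketch of that behaviour, not code from the telegraf module; the helper name try_connect_unix and the temporary path are made up for illustration, and the use of a stream socket is an assumption.

```python
import errno
import os
import socket
import tempfile


def try_connect_unix(path):
    """Try to connect a Unix-domain stream socket to `path`.

    Returns 0 on success, otherwise the errno of the failure.
    (Hypothetical helper, not part of the Ceph telegraf module.)
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


# A path inside a fresh temporary directory, so nothing is listening there
# and no file of that name exists.
missing_path = os.path.join(tempfile.mkdtemp(), "telegraf.sock")
result = try_connect_unix(missing_path)

# On Linux the result is errno.ENOENT (2), the same code as in the log above.
print(result, errno.errorcode.get(result))
```

Running the same helper as the user that runs ceph-mgr, against the path configured for the module, may help tell a missing socket apart from a permissions or visibility problem.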

The workaround I ended up with was to enable the module and then set the config address very quickly afterwards; that worked:

sudo ceph mgr module enable telegraf
sudo ceph telegraf config-set address udp://:8094

There appear to be two problems:
1: Even if the socket file exists, the telegraf module is not able to see or read it.
2: If the module cannot find the Unix socket file, that should not break other commands that try to set the module's options.
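
To illustrate what the second point is asking for, here is a rough Python 3 sketch of a send loop that records a connection failure and keeps running instead of letting the exception escape. It is not the actual module code or the eventual fix; send_once, serve_batches, and the socket path are hypothetical names chosen for this example.

```python
import os
import socket
import tempfile


def send_once(address, payload):
    """Send `payload` to a Unix-domain socket at `address`.

    Returns None on success, or an error string if the socket cannot be
    reached. (Hypothetical helper, for illustration only.)
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(address)
        sock.sendall(payload)
        return None
    except OSError as exc:
        # Report the failure to the caller instead of raising, so the
        # surrounding loop (and command handling) can keep working.
        return "cannot reach %s: %s" % (address, exc)
    finally:
        sock.close()


def serve_batches(address, batches):
    """Try to send every batch; collect errors rather than stopping."""
    errors = []
    for payload in batches:
        err = send_once(address, payload)
        if err is not None:
            errors.append(err)
    return errors


# Nothing listens on this path, so both sends fail, but the loop finishes.
unreachable = os.path.join(tempfile.mkdtemp(), "telegraf.sock")
errors = serve_batches(unreachable, [b"metric_a 1\n", b"metric_b 2\n"])
print(len(errors))
```

With handling along these lines, a missing socket would show up as a logged error per send attempt, and commands such as config-set could still be used to correct the address.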

My Ceph version:
ceph version 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable)

3 mgr/mon nodes, and 5 osd nodes.


Related issues 2 (0 open, 2 closed)

Copied to mgr - Backport #45069: octopus: Trying to enable the CEPH Telegraf module errors 'No such file or directory' (Resolved, Shyukri Shyukriev)
Copied to mgr - Backport #45070: nautilus: Trying to enable the CEPH Telegraf module errors 'No such file or directory' (Resolved, Shyukri Shyukriev)