Bug #52372
Managers hang after API calls
Description
Hi there,
We're using Nagios NRPE to monitor the capacity of our clusters with a self-built script. Since deploying this script, I've noticed that the managers become unresponsive. The script (PHP) is attached to this issue; in short, it does the following:
- Authorize using /api/auth if no cached session is still valid; a newly created session is cached to disk
- Fetch the /api/osd endpoint
- Fetch the /api/cluster_conf endpoint
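For readers without the attachment, the flow described above (reuse a cached session while it is still valid, otherwise authenticate and cache the result, then query the endpoints) can be sketched roughly as follows. This is an illustrative Python sketch, not the attached PHP script. The cache file layout, the `ttl` value, the helper names, and the `Accept` header version are assumptions. The request timeout is also an addition, so that a hung manager fails the check instead of blocking it.

```python
import json
import os
import tempfile
import time
import urllib.request

# Assumption: versioned media type accepted by recent Ceph Dashboard REST APIs.
ACCEPT = "application/vnd.ceph.api.v1.0+json"


def load_cached_token(cache_path, now=None):
    """Return the cached token if it has not expired yet, else None."""
    now = time.time() if now is None else now
    try:
        with open(cache_path) as fh:
            entry = json.load(fh)
    except (OSError, ValueError):
        return None  # missing or corrupt cache: treat as "no session"
    if isinstance(entry, dict) and entry.get("expires_at", 0) > now:
        return entry.get("token")
    return None


def save_token(cache_path, token, ttl, now=None):
    """Write the cache via a temp file so readers never see a partial file."""
    now = time.time() if now is None else now
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump({"token": token, "expires_at": now + ttl}, fh)
    os.replace(tmp, cache_path)


def get_token(cache_path, login, ttl=3600, now=None):
    """Reuse a valid cached token; otherwise call login() and cache the result."""
    token = load_cached_token(cache_path, now)
    if token is None:
        token = login()
        save_token(cache_path, token, ttl, now)
    return token


def api_request(base_url, path, token=None, payload=None, timeout=10):
    """Send a JSON request (POST if payload is given, else GET) and decode the reply."""
    headers = {"Accept": ACCEPT, "Content-Type": "application/json"}
    if token:
        headers["Authorization"] = "Bearer " + token
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(base_url + path, data=data, headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

In such a sketch, `login` would be a small callable that posts the credentials to `/api/auth` through `api_request` and returns the `token` field of the response; the check would then call `api_request` with that token for `/api/osd` and `/api/cluster_conf`.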
Nagios calls this script roughly every 5 minutes on each of the 3 monitor VMs.
I've tried to debug the issue, but the managers just stop logging, even though the process is still active.
Restarting the managers via systemd works, but after 1 or 2 days all managers are faulted again.
I've noticed this issue in the 16.2.4 and 16.2.5 releases. If any more debug information is required, I'll do my best to provide it.
History
#1 Updated by Sebastian Wagner almost 2 years ago
- Project changed from Ceph to Dashboard
- Category deleted (common)
#2 Updated by Ernesto Puerta over 1 year ago
- Category set to General - Back-end
Hi Kevin,
Just found this tracker. We've got reports from a few users facing periodic hangs of the Dashboard, but that's not a general issue, so we're trying to find out whether this is connected to some environment issue.
Are you running some kind of security tool (e.g. Qualys) that might interfere with the Ceph Dashboard? This cheroot (the CherryPy web server) issue reports similar behaviour.
#3 Updated by Kevin Meijer over 1 year ago
Ernesto,
No, we are not running any extra security software on our Ceph nodes.