Actions
Bug #52372
openManagers hang after API calls
Status:
New
Priority:
Normal
Assignee:
-
Category:
General - Back-end
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Hi there,
We're using Nagios NRPE to monitor the capacity of our clusters with a self-built script, since deploying this script I've noticed that managers are becoming unresponsive.You'll find the script (PHP) attached to this issue, but the quick summary of what it does is as follows:
- Authorize using
/api/auth
if no session is cached which is still valid, if newly created, cache the session to disk - Fetch the
/api/osd
endpoint - Fetch the
/api/cluster_conf
endpoint
Nagios is calling this script roughly every 5 minutes on each monitor VM (Which there are 3 of).
I've tried to debug the issue, but the managers just stop logging, even though the process is still active.
Restarting the managers via systemd works, but after 1 or 2 days all managers are faulted again.
I've noticed this issue in the 16.2.4 and 16.2.5 release, if any more debug information is required I'll do my best to provide it.
Files
Actions