Bug #52372

open

Managers hang after API calls

Added by Kevin Meijer over 2 years ago. Updated about 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: General - Back-end
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi there,

We're using Nagios NRPE to monitor the capacity of our clusters with a self-built script. Since deploying this script, I've noticed that the managers become unresponsive.
The script (PHP) is attached to this issue; in short, it does the following:
  1. Authenticate via /api/auth if no valid cached session exists; a newly created session is cached to disk
  2. Fetch the /api/osd endpoint
  3. Fetch the /api/cluster_conf endpoint
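
For readers without the attachment, the three steps above can be sketched roughly as follows. This is not the attached PHP script; it is a minimal Python illustration under several assumptions: that /api/auth accepts a JSON body with username and password and returns a field named token, that the token is sent as a Bearer header, that the Accept media type shown is what the dashboard expects, and that a cached session can be treated as valid for one hour. The function names, cache file layout, and TTL are all hypothetical.

```python
import json
import os
import tempfile
import time
import urllib.request

# Assumption: media type expected by the dashboard REST API.
ACCEPT = "application/vnd.ceph.api.v1.0+json"


def http_json(method, url, token=None, body=None, timeout=10):
    """Send one JSON request and return the decoded response body."""
    headers = {"Accept": ACCEPT, "Content-Type": "application/json"}
    if token:
        headers["Authorization"] = "Bearer " + token  # assumed auth scheme
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    # The timeout keeps the check from blocking forever on an unresponsive manager.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode())


def get_token(base, user, password, cache_path, ttl=3600,
              request=http_json, now=time.time):
    """Step 1: reuse a cached session if still valid, else log in and cache it."""
    try:
        with open(cache_path) as fh:
            cached = json.load(fh)
        if cached["expires"] > now():
            return cached["token"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # no usable cache; fall through to a fresh login
    reply = request("POST", base + "/api/auth",
                    body={"username": user, "password": password})
    token = reply["token"]  # assumed response field name
    # Write to a temp file and rename, so a concurrent run never reads a partial cache.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(cache_path) or ".")
    with os.fdopen(fd, "w") as fh:
        json.dump({"token": token, "expires": now() + ttl}, fh)
    os.replace(tmp, cache_path)
    return token


def run_check(base, user, password, cache_path, request=http_json):
    """Steps 1 to 3: authenticate, then fetch /api/osd and /api/cluster_conf."""
    token = get_token(base, user, password, cache_path, request=request)
    osds = request("GET", base + "/api/osd", token=token)
    conf = request("GET", base + "/api/cluster_conf", token=token)
    return osds, conf
```

The request parameter is injectable only so the flow can be exercised without a live cluster; with a valid cache, each run issues two GET requests and no login.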

Nagios calls this script roughly every 5 minutes on each of the 3 monitor VMs.
I've tried to debug the issue, but the managers simply stop logging, even though the process is still active.
Restarting the managers via systemd works, but after 1 or 2 days all managers are faulted again.

I've noticed this issue in the 16.2.4 and 16.2.5 releases. If any more debug information is required, I'll do my best to provide it.


Files

capacity_check.php (7.06 KB), Kevin Meijer, 08/23/2021 11:44 AM