Bug #52372

open

Managers hang after API calls

Added by Kevin Meijer over 2 years ago. Updated about 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: General - Back-end
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi there,

We're using Nagios NRPE to monitor the capacity of our clusters with a self-built script. Since deploying this script, I've noticed that the managers become unresponsive.
The script (PHP) is attached to this issue; in short, it does the following:
  1. Authorize using /api/auth if no valid session is cached; if a new session is created, cache it to disk
  2. Fetch the /api/osd endpoint
  3. Fetch the /api/cluster_conf endpoint
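
For readers without the attachment, the three steps above can be sketched roughly as follows. This is a Python illustration, not the attached PHP script: the function names, the cache file layout, and the `TOKEN_TTL` value are assumptions made for the example. It relies on the Dashboard REST API returning a token from POST /api/auth and accepting it as a Bearer token on later requests.

```python
import json
import os
import time
import urllib.request

# Versioned media type expected by the Dashboard REST API.
API_ACCEPT = "application/vnd.ceph.api.v1.0+json"
# Assumed cache lifetime in seconds (hypothetical, not from the attached script).
TOKEN_TTL = 3600


def load_cached_token(path, now=None, ttl=TOKEN_TTL):
    """Return the cached token, or None if the cache is missing, unreadable or too old."""
    now = time.time() if now is None else now
    try:
        with open(path) as fh:
            cached = json.load(fh)
    except (OSError, ValueError):
        return None
    if now - cached.get("created", 0) >= ttl:
        return None
    return cached.get("token")


def save_token(path, token, now=None):
    """Write the token to disk; the file is created readable by the owner only."""
    now = time.time() if now is None else now
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump({"token": token, "created": now}, fh)


def http_json(method, url, token=None, body=None, timeout=10):
    """Send one request and decode the JSON reply; the timeout keeps the check from hanging."""
    headers = {"Accept": API_ACCEPT, "Content-Type": "application/json"}
    if token:
        headers["Authorization"] = "Bearer " + token
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def collect(base_url, username, password, cache_path, request=http_json):
    """Step 1: authenticate unless a cached token is still valid. Steps 2 and 3: fetch data."""
    token = load_cached_token(cache_path)
    if token is None:
        reply = request("POST", base_url + "/api/auth",
                        body={"username": username, "password": password})
        token = reply["token"]
        save_token(cache_path, token)
    osds = request("GET", base_url + "/api/osd", token=token)
    conf = request("GET", base_url + "/api/cluster_conf", token=token)
    return osds, conf
```

With one check every 5 minutes per monitor VM, a cache like this means most runs send only the two GET requests. A dashboard using a self-signed certificate would additionally need an SSL context passed to `urlopen`, which the sketch leaves out.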

Nagios calls this script roughly every 5 minutes on each monitor VM (there are 3 of them).
I've tried to debug the issue, but the managers just stop logging, even though the process is still active.
Restarting the managers via systemd works, but after 1 or 2 days all managers are faulted again.

I've noticed this issue in the 16.2.4 and 16.2.5 releases. If any more debug information is required, I'll do my best to provide it.


Files

capacity_check.php (7.06 KB) Kevin Meijer, 08/23/2021 11:44 AM
Actions #1

Updated by Sebastian Wagner over 2 years ago

  • Project changed from Ceph to Dashboard
  • Category deleted (common)
Actions #2

Updated by Ernesto Puerta about 2 years ago

  • Category set to General - Back-end

Hi Kevin,

Just found this tracker. We've got reports from a few users facing periodic hangs of the Dashboard, but that's not a general issue, so we're trying to find out whether this is connected to some environment issue.

Are you running some kind of security tool (Qualys) that might interfere with the Ceph Dashboard? This cheroot (the CherryPy web server) issue reports similar behaviour.

Actions #3

Updated by Kevin Meijer about 2 years ago

Ernesto Puerta wrote:

Hi Kevin,

Just found this tracker. We've got reports from a few users facing periodic hangs of the Dashboard, but that's not a general issue, so we're trying to find out whether this is connected to some environment issue.

Are you running some kind of security tool (Qualys) that might interfere with the Ceph Dashboard? This cheroot (the CherryPy web server) issue reports similar behaviour.

Ernesto,

No, we are not running any extra security software on our Ceph nodes.
