Bug #52372: Managers hang after API calls - Dashboard - Ceph

Bug #52372

open

Managers hang after API calls

Added by Kevin Meijer over 2 years ago. Updated about 2 years ago.

Status: New
Priority: Normal
Assignee:
Category: General - Back-end
Target version:
% Done:
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions: Ceph - v16.2.4, Ceph - v16.2.5
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi there,

We're using Nagios NRPE to monitor the capacity of our clusters with a self-built script. Since deploying this script, I've noticed that the managers are becoming unresponsive.
The script (PHP) is attached to this issue; in short, it does the following:

1. Authenticate via /api/auth if no valid cached session exists; a newly created session is cached to disk.
2. Fetch the /api/osd endpoint.
3. Fetch the /api/cluster_conf endpoint.
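
For readers without the attachment, the flow described above can be sketched roughly as follows. This is not the attached PHP script but a minimal Python illustration of the same three steps. The cache file format, the `max_age` value, and every function name here are assumptions made for the example. The bearer-token header, the `token` field in the /api/auth reply, and the versioned `Accept` media type follow the dashboard REST API as documented, but should be checked against the deployed release.

```python
import json
import time
import urllib.request

# Versioned media type expected by the dashboard REST API (verify for your release).
ACCEPT = "application/vnd.ceph.api.v1.0+json"


def http_json(method, url, token=None, body=None):
    """Send one JSON request and decode the JSON reply."""
    headers = {"Accept": ACCEPT, "Content-Type": "application/json"}
    if token:
        headers["Authorization"] = "Bearer " + token
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    # An explicit timeout keeps the check from blocking if the manager stops answering.
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode())


def load_token(cache_path, max_age):
    """Return the cached token, or None if the cache is missing, malformed, or too old."""
    try:
        with open(cache_path) as fh:
            cached = json.load(fh)
    except (OSError, ValueError):
        return None
    if not isinstance(cached, dict):
        return None
    if time.time() - cached.get("created", 0) >= max_age:
        return None
    return cached.get("token")


def get_token(base_url, username, password, cache_path, max_age=3600, request=http_json):
    """Step 1: reuse a valid cached session, otherwise authenticate and cache the result."""
    token = load_token(cache_path, max_age)
    if token is None:
        reply = request("POST", base_url + "/api/auth",
                        body={"username": username, "password": password})
        token = reply["token"]
        with open(cache_path, "w") as fh:
            json.dump({"token": token, "created": time.time()}, fh)
    return token


def collect(base_url, username, password, cache_path, request=http_json):
    """Steps 2 and 3: fetch /api/osd and /api/cluster_conf with the session token."""
    token = get_token(base_url, username, password, cache_path, request=request)
    osds = request("GET", base_url + "/api/osd", token=token)
    conf = request("GET", base_url + "/api/cluster_conf", token=token)
    return osds, conf
```

With a warm cache, each run issues two GET requests; only a run that finds no valid cached session adds the POST to /api/auth. The `request` parameter exists only so the flow can be exercised without a live manager.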

Nagios calls this script roughly every 5 minutes on each monitor VM (of which there are 3).
I've tried to debug the issue, but the managers just stop logging, even though the process is still active.
Restarting the managers via systemd works, but after 1 or 2 days all managers are faulted again.

I've noticed this issue in the 16.2.4 and 16.2.5 releases. If any more debug information is required, I'll do my best to provide it.

Files

capacity_check.php (7.06 KB)

Kevin Meijer, 08/23/2021 11:44 AM

Updated by Sebastian Wagner over 2 years ago

Updated by Ernesto Puerta about 2 years ago

Updated by Kevin Meijer about 2 years ago