Bug #45605

prometheus module hangs recovery and command execution

Added by Tomek Jaroszyk almost 4 years ago. Updated over 3 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
prometheus module
Target version:
-
% Done:

0%

Source:
Tags:
monitoring
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On our cluster with 2000+ OSDs we observe high CPU usage by ceph-mgr, similar to https://tracker.ceph.com/issues/44495.
In addition, less than 30 minutes after ceph-mgr is started, it begins to noticeably slow down command execution (e.g. `ceph osd df`) and can even interrupt the recovery process.
We confirmed the cause by disabling the prometheus plugin for a couple of hours: despite the high CPU usage, command execution was unaffected for as long as the module was disabled.

We already tried increasing the prometheus scrape interval, but that only delays the problem a little longer.


Related issues

Related to mgr - Bug #44495: prometheus module causes 100% mgr load Closed

History

#1 Updated by Neha Ojha almost 4 years ago

  • Related to Bug #44495: prometheus module causes 100% mgr load added

#2 Updated by Nathan Cutler over 3 years ago

  • Status changed from New to Rejected

luminous is EOL.

As far as I know, active stable versions of Ceph no longer have this problem.
