Bug #45605
prometheus module hangs recovery and command execution
Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
prometheus module
Target version:
-
% Done:
0%
Source:
Tags:
monitoring
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
On our cluster with 2000+ OSDs we observe high CPU usage by ceph-mgr, similar to https://tracker.ceph.com/issues/44495.
In addition, less than 30 minutes after ceph-mgr is started, it begins to noticeably slow down command execution (for example `ceph osd df`) and can even interrupt the recovery process.
We confirmed the cause by disabling the prometheus module for a couple of hours: despite the high CPU usage, command execution was not affected as long as the module was disabled.
We already tried increasing the Prometheus scrape interval, but that only postpones the issue a little.
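To make the slowdown easier to compare with the module enabled and disabled, command latency can be sampled with a small script. This is only a sketch and not something taken from our cluster: the helper names `time_command` and `sample_latency`, the sample count, and the timeout are assumptions chosen for illustration.

```python
import subprocess
import time


def time_command(argv, timeout=300):
    """Return the wall-clock seconds argv took to run, or None if it timed out."""
    start = time.monotonic()
    try:
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,  # a non-zero exit status is still a valid timing sample
        )
    except subprocess.TimeoutExpired:
        return None
    return time.monotonic() - start


def sample_latency(argv, samples=5, pause=1.0):
    """Run argv `samples` times, sleeping `pause` seconds between runs."""
    results = []
    for i in range(samples):
        results.append(time_command(argv))
        if i < samples - 1:
            time.sleep(pause)
    return results


if __name__ == "__main__":
    # Hypothetical usage: time `ceph osd df`, the command named in this report.
    for elapsed in sample_latency(["ceph", "osd", "df"]):
        if elapsed is None:
            print("timed out")
        else:
            print(f"{elapsed:.2f}s")
```

Running it once with the module enabled and once after `ceph mgr module disable prometheus` should give two sets of timings to compare.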
Related issues
History
#1 Updated by Neha Ojha over 3 years ago
- Related to Bug #44495: prometheus module causes 100% mgr load added
#2 Updated by Nathan Cutler about 3 years ago
- Status changed from New to Rejected
luminous is EOL.
As far as I know, active stable versions of Ceph no longer have this problem.