Bug #50950
Closed: Mimic OSD very high CPU usage (3xx%), stops responding to other OSDs, causing PGs stuck at peering
Status: Won't Fix
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
I have been running this Mimic cluster (about 530 OSDs) for over a year. Recently I found that some OSDs randomly enter a busy loop with very high CPU usage (300%~400%, which honors the Pod resource limit). Meanwhile, these OSDs stop responding to any messages from outside, and the cluster status shows some PGs stuck in the peering state.
All of the problems above disappear after about 3 to 4 hours, and then everything goes back to normal. I can't reproduce this, but it has happened 3 times.
Any help will be appreciated!