Bug #50950

closed

Mimic OSD very high CPU usage (3xx%), stops responding to other OSDs, causing PGs stuck at peering

Added by Bin Guo almost 3 years ago. Updated almost 3 years ago.

Status: Won't Fix
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have been running this Mimic cluster (about 530 OSDs) for over a year. Recently I found that some OSDs randomly go into a busy loop with very high CPU usage (300% to 400%, which is as much as the Pod resource limit allows). While this is happening, these OSDs stop responding to any messages from outside, and the cluster status shows some PGs stuck in the peering state.

All of the problems above disappear on their own after about 3 to 4 hours, and then everything goes back to normal. I can't reproduce this, but it has happened 3 times so far.

Any help will be appreciated!
