Feature #13923

closed

Set health to ERR when one or more PGs is stuck inactive

Added by Wido den Hollander over 8 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Monitor
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Based on this thread: http://article.gmane.org/gmane.comp.file-systems.ceph.user/25551

I would propose two additional settings:

mon_pg_inactive_max = 300
mon_pg_inactive_num = 1

In this case, if 1 or more PGs are stuck inactive for more than 300 seconds, the health state would go from WARN to ERR.
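
To make the proposed rule concrete, here is a minimal Python sketch of the escalation logic. It is only an illustration of the proposal, not the monitor's actual implementation: the function name `health_for_inactive_pgs`, its parameters, and the idea of passing in a list of per-PG inactive durations are all assumptions made for this example. Only the two setting names and their values come from the proposal above.

```python
from typing import Iterable

# Values from the proposal in this ticket; whether the monitor uses these
# exact option names is not confirmed here.
MON_PG_INACTIVE_MAX = 300  # seconds a PG may stay inactive before it counts
MON_PG_INACTIVE_NUM = 1    # number of such PGs needed to escalate to ERR


def health_for_inactive_pgs(
    inactive_seconds: Iterable[float],
    max_age: float = MON_PG_INACTIVE_MAX,
    min_count: int = MON_PG_INACTIVE_NUM,
) -> str:
    """Hypothetical helper: health state implied by inactive PGs alone.

    inactive_seconds holds, for each currently inactive PG, how long
    it has been inactive (in seconds).
    """
    durations = list(inactive_seconds)
    if not durations:
        # No inactive PGs, so this check raises nothing.
        return "HEALTH_OK"
    # Count PGs inactive for strictly more than max_age seconds.
    stuck = sum(1 for seconds in durations if seconds > max_age)
    # Escalate only when enough PGs have exceeded the age limit.
    return "HEALTH_ERR" if stuck >= min_count else "HEALTH_WARN"


if __name__ == "__main__":
    print(health_for_inactive_pgs([12.0, 45.0]))   # all below the limit
    print(health_for_inactive_pgs([12.0, 301.0]))  # one PG over the limit
```

With the proposed defaults, a single PG inactive for longer than 300 seconds is enough to escalate; raising `min_count` would tolerate more stuck PGs before the state changes.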

In RBD environments even one inactive PG can cause almost all I/O to stall, since the data of each block device is spread across many different PGs.

Actions #1

Updated by Abhishek Lekshmanan over 8 years ago

  • Status changed from New to Fix Under Review
Actions #2

Updated by Wido den Hollander about 7 years ago

  • Status changed from Fix Under Review to Resolved