Feature #13923: Set health to ERR when one or more PGs is stuck inactive - Ceph - Ceph

Feature #13923

closed

Set health to ERR when one or more PGs is stuck inactive

Added by Wido den Hollander over 8 years ago. Updated about 7 years ago.

Status: Resolved

Priority: Normal

Category: Monitor

Source: other

Description

Based on this thread: http://article.gmane.org/gmane.comp.file-systems.ceph.user/25551

I would propose two additional settings:

mon_pg_inactive_max = 300
mon_pg_inactive_num = 1

In this case, if 1 or more PGs are stuck inactive for more than 300 seconds, the health state would go from WARN to ERR.

In RBD environments, even one inactive PG can cause almost all I/O to stall, since the objects of a block device are spread across many different PGs.
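To make the proposed rule concrete, here is a minimal sketch of how the two settings could interact. It is not the monitor's actual code: the function name and the HEALTH_* strings are hypothetical, and it assumes that any inactive PG below the threshold is reported as WARN, which this ticket implies but does not spell out.

```python
from typing import Iterable

# Hypothetical status labels used only for this illustration.
HEALTH_OK = "HEALTH_OK"
HEALTH_WARN = "HEALTH_WARN"
HEALTH_ERR = "HEALTH_ERR"


def health_for_inactive_pgs(
    inactive_seconds: Iterable[float],
    mon_pg_inactive_max: float = 300,  # proposed setting, in seconds
    mon_pg_inactive_num: int = 1,      # proposed setting, number of PGs
) -> str:
    """Return a health label given how long each inactive PG has been inactive."""
    if mon_pg_inactive_num < 1:
        # Assumption: a threshold below 1 PG is not meaningful in this sketch.
        raise ValueError("mon_pg_inactive_num must be at least 1")

    durations = list(inactive_seconds)

    # Count PGs that have been inactive for MORE than the allowed time.
    stuck = sum(1 for seconds in durations if seconds > mon_pg_inactive_max)

    if stuck >= mon_pg_inactive_num:
        return HEALTH_ERR
    if durations:
        # Assumption: inactive PGs that are not yet stuck still raise a warning.
        return HEALTH_WARN
    return HEALTH_OK


if __name__ == "__main__":
    # One PG inactive for 10 minutes, another for 20 seconds.
    print(health_for_inactive_pgs([600, 20]))
```

With the proposed defaults, a single PG that has been inactive for more than 300 seconds is enough to report ERR, which matches the RBD concern described above.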

Updated by Abhishek Lekshmanan over 8 years ago

Status changed from New to Fix Under Review

master PR: https://github.com/ceph/ceph/pull/7253

Updated by Wido den Hollander about 7 years ago

Status changed from Fix Under Review to Resolved
