Bug #45690

open

pg_interval_t::check_new_interval is overly generous about guessing when EC PGs could have gone active

Added by ming guo almost 4 years ago. Updated about 3 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

An EC PG gets stuck in peering+down forever. The problem occurs through the following steps.
Suppose the PG's acting set is [1,2,3,4,5,6] (k=4, m=2, min_size=4).
1. Take osd.6 down: peering completes successfully.
2. Take osd.5 down: peering completes successfully.
3. Take osd.4 down: the PG goes down.
4. Bring osd.6 up: the PG has to wait for osd.4 and stays down. No problem so far.
5. Take osd.6 down: the PG stays down.
6. Bring osd.4 up: this is where the problem occurs. The PG now waits for osd.6, but during the interval of step 4 the PG was down, so it is unreasonable to wait for osd.6.
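
The steps above can be replayed with a small simulation. This is only a simplified sketch of the idea, not the actual Ceph code: the function `simulate`, the flag `trust_size_only`, and the rule "a past interval blocks peering when fewer than k of its acting OSDs are up now" are assumptions made for illustration. It also assumes that once the PG goes active, earlier intervals no longer matter. With the generous guess (an interval counts as maybe_went_rw whenever at least min_size OSDs were acting), step 6 ends up blocked by osd.6. If only intervals in which the PG really went active are recorded, step 6 goes active.

```python
K, MIN_SIZE = 4, 4  # from the report: k=4, m=2, min_size=4

# The six steps from the description.
EVENTS = [("down", 6), ("down", 5), ("down", 4),
          ("up", 6), ("down", 6), ("up", 4)]


def simulate(events, initial, trust_size_only):
    """Replay OSD up/down events; return (state, blocked_by) after each one.

    trust_size_only=True models the generous guess: an interval is recorded
    as maybe_went_rw whenever enough OSDs were acting, even if the PG was
    actually down. trust_size_only=False records only intervals in which
    the PG really went active. Both rules are illustrative assumptions.
    """
    up = set(initial)
    # Past intervals since the PG was last active: (acting set, maybe_went_rw).
    past = [(frozenset(up), True)]
    states = []
    for action, osd in events:
        if action == "down":
            up.discard(osd)
        else:
            up.add(osd)
        # A past rw interval blocks peering if fewer than K of its OSDs are up.
        blocked_by = set()
        for acting, went_rw in past:
            if went_rw and len(acting & up) < K:
                blocked_by |= acting - up
        active = len(up) >= MIN_SIZE and not blocked_by
        if active:
            # Assumption: older intervals are irrelevant once the PG is active.
            past = [(frozenset(up), True)]
        else:
            guessed_rw = len(up) >= MIN_SIZE if trust_size_only else False
            past.append((frozenset(up), guessed_rw))
        states.append(("active" if active else "down", sorted(blocked_by)))
    return states


generous = simulate(EVENTS, range(1, 7), trust_size_only=True)
strict = simulate(EVENTS, range(1, 7), trust_size_only=False)
print("step 6, generous guess:", generous[-1])  # ('down', [6])
print("step 6, strict record: ", strict[-1])    # ('active', [])
```

In this model the interval of step 4 has four acting OSDs [1,2,3,6], which meets min_size, so the generous rule marks it as possibly read-write even though the PG was down waiting for osd.4. That mark is what makes step 6 wait for osd.6.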
