Bug #45690
pg_interval_t::check_new_interval is overly generous about guessing when EC PGs could have gone active
Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
An EC PG can get stuck in peering+down forever. The problem occurs through the following steps.
Suppose the PG's acting set is [1,2,3,4,5,6] (k=4, m=2, min_size=4).
1. Take osd.6 down; peering completes successfully.
2. Take osd.5 down; peering completes successfully.
3. Take osd.4 down; the PG goes down.
4. Bring osd.6 up; the PG needs to wait for osd.4 and stays down, which is expected.
5. Take osd.6 down; the PG stays down.
6. Bring osd.4 up; this is where the problem occurs. The PG now waits for osd.6, but the PG was down during the interval of step 4 and could not have accepted writes then, so it is unreasonable to wait for osd.6.
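
The sequence above can be replayed with a small toy model. This is only a sketch under stated assumptions, not Ceph's actual peering code: it assumes that an interval is flagged as "maybe went rw" whenever its acting set has at least min_size members, and that a later interval is blocked if fewer than k members of any flagged interval are up. The names `Interval`, `can_peer`, and `replay` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

K, MIN_SIZE = 4, 4  # k and min_size from the report (m=2 is not needed here)


@dataclass
class Interval:
    acting: frozenset
    maybe_went_rw: bool  # the guess recorded for this past interval


def can_peer(up_now, history, k=K):
    """Toy rule (assumption): peering is blocked if any past interval
    flagged maybe_went_rw has fewer than k of its members up now."""
    return all(len(iv.acting & up_now) >= k
               for iv in history if iv.maybe_went_rw)


def replay(events, accurate, initial=(1, 2, 3, 4, 5, 6)):
    """Replay OSD up/down events and return the PG state after each one.

    accurate=False: flag an interval whenever it meets min_size (the
                    overly generous guess assumed in this sketch).
    accurate=True:  flag an interval only if the PG really went active.
    """
    up = set(initial)
    history = [Interval(frozenset(up), True)]  # the PG starts out active
    states = []
    for action, osd in events:
        if action == "down":
            up.discard(osd)
        else:
            up.add(osd)
        acting = frozenset(up)
        size_ok = len(acting) >= MIN_SIZE
        active = size_ok and can_peer(acting, history)
        states.append("active" if active else "down")
        history.append(Interval(acting, active if accurate else size_ok))
    return states


# Steps 1 to 6 from the description.
EVENTS = [("down", 6), ("down", 5), ("down", 4),
          ("up", 6), ("down", 6), ("up", 4)]

print(replay(EVENTS, accurate=False))
print(replay(EVENTS, accurate=True))
```

In this model the size-only guess flags the step 4 interval [1,2,3,6] even though the PG was down, so step 6 is blocked on osd.6. When the flag reflects whether the PG actually went active, step 6 can peer with [1,2,3,4].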