Bug #45690: pg_interval_t::check_new_interval is overly generous about guessing when EC PGs could have gone active - RADOS - Ceph

Bug #45690

open

pg_interval_t::check_new_interval is overly generous about guessing when EC PGs could have gone active

Added by ming guo almost 4 years ago. Updated about 3 years ago.

Status: New
Priority: Normal
Assignee:
Category:
Target version:
% Done:
Source:
Tags:
Backport:
Regression:
Severity: 2 - major
Reviewed:
Affected Versions: Ceph - v12.2.12
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

One EC PG is stuck in peering+down forever. The problem can be reproduced with the following steps.
Suppose the PG's acting set is [1,2,3,4,5,6] (k=4, m=2, min_size=4).
1. Take osd.6 down; peering completes successfully.
2. Take osd.5 down; peering completes successfully.
3. Take osd.4 down; the PG goes down.
4. Bring osd.6 up; the PG must wait for osd.4 and stays down. This is expected.
5. Take osd.6 down; the PG stays down.
6. Bring osd.4 up; now the problem occurs: the PG waits for osd.6. But the PG was down during the interval from step 4, so it never went active and could not have accepted writes in that interval. Waiting for osd.6 is therefore unreasonable.
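The scenario above can be illustrated with a minimal Python sketch of a simplified, hypothetical model of how past intervals are classified. It is loosely based on the idea behind pg_interval_t::check_new_interval: an interval is treated as "maybe went rw" if it had at least min_size members, enough shards for an EC PG to go active, and the primary's up_thru covers it. The Interval class, the primary_up_thru flag, and the maybe_went_rw/blocked_by helpers are assumptions for illustration, not Ceph's code. In particular, the sketch assumes the primary's up_thru was updated during the step-4 interval even though the PG never went active.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

# Hypothetical, simplified model of one past interval of an EC PG.
# Illustration-only type, not Ceph's pg_interval_t.
@dataclass
class Interval:
    label: str
    acting: Set[int]        # OSDs holding shards during the interval
    primary_up_thru: bool   # assumption: primary's up_thru was bumped in this interval
    actually_active: bool   # ground truth that the guess cannot observe

K, M, MIN_SIZE = 4, 2, 4  # k=4, m=2, min_size=4 as in the report

def could_have_gone_active(acting: Set[int]) -> bool:
    # An EC PG needs at least k shards to reconstruct objects.
    return len(acting) >= K

def maybe_went_rw(iv: Interval) -> bool:
    # Simplified guess: enough members, enough shards, and an up_thru record.
    return (len(iv.acting) >= MIN_SIZE
            and could_have_gone_active(iv.acting)
            and iv.primary_up_thru)

def blocked_by(history: List[Interval], up: Set[int],
               went_rw: Callable[[Interval], bool]) -> Set[int]:
    """Down OSDs that peering must wait for, under a given went_rw rule."""
    blockers: Set[int] = set()
    for iv in history:
        if not went_rw(iv):
            continue  # interval is assumed to have accepted no writes
        if len(iv.acting & up) < K:
            # Too few surviving shards: writes seen only by down OSDs
            # cannot be ruled out, so those OSDs block peering.
            blockers |= iv.acting - up
    return blockers

# Past intervals produced by the steps in the report.
HISTORY = [
    Interval("initial", {1, 2, 3, 4, 5, 6}, True, True),
    Interval("after step 1", {1, 2, 3, 4, 5}, True, True),
    Interval("after step 2", {1, 2, 3, 4}, True, True),
    Interval("after step 3", {1, 2, 3}, True, False),
    Interval("after step 4", {1, 2, 3, 6}, True, False),  # PG was down here
    Interval("after step 5", {1, 2, 3}, False, False),
]
UP_AFTER_STEP6 = {1, 2, 3, 4}

if __name__ == "__main__":
    # Guessing rule: the step-4 interval is flagged, so osd.6 blocks peering.
    print(sorted(blocked_by(HISTORY, UP_AFTER_STEP6, maybe_went_rw)))  # [6]
    # Ground truth: the PG never went active in step 4, so nothing blocks.
    print(sorted(blocked_by(HISTORY, UP_AFTER_STEP6,
                            lambda iv: iv.actually_active)))  # []
```

In this model the step-4 interval {1,2,3,6} passes the size and shard checks, and only three of its members are up after step 6. It is therefore treated as possibly having accepted writes that only osd.6 saw, which matches the stuck state described in the report.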

History

Updated by Greg Farnum about 3 years ago