Bug #39099

Give recovery for inactive PGs a higher priority

Added by David Zafman about 2 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Inactive backfill gets base priority 220. We should make sure that an inactive PG which needs only recovery, and would normally get base priority 180, is also raised to base priority 220.
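A minimal sketch of the proposed behavior. The two constants (180 and 220) come from the description above; the function name and signature are illustrative, not Ceph's actual get_recovery_priority() code.

```python
RECOVERY_PRIORITY_BASE = 180   # normal base priority for recovery
INACTIVE_PRIORITY_BASE = 220   # elevated base already used for inactive backfill

def recovery_base_priority(pg_is_active: bool) -> int:
    """Pick the base priority for a PG that needs recovery only.

    An inactive PG gets the same elevated base (220) that inactive
    backfill already receives, instead of the normal 180.
    """
    return RECOVERY_PRIORITY_BASE if pg_is_active else INACTIVE_PRIORITY_BASE
```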


Related issues

Related to RADOS - Feature #39339: prioritize backfill of metadata pools, automatically In Progress
Related to RADOS - Documentation #39011: Document how get_recovery_priority() and get_backfill_priority() impacts recovery order Resolved 03/28/2019
Duplicated by RADOS - Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd Duplicate 03/25/2019
Precedes RADOS - Bug #35808: ceph osd ok-to-stop result doesn't match the real situation Rejected
Copied to RADOS - Backport #39504: nautilus: Give recovery for inactive PGs a higher priority Resolved
Copied to RADOS - Backport #39505: luminous: Give recovery for inactive PGs a higher priority Rejected
Copied to RADOS - Backport #39506: mimic: Give recovery for inactive PGs a higher priority Rejected

History

#1 Updated by David Zafman about 2 years ago

  • Subject changed from Check if we should add an inactive check to get_recovery_priority() to Give recovery for inactive PGs a higher priority
  • Pull request ID set to 27503

#2 Updated by David Zafman about 2 years ago

  • Status changed from New to In Progress

#3 Updated by David Zafman about 2 years ago

Checking acting.size() < pool.info.min_size is wrong. During recovery acting == up, so if acting.size() < pool.info.min_size, the result of the recovery will still leave too few replicas to allow I/O. This would be a very uncommon case, and it doesn't help us prioritize that recovery. Also, recovery allows simultaneous client operations, because even if the data hasn't transferred yet, those operations can be logged on all OSDs.

I'm not sure how this interacts with pg log size limiting. If recovery is slow due to large objects and clients send lots of small operations, the count of log entries will keep growing. Will log trimming slow client requests so the log doesn't grow unbounded?

When recovery is degraded, we don't differentiate whether or not there are at least min_size replicas of every object. In the below-min_size case we would want a priority boost, but I'm not sure of the best way to distinguish between the two cases. Also, as recovery proceeds the degraded count goes down; but at the point where example 1 has 100 degraded objects like example 2, that is really 100 objects on OSD 1 and 50 objects each on OSDs 3 and 4. So in actual fact 50 objects are still below min_size, assuming we push the same object to both OSDs as recovery proceeds.

Example 1, size 3 min_size 2 replicated pool
Should get a priority boost, because 100 objects have 200 degraded copies (below min_size, with only 100 replicas existing)
PG 1.0 [1, 0, 2] active+clean
OSDs 0 and 2 go down and out
PG 1.0 [1, 3, 4] active+recovering+degraded

Example 2, size 3 min_size 2 replicated pool
No priority boost, because 100 objects have 100 degraded copies (min_size 2 satisfied, with 200 replicas existing)
PG 1.0 [1, 0, 2] active+clean
OSD 2 goes down and out
PG 1.0 [1, 0, 3] active+recovering+degraded
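The arithmetic behind the two examples can be sketched as a toy model (not Ceph code; the function names are illustrative, and it assumes whole replicas are missing uniformly, as at the start of recovery):

```python
def degraded_count(num_objects: int, missing_replicas: int) -> int:
    # Each object is counted as degraded once per missing replica.
    return num_objects * missing_replicas

def objects_below_min_size(num_objects: int, size: int, min_size: int,
                           missing_replicas: int) -> int:
    # Every object has (size - missing_replicas) complete copies; if that
    # falls below min_size, I/O on those objects would be blocked.
    remaining = size - missing_replicas
    return num_objects if remaining < min_size else 0

# Example 1: size 3, min_size 2, OSDs 0 and 2 down/out -> 2 replicas missing
print(degraded_count(100, 2))            # 200 degraded
print(objects_below_min_size(100, 3, 2, 2))  # 100 objects below min_size -> boost

# Example 2: size 3, min_size 2, only OSD 2 down/out -> 1 replica missing
print(degraded_count(100, 1))            # 100 degraded
print(objects_below_min_size(100, 3, 2, 1))  # 0 objects below min_size -> no boost
```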

#4 Updated by David Zafman about 2 years ago

  • Related to Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd added

#5 Updated by David Zafman about 2 years ago

  • Related to Feature #39339: prioritize backfill of metadata pools, automatically added

#6 Updated by David Zafman about 2 years ago

  • Status changed from In Progress to Pending Backport
  • Backport set to luminous, mimic, nautilus

#7 Updated by David Zafman about 2 years ago

  • Precedes Bug #35808: ceph osd ok-to-stop result doesn't match the real situation added

#8 Updated by Nathan Cutler about 2 years ago

  • Copied to Backport #39504: nautilus: Give recovery for inactive PGs a higher priority added

#9 Updated by Nathan Cutler about 2 years ago

  • Copied to Backport #39505: luminous: Give recovery for inactive PGs a higher priority added

#10 Updated by Nathan Cutler about 2 years ago

  • Copied to Backport #39506: mimic: Give recovery for inactive PGs a higher priority added

#11 Updated by Neha Ojha about 2 years ago

David, we should discuss whether we want to backport this all the way to luminous or just to nautilus.

#12 Updated by David Zafman about 2 years ago

  • Related to Documentation #39011: Document how get_recovery_priority() and get_backfill_priority() impacts recovery order added

#13 Updated by Nathan Cutler about 2 years ago

  • Duplicated by Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd added

#14 Updated by Nathan Cutler about 2 years ago

  • Related to deleted (Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd)

#15 Updated by David Zafman over 1 year ago

  • Status changed from Pending Backport to Resolved
  • Backport changed from luminous, mimic, nautilus to nautilus
