Bug #39099

closed

Give recovery for inactive PGs a higher priority

Added by David Zafman about 5 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Backfill for an inactive PG gets priority 220. If an inactive PG can need only recovery (no backfill), it would currently get the recovery base priority of 180, but it should also use the 220 base.
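
Below is a minimal standalone sketch of the priority scheme described above, assuming the values from the description (recovery base 180, inactive base 220); the constant and function names are hypothetical stand-ins, not the actual Ceph identifiers.

#include <cstdio>

constexpr unsigned RECOVERY_PRIORITY_BASE = 180;  // base for ordinary recovery
constexpr unsigned INACTIVE_PRIORITY_BASE = 220;  // base for inactive backfill, and (proposed) inactive recovery

// Hypothetical stand-in for get_recovery_priority(): an inactive PG that only
// needs recovery should start from the same base as inactive backfill.
unsigned recovery_priority(bool pg_is_inactive) {
  return pg_is_inactive ? INACTIVE_PRIORITY_BASE : RECOVERY_PRIORITY_BASE;
}

int main() {
  std::printf("active PG needing recovery:   %u\n", recovery_priority(false));  // 180
  std::printf("inactive PG needing recovery: %u\n", recovery_priority(true));   // 220
}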


Related issues 7 (1 open, 6 closed)

Related to RADOS - Feature #39339: prioritize backfill of metadata pools, automatically (In Progress, Sage Weil)

Related to RADOS - Documentation #39011: Document how get_recovery_priority() and get_backfill_priority() impacts recovery order (Resolved, David Zafman, 03/28/2019)

Has duplicate RADOS - Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd (Duplicate, David Zafman, 03/25/2019)

Precedes RADOS - Bug #35808: ceph osd ok-to-stop result dosen't match the real situation (Rejected, David Zafman)

Copied to RADOS - Backport #39504: nautilus: Give recovery for inactive PGs a higher priority (Resolved, Nathan Cutler)
Copied to RADOS - Backport #39505: luminous: Give recovery for inactive PGs a higher priority (Rejected, Neha Ojha)
Copied to RADOS - Backport #39506: mimic: Give recovery for inactive PGs a higher priority (Rejected, Neha Ojha)
Actions #1

Updated by David Zafman about 5 years ago

  • Subject changed from Check if we should add an inactive check to get_recovery_priority() to Give recovery for inactive PGs a higher priority
  • Pull request ID set to 27503
Actions #2

Updated by David Zafman about 5 years ago

  • Status changed from New to In Progress
Actions #3

Updated by David Zafman about 5 years ago

Checking acting.size() < pool.info.min_size is wrong. During recovery acting == up, so if acting.size() < pool.info.min_size, the PG will still have too few replicas to allow I/O even after recovery completes. That would be a very uncommon case, and it doesn't help us to prioritize that recovery anyway. Also, recovery allows simultaneous client operations, because even if the data hasn't transferred yet, those operations can be logged on all OSDs. I'm not sure how this interacts with pg log size limiting: if recovery is slow due to large objects and clients send lots of small operations, the count of log entries is going to go up. Will log trimming slow client requests, so the log doesn't grow unbounded?
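
A minimal sketch of the check discussed above, using hypothetical names rather than the actual Ceph types: during recovery acting == up, so acting.size() already reflects the replica count the PG will have once recovery finishes, and if that is below min_size, finishing recovery cannot make the PG writeable anyway.

#include <cstddef>
#include <vector>

struct PoolInfo { std::size_t min_size = 2; };  // stand-in for pool.info

// During recovery acting == up, so acting.size() is the post-recovery
// replica count; boosting recovery only helps client I/O when this holds.
bool recovery_can_restore_io(const std::vector<int>& acting, const PoolInfo& pool) {
  return acting.size() >= pool.min_size;
}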

When recovery is degraded, we don't differentiate whether or not there are at least min_size replicas of every object. In the case where some objects have fewer than min_size replicas we would want a priority boost, but I'm not sure of the best way to distinguish between the two cases. Also, as recovery proceeds the degraded count goes down, but at the point where example 1 has 100 degraded copies like example 2, it is really 100 objects on OSD 1 and 50 objects each on the two replacement OSDs (3 and 4), so in fact 50 objects are still below min_size, assuming we push the same object to both OSDs as recovery goes along. The sketch after the examples below works through this counting.

Example 1: size 3, min_size 2, replicated pool
Should get a priority boost because its 100 objects have 200 degraded copies (below min_size, with only 100 replicas existing)
PG 1.0 [1, 0, 2] active+clean
OSDs 0 and 2 go down and out
PG 1.0 [1, 3, 4] active+recovering+degraded

Example 2: size 3, min_size 2, replicated pool
No priority boost because its 100 objects have 100 degraded copies (min_size 2 is satisfied, with 200 replicas existing)
PG 1.0 [1, 0, 2] active+clean
OSD 2 goes down and out
PG 1.0 [1, 0, 3] active+recovering+degraded
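
A standalone sketch of the counting above, assuming the simple model degraded = size * objects - existing copies (the struct and helper are hypothetical, not Ceph's actual accounting). It shows that halfway through recovery, example 1 reports the same degraded count as example 2 even though 50 of its objects are still below min_size.

#include <cstdio>

struct PGStats {
  unsigned size;            // pool size (replica count)
  unsigned min_size;        // pool min_size
  unsigned objects;         // objects in the PG
  unsigned copies_present;  // object copies that currently exist
};

unsigned degraded(const PGStats& s) { return s.size * s.objects - s.copies_present; }

int main() {
  PGStats ex1{3, 2, 100, 100};            // example 1: only OSD 1 still has the data
  PGStats ex2{3, 2, 100, 200};            // example 2: OSDs 1 and 0 still have the data
  PGStats ex1_mid{3, 2, 100, 100 + 100};  // example 1 after pushing 50 objects to both new OSDs

  std::printf("example 1 degraded:              %u\n", degraded(ex1));      // 200, every object below min_size
  std::printf("example 2 degraded:              %u\n", degraded(ex2));      // 100, every object at min_size
  std::printf("example 1 mid-recovery degraded: %u\n", degraded(ex1_mid));  // 100, yet 50 objects still below min_size
}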

Actions #4

Updated by David Zafman about 5 years ago

  • Related to Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd added
Actions #5

Updated by David Zafman about 5 years ago

  • Related to Feature #39339: prioritize backfill of metadata pools, automatically added
Actions #6

Updated by David Zafman almost 5 years ago

  • Status changed from In Progress to Pending Backport
  • Backport set to luminous, mimic, nautilus
Actions #7

Updated by David Zafman almost 5 years ago

  • Precedes Bug #35808: ceph osd ok-to-stop result dosen't match the real situation added
Actions #8

Updated by Nathan Cutler almost 5 years ago

  • Copied to Backport #39504: nautilus: Give recovery for inactive PGs a higher priority added
Actions #9

Updated by Nathan Cutler almost 5 years ago

  • Copied to Backport #39505: luminous: Give recovery for inactive PGs a higher priority added
Actions #10

Updated by Nathan Cutler almost 5 years ago

  • Copied to Backport #39506: mimic: Give recovery for inactive PGs a higher priority added
Actions #11

Updated by Neha Ojha almost 5 years ago

David, we should discuss whether we want to backport this all the way to luminous or just to nautilus.

Actions #12

Updated by David Zafman almost 5 years ago

  • Related to Documentation #39011: Document how get_recovery_priority() and get_backfill_priority() impacts recovery order added
Actions #13

Updated by Nathan Cutler almost 5 years ago

  • Has duplicate Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd added
Actions #14

Updated by Nathan Cutler almost 5 years ago

  • Related to deleted (Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd)
Actions #15

Updated by David Zafman over 4 years ago

  • Status changed from Pending Backport to Resolved
  • Backport changed from luminous, mimic, nautilus to nautilus