Bug #39099
Give recovery for inactive PGs a higher priority
Status: Closed
Description
Backfill of an inactive PG gets base priority 220. We should make sure that an inactive PG that needs only recovery, which currently gets base priority 180, is also given base priority 220.
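The intent above can be sketched as follows. This is an illustrative Python sketch, not Ceph's actual `get_recovery_priority()` implementation; the constant names are modeled on Ceph's priority-base constants, but the helper and its signature are assumptions for illustration.

```python
# Illustrative sketch of the priority scheme discussed in this issue.
# Constant values come from the description; the helper itself is hypothetical.
OSD_RECOVERY_PRIORITY_BASE = 180           # ordinary recovery of an active PG
OSD_BACKFILL_INACTIVE_PRIORITY_BASE = 220  # backfill of an inactive PG
OSD_RECOVERY_INACTIVE_PRIORITY_BASE = 220  # proposed: recovery of an inactive PG

def recovery_base_priority(pg_is_active: bool) -> int:
    """Return the base priority for a PG that needs recovery only."""
    if not pg_is_active:
        # An inactive PG blocks client I/O, so its recovery should be
        # scheduled at the same elevated base as inactive backfill.
        return OSD_RECOVERY_INACTIVE_PRIORITY_BASE
    return OSD_RECOVERY_PRIORITY_BASE
```

With this scheme, recovery of an inactive PG jumps ahead of all ordinary recovery work, matching the treatment backfill of inactive PGs already receives.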
Updated by David Zafman about 5 years ago
- Subject changed from Check if we should add an inactive check to get_recovery_priority() to Give recovery for inactive PGs a higher priority
- Pull request ID set to 27503
Updated by David Zafman about 5 years ago
- Status changed from New to In Progress
Updated by David Zafman about 5 years ago
Checking acting.size() < pool.info.min_size is wrong. During recovery acting == up, so if acting.size() < pool.info.min_size, the result of the recovery will still leave too few replicas to allow I/O. This would be a very uncommon case, and it doesn't help us prioritize that recovery. Also, recovery allows simultaneous operations, because even if data hasn't transferred, those operations can be logged on all OSDs.
I'm not sure how this interacts with PG log size limiting. If recovery is slow due to large objects and clients send lots of small operations, the count of log entries will go up. Will log trimming slow client requests, so that the log doesn't grow unbounded?
When a PG is degraded, we don't differentiate whether or not there are at least min_size replicas of every object. In the below-min_size case we would want a priority boost, but I'm not sure of the best way to distinguish between the two cases. Also, as recovery proceeds the degraded count goes down; by the point that Example 1 has 100 degraded objects like Example 2, it really has 100 objects on OSD 1 and 50 objects each on OSDs 3 and 4. So in actual fact 50 objects are still below min_size, assuming we push the same object to both OSDs as recovery goes along.
Example 1: size 3, min_size 2, replicated pool
Should get a priority boost, because 100 objects have 200 degraded copies (below min_size, with only 100 replicas existing)
PG 1.0 [1, 0, 2] active+clean
OSDs 0 and 2 go down and out
PG 1.0 [1, 3, 4] active+recovering+degraded
Example 2: size 3, min_size 2, replicated pool
No priority boost, because 100 objects have 100 degraded copies (min_size 2 satisfied, with 200 replicas existing)
PG 1.0 [1, 0, 2] active+clean
OSD 2 goes down and out
PG 1.0 [1, 0, 3] active+recovering+degraded
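The degraded-object arithmetic behind the two examples can be sketched as follows. The helper function is hypothetical, used only to make the counts in the examples explicit; the object counts and pool parameters come from the examples above.

```python
def degraded_count(num_objects: int, size: int, replicas_present: int) -> int:
    """Degraded copies = copies wanted (num_objects * size) minus copies present."""
    return num_objects * size - replicas_present

# Example 1: OSDs 0 and 2 are down+out; only OSD 1 still holds the 100 objects.
# 100 objects * size 3 = 300 copies wanted, 100 present -> 200 degraded.
# Each object has a single copy, below min_size=2, so a boost is warranted.
ex1 = degraded_count(100, size=3, replicas_present=100)

# Example 2: only OSD 2 is down+out; OSDs 0 and 1 each hold all 100 objects.
# 300 copies wanted, 200 present -> 100 degraded.
# Every object still has min_size=2 copies, so no boost is needed.
ex2 = degraded_count(100, size=3, replicas_present=200)
```

Note that the degraded count alone cannot distinguish the two cases: as Example 1's recovery proceeds, its degraded count passes through 100 while some objects are still below min_size, which is exactly the ambiguity described above.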
Updated by David Zafman about 5 years ago
- Related to Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd added
Updated by David Zafman about 5 years ago
- Related to Feature #39339: prioritize backfill of metadata pools, automatically added
Updated by David Zafman almost 5 years ago
- Status changed from In Progress to Pending Backport
- Backport set to luminous, mimic, nautilus
Updated by David Zafman almost 5 years ago
- Precedes Bug #35808: ceph osd ok-to-stop result dosen't match the real situation added
Updated by Nathan Cutler almost 5 years ago
- Copied to Backport #39504: nautilus: Give recovery for inactive PGs a higher priority added
Updated by Nathan Cutler almost 5 years ago
- Copied to Backport #39505: luminous: Give recovery for inactive PGs a higher priority added
Updated by Nathan Cutler almost 5 years ago
- Copied to Backport #39506: mimic: Give recovery for inactive PGs a higher priority added
Updated by Neha Ojha almost 5 years ago
David, we should discuss whether we want to backport this all the way to luminous or just to nautilus.
Updated by David Zafman almost 5 years ago
- Related to Documentation #39011: Document how get_recovery_priority() and get_backfill_priority() impacts recovery order added
Updated by Nathan Cutler almost 5 years ago
- Has duplicate Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd added
Updated by Nathan Cutler almost 5 years ago
- Related to deleted (Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd)
Updated by David Zafman over 4 years ago
- Status changed from Pending Backport to Resolved
- Backport changed from luminous, mimic, nautilus to nautilus