Osd - prepopulate pg temp » History » Version 1

Jessica Mack, 07/03/2015 08:43 PM

h1. Osd - prepopulate pg temp

h3. Summary

Pre-populate the pg_temp mapping in the OSDMap when there are large changes in the CRUSH map.

h3. Owners

* Sage Weil (Inktank)
h3. Interested Parties

* Guang Yang (Yahoo!)

h3. Current Status

Normally, when there is a major change (such as a CRUSH rule change or the reweighting of an entire rack), many PG primaries are remapped to devices that do not have the content. Each of them then sends a request to the monitor to add a pg_temp exception that remaps the PG to its previous location. This delays availability, especially when many PGs are affected and the monitors have to process a large number of messages to add the remappings.
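The role of a pg_temp exception can be illustrated with a toy model. This is only a sketch: the function name, the PG id strings, and the dictionaries are hypothetical stand-ins, not Ceph data structures.

```python
from typing import Dict, List

Mapping = Dict[str, List[int]]  # PG id -> ordered list of OSD ids (primary first)


def acting_set(pgid: str, up: Mapping, pg_temp: Mapping) -> List[int]:
    """Return the pg_temp override for the PG if one exists, else the CRUSH result."""
    return pg_temp.get(pgid, up[pgid])


# Hypothetical mappings before and after a CRUSH change.
old_up = {"1.0": [0, 1, 2], "1.1": [1, 2, 3]}
new_up = {"1.0": [4, 5, 0], "1.1": [1, 2, 3]}  # PG 1.0 gets a new primary, osd 4

pg_temp: Mapping = {}
print(acting_set("1.0", new_up, pg_temp))  # [4, 5, 0]: osd 4 has none of the data

# The exception that the new primary would otherwise have to request.
pg_temp["1.0"] = old_up["1.0"]
print(acting_set("1.0", new_up, pg_temp))  # [0, 1, 2]: served from the old location
```

In this model, each remapped PG costs one request to the monitor before the override exists, which is the message load described above.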
h3. Detailed Description

Instead of waiting for the OSDs to request an exception, the monitor could (optionally) prepopulate pg_temp after a CRUSH map change. This minimizes, or even eliminates, the lapse in availability (no I/O stalls), at the cost of the monitor CPU time spent calculating the mappings.

Key considerations:

* What triggers the mon to calculate PG mappings? A pg_pool_t property change? A CRUSH map change?
* How do we prevent that work from disrupting ongoing mon work?
** An async worker thread that may or may not come back with useful work before the paxos round gets proposed?
* Ensure that thrashosds.py makes changes that trigger this remapping.
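The idea of calculating the exceptions on a worker that may or may not finish in time can be sketched as follows. All names here (compute_pg_temp, prime_before_proposal, deadline_s) are hypothetical, and the rule for deciding which PGs need an exception is an assumption made for illustration, not the monitor's actual logic.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError
from typing import Dict, List

Mapping = Dict[str, List[int]]  # PG id -> ordered list of OSD ids (primary first)


def compute_pg_temp(old_up: Mapping, new_up: Mapping) -> Mapping:
    """Pin each PG whose new primary did not hold the data to its old OSDs."""
    primed = {}
    for pgid, old in old_up.items():
        new = new_up.get(pgid)
        # Assumption: a PG needs an exception when its new primary was not
        # part of the previous mapping and so has none of the content.
        if old and new and new[0] not in old:
            primed[pgid] = list(old)
    return primed


def prime_before_proposal(old_up, new_up, deadline_s, compute=compute_pg_temp):
    """Run the calculation on a worker; use the result only if it arrives in time."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(compute, old_up, new_up)
    try:
        return future.result(timeout=deadline_s)
    except TimeoutError:
        # Too late for this proposal: fall back to OSD-requested exceptions.
        return {}
    finally:
        pool.shutdown(wait=False)  # do not block the caller on the worker


def slow_compute(old_up, new_up):
    """Stand-in for a calculation that misses the deadline."""
    time.sleep(0.5)
    return compute_pg_temp(old_up, new_up)


old_up = {"1.0": [0, 1, 2], "1.1": [1, 2, 3], "1.2": [2, 3, 4]}
new_up = {"1.0": [5, 1, 2], "1.1": [2, 1, 3], "1.2": [6, 7, 4]}

print(prime_before_proposal(old_up, new_up, deadline_s=1.0))
# {'1.0': [0, 1, 2], '1.2': [2, 3, 4]}
print(prime_before_proposal(old_up, new_up, deadline_s=0.01, compute=slow_compute))
# {}
```

Returning an empty result on timeout keeps the proposal from waiting on the calculation; the affected PGs would then simply follow the existing path, where the OSDs request their exceptions.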
h3. Work items

h4. Coding tasks

# mon: build predicate to determine when to calculate mappings
## add config options controlling this as appropriate
# mon: calculate mappings and pre-populate pg_temp
# mon: push calculation onto an async worker thread that can run in parallel with real work
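A minimal sketch of what the predicate and its config options from the first task might look like. The option names and the MapDelta type are invented for illustration and do not correspond to existing Ceph settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimeConfig:
    """Hypothetical config options controlling pg_temp prepopulation."""
    enabled: bool = True
    on_crush_change: bool = True
    on_pool_change: bool = True


@dataclass(frozen=True)
class MapDelta:
    """Hypothetical summary of what a pending map update changes."""
    crush_changed: bool = False
    pool_props_changed: bool = False


def should_prime_pg_temp(cfg: PrimeConfig, delta: MapDelta) -> bool:
    """Decide whether the pending change warrants calculating mappings."""
    if not cfg.enabled:
        return False
    return (cfg.on_crush_change and delta.crush_changed) or (
        cfg.on_pool_change and delta.pool_props_changed
    )


print(should_prime_pg_temp(PrimeConfig(), MapDelta(crush_changed=True)))   # True
print(should_prime_pg_temp(PrimeConfig(enabled=False),
                           MapDelta(crush_changed=True)))                  # False
print(should_prime_pg_temp(PrimeConfig(on_pool_change=False),
                           MapDelta(pool_props_changed=True)))             # False
```

Keeping the decision in one pure function makes it cheap to evaluate on every map update and easy to test independently of the mapping calculation itself.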
h4. Build / release tasks

# teuthology: ensure thrashosds exercises the new feature