Osd - prepopulate pg temp
Pre-populate the pg_temp mapping in the OSDMap when there are large changes in the CRUSH map.
- Sage Weil (Inktank)
- Guang Yang (Yahoo!)
Normally, when there is a major change (such as a CRUSH rule change or the reweighting of an entire rack), many PG primaries are remapped to devices that do not hold the data. Each new primary then sends a request to the monitor to add a pg_temp exception that maps the PG back to its previous location. This delays availability, especially when many PGs are affected and the monitors have to process a large number of messages to add the remappings.
Detailed Description
Instead of waiting for the OSDs to request an exception, the monitor could (optionally) prepopulate pg_temp after a CRUSH map change. This minimizes (or eliminates) any lapse in availability (no I/O stalls) at the cost of monitor CPU time spent calculating the mappings.
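The idea can be illustrated with a minimal Python sketch. It is not the monitor's code: `prime_pg_temp`, `old_placement` and `new_placement` are hypothetical names, and the two placement functions are toy stand-ins for CRUSH evaluated against the old and new maps. The sketch assumes that a pg_temp entry is simply the list of OSDs that held the PG before the change, restricted to those that are still up.

```python
from typing import Callable, Dict, Iterable, List, Set

PgId = int
OsdId = int
Placement = Callable[[PgId], List[OsdId]]


def prime_pg_temp(
    pgs: Iterable[PgId],
    old_map: Placement,
    new_map: Placement,
    up_osds: Set[OsdId],
) -> Dict[PgId, List[OsdId]]:
    """Return pg_temp entries for every PG whose placement changed.

    Hypothetical helper: each entry points the PG back at the OSDs that
    already hold its data, so clients can keep doing I/O during backfill.
    """
    pg_temp: Dict[PgId, List[OsdId]] = {}
    for pg in pgs:
        old_acting = old_map(pg)
        if new_map(pg) == old_acting:
            continue  # placement unchanged, no exception needed
        # Only OSDs that are still up can keep serving the PG.
        survivors = [osd for osd in old_acting if osd in up_osds]
        if survivors:
            pg_temp[pg] = survivors
    return pg_temp


# Toy placement functions (assumptions, not CRUSH): two replicas on 4 OSDs.
def old_placement(pg: PgId) -> List[OsdId]:
    return [pg % 4, (pg + 1) % 4]


def new_placement(pg: PgId) -> List[OsdId]:
    return [(pg + 2) % 4, (pg + 3) % 4]


if __name__ == "__main__":
    print(prime_pg_temp(range(4), old_placement, new_placement, {0, 1, 2, 3}))
```

The cost that matters is the loop over all PGs with two placement calculations each, which is the monitor CPU time the description refers to.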
- what triggers the mon to calculate pg mappings? pg_pool_t property change? CRUSH map change?
- how do we prevent that work from disrupting ongoing mon work?
- an async worker thread that may or may not come back with useful work before the paxos round gets proposed?
- ensure that thrashosds.py is making changes that trigger said remapping
- mon: build predicate to determine when to calculate mappings
- add config options controlling this as appropriate
- mon: calculate mappings and pre-populate pg_temp
- mon: push calculation onto an async worker thread that can run in parallel with real work
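One way to picture the async worker item is the sketch below, written in Python for brevity rather than in the monitor's own language. The names `start_priming` and `collect_before_propose` and the `grace_s` parameter are hypothetical. The assumption is that the mapping calculation is best-effort: if it has not finished when the proposal is about to go out, the proposal proceeds without primed entries and the OSDs request pg_temp as they do today.

```python
import concurrent.futures
import time
from typing import Callable, Dict, List

PgTemp = Dict[int, List[int]]


def start_priming(
    executor: concurrent.futures.Executor,
    compute: Callable[[], PgTemp],
) -> "concurrent.futures.Future[PgTemp]":
    """Kick off the mapping calculation without blocking the caller."""
    return executor.submit(compute)


def collect_before_propose(
    future: "concurrent.futures.Future[PgTemp]",
    grace_s: float,
) -> PgTemp:
    """Return the primed entries if they are ready within grace_s seconds.

    On timeout the result is dropped and an empty mapping is returned,
    so the proposal is never held up by the calculation.
    """
    try:
        return future.result(timeout=grace_s)
    except concurrent.futures.TimeoutError:
        # cancel() cannot stop a task that is already running; it only
        # prevents a queued task from starting. Either way the result
        # is ignored.
        future.cancel()
        return {}


if __name__ == "__main__":
    def slow_compute() -> PgTemp:
        time.sleep(0.3)  # stands in for an expensive mapping pass
        return {2: [1]}

    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        pending = start_priming(pool, slow_compute)
        # ... other monitor work would happen here ...
        print(collect_before_propose(pending, grace_s=0.01))
```

A time budget like `grace_s` is also a natural candidate for one of the config options mentioned above.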
Build / release tasks
- teuthology: ensure thrashosds exercises new feature