Bug #62338

Updated by Samuel Just 10 months ago

choose_async_recovery_ec may remove OSDs from the acting set as long as PeeringState::recoverable evaluates to true. Prior to 90022b35 (merge of PR #17619), the condition was PeeringState::recoverable_and_ge_min_size, which behaved as the name indicates. 7cb818a85 weakened the condition in PeeringState::recoverable_and_ge_min_size to only check min_size if !cct->_conf.get_val<bool>("osd_allow_recovery_below_min_size") (the name was changed to PeeringState::recoverable in a subsequent commit in that PR, e4c8bee88). PeeringState::recoverable_and_ge_min_size had (and has) two callers: choose_acting and choose_async_recovery_ec. For choose_acting, this change is correct. However, for choose_async_recovery_ec, we don't want to reduce the acting set size below min_size, as that would prevent the PG from doing IO during recovery.
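The difference between the two conditions can be sketched with a simplified model. This is not the Ceph implementation: PoolParams, its fields, and both free functions below are hypothetical stand-ins that only mirror the logic described above, assuming an EC pool that needs k shards to reconstruct data and min_size shards to serve IO.

```cpp
#include <cstddef>

// Hypothetical, simplified pool description (not a Ceph type).
struct PoolParams {
  std::size_t k;         // shards needed to reconstruct the data
  std::size_t min_size;  // acting set size needed to serve IO
};

// Models the weakened check: min_size is only enforced when
// recovery below min_size is NOT allowed.
bool recoverable(const PoolParams& p, std::size_t acting_size,
                 bool allow_recovery_below_min_size) {
  if (acting_size < p.k) {
    return false;  // not enough shards to reconstruct anything
  }
  if (!allow_recovery_below_min_size && acting_size < p.min_size) {
    return false;
  }
  return true;
}

// Models the original, stricter check: always enforce min_size.
bool recoverable_and_ge_min_size(const PoolParams& p,
                                 std::size_t acting_size) {
  return acting_size >= p.k && acting_size >= p.min_size;
}
```

With k = 4 and min_size = 5, an acting set of 4 passes the weakened check when recovery below min_size is allowed but fails the strict one. Under this model, that is the case in which choose_async_recovery_ec could shrink the acting set to a size at which the PG can recover but cannot serve IO.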

The main observable symptom is a PG that ends up in the peered state during recovery (peered+recovering, peered+recovery_wait) and is unable to do IO until recovery completes, even though enough nearly up-to-date OSDs are available.
