Bug #62338
closed
osd: choose_async_recovery_ec may select an acting set < min_size
100%
Description
choose_async_recovery_ec may remove OSDs from the acting set as long as PeeringState::recoverable evaluates to true. Prior to 90022b35 (merge of PR 17619), the condition was PeeringState::recoverable_and_ge_min_size, which behaved as the name indicates. 7cb818a85 weakened the condition in PeeringState::recoverable_and_ge_min_size so that it only checks min_size if !cct->_conf.get_val<bool>("osd_allow_recovery_below_min_size"). A subsequent commit in that PR, e4c8bee88, renamed the function to PeeringState::recoverable. The function had (and still has) two callers: choose_acting and choose_async_recovery_ec. For choose_acting, this change is correct. For choose_async_recovery_ec, however, we don't want to reduce the acting set size below min_size, because that would prevent the PG from doing IO during recovery.
The main observable symptom is a PG that ends up in a peered state during recovery (peered+recovering, peered+recovery_wait) and cannot do IO until recovery completes, even though enough nearly up-to-date OSDs are available.
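To illustrate the difference between the two conditions, here is a minimal, self-contained sketch. It is not the Ceph code: PoolModel, shrink_acting, and the free functions recoverable and recoverable_and_ge_min_size are hypothetical stand-ins, and recoverability is reduced to "at least k shards remain", which is a simplifying assumption. Only the role of osd_allow_recovery_below_min_size is taken from the description above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of an EC pool; not Ceph's real types.
struct PoolModel {
  std::size_t k;         // shards needed to reconstruct an object (assumption)
  std::size_t min_size;  // acting set size required for the PG to serve IO
  bool allow_recovery_below_min_size;  // models osd_allow_recovery_below_min_size
};

// Models the weakened check: min_size is only enforced when the option is off.
bool recoverable(const PoolModel& p, std::size_t acting_size) {
  if (acting_size < p.k) return false;
  if (!p.allow_recovery_below_min_size && acting_size < p.min_size) return false;
  return true;
}

// Models the stricter check: min_size is always enforced.
bool recoverable_and_ge_min_size(const PoolModel& p, std::size_t acting_size) {
  return acting_size >= p.k && acting_size >= p.min_size;
}

// Drops async recovery candidates one at a time for as long as the
// predicate still accepts the reduced acting set. Returns the final size.
template <typename Pred>
std::size_t shrink_acting(const PoolModel& p, std::size_t acting_size,
                          std::size_t candidates, Pred ok) {
  assert(candidates <= acting_size);
  while (candidates > 0 && ok(p, acting_size - 1)) {
    --acting_size;
    --candidates;
  }
  return acting_size;
}
```

With k=4, min_size=5, the option enabled, six acting OSDs, and two async recovery candidates, the weakened predicate lets the model shrink the acting set to 4, below min_size. The stricter predicate stops at 5, so the modelled PG could keep serving IO during recovery.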