Bug #61962
trim_maps - possible leak on `skip_maps`
Description
Background:
OSD::trim_maps trims osdmaps starting at the superblock's oldest_map epoch, up to the earlier of the cluster trim lower bound and the osdmap cache's key lower bound (`min`).
When `skip_maps` is false, trimming is done in small batches (`osd_target_transaction_size`). As a result, oldest_map may lag behind trim_lower_bound for a while.
When `skip_maps` is true, trimming runs unconditionally up to `min`, and the target transaction size is not taken into account.
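The trimming rule above can be modeled in a few lines. This is a simplified sketch, not the actual `OSD::trim_maps`: `ToyOsd`, `toy_trim_maps`, and the parameters `cluster_lb`, `cache_lb` and `batch` (a stand-in for `osd_target_transaction_size`) are hypothetical names, and transactions, the superblock write and the osdmap cache itself are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>

using epoch_t = std::uint32_t;

// Hypothetical toy model of an OSD's persisted osdmaps (not Ceph code).
struct ToyOsd {
  std::set<epoch_t> stored;  // epochs currently on disk
  epoch_t oldest_map = 1;    // superblock.oldest_map: where the next trim resumes
};

// Sketch of the trimming rule described above (not the real OSD::trim_maps).
// cluster_lb: cluster trim lower bound; cache_lb: osdmap cache key lower bound;
// batch: hypothetical stand-in for osd_target_transaction_size.
// Returns the number of epochs removed by this call.
int toy_trim_maps(ToyOsd& osd, epoch_t cluster_lb, epoch_t cache_lb,
                  int batch, bool skip_maps) {
  // `min` in the bug text: the earlier of the two lower bounds.
  const epoch_t min_epoch = std::min(cluster_lb, cache_lb);
  int removed = 0;
  for (epoch_t e = osd.oldest_map; e < min_epoch; ++e) {
    osd.stored.erase(e);
    osd.oldest_map = e + 1;  // record progress so the next call resumes here
    ++removed;
    // Without skip_maps, stop after one batch, so oldest_map may lag behind.
    // With skip_maps, keep going until min_epoch regardless of batch size.
    if (!skip_maps && removed >= batch) {
      break;
    }
  }
  return removed;
}
```

For example, with epochs 1..100 stored, `oldest_map` at 1, `cluster_lb` 50, `cache_lb` 80 and a batch of 10, a call with `skip_maps` false removes epochs 1..10 and leaves `oldest_map` at 11; a following call with `skip_maps` true removes epochs 11..49 in one go and leaves `oldest_map` at 50.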
Leak:
The leak can happen when `skip_maps` is true: oldest_map is moved to `first` (the first epoch of the MOSDMap message being handled) without actually trimming all the osdmaps between the current oldest_map epoch and `first`.
The oldest_map epoch marks where the last trim finished, so that the next trim_maps call can resume from that epoch. Any osdmap still on disk below the new oldest_map is therefore never revisited.
The faulty trimming may occur when `min` is selected from the osdmap cache lower bound rather than from the cluster trim lower bound (with `skip_maps` set). If that `min` is below `first`, trimming stops at `min` while oldest_map still jumps to `first`, so the osdmaps left in between are leaked.
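A toy simulation makes the leak concrete. This is a hypothetical model built only from the description above, not Ceph code: `ToyOsd`, `toy_handle_skip_maps` and `count_leaked` are illustrative names. It also assumes that every locally stored map is older than `first` and that the cluster trim lower bound lies above them, which is the usual reason maps get skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <set>

using epoch_t = std::uint32_t;

// Hypothetical toy model of an OSD's persisted osdmaps (not Ceph code).
struct ToyOsd {
  std::set<epoch_t> stored;  // epochs currently on disk
  epoch_t oldest_map = 1;    // superblock.oldest_map: where the next trim resumes
};

// Toy version of the skip_maps path: the OSD receives maps [first, last]
// after missing the epochs before `first`. Trimming runs unconditionally up
// to min(cluster_lb, cache_lb), then oldest_map is moved to `first`.
void toy_handle_skip_maps(ToyOsd& osd, epoch_t first, epoch_t last,
                          epoch_t cluster_lb, epoch_t cache_lb) {
  // `min` in the bug text: the earlier of the two lower bounds.
  const epoch_t min_epoch = std::min(cluster_lb, cache_lb);
  for (epoch_t e = osd.oldest_map; e < min_epoch; ++e) {
    osd.stored.erase(e);
    osd.oldest_map = e + 1;
  }
  // Persist the newly received maps.
  for (epoch_t e = first; e <= last; ++e) {
    osd.stored.insert(e);
  }
  // The faulty step: oldest_map jumps past anything left untrimmed.
  osd.oldest_map = first;
}

// Epochs still on disk below oldest_map. Later trims start from oldest_map,
// so these are never revisited: they are the leaked osdmaps.
std::size_t count_leaked(const ToyOsd& osd) {
  return static_cast<std::size_t>(std::distance(
      osd.stored.begin(), osd.stored.lower_bound(osd.oldest_map)));
}
```

With epochs 1..50 on disk, an incoming message covering 90..100 and a cluster trim lower bound of 85, a cache lower bound of 95 gives `min` = 85, so every old map is removed and nothing leaks. A cache lower bound of 40 instead gives `min` = 40: epochs 40..50 survive, `oldest_map` still jumps to 90, and those 11 maps are never trimmed.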
For affected clusters:
A `trim_stale_maps` command has been introduced to trim the stale osdmaps. See: https://github.com/ceph/ceph/pull/53227