Bug #61962

Updated by Matan Breizman 8 months ago

*Background*: 

 `OSD::trim_maps` trims osdmaps starting at the superblock's oldest_map epoch and stopping at `min`, the earlier of the cluster trim lower bound and the osdmap cache's key lower bound. 
 When `skip_maps` is false, trimming proceeds in small batches (`osd_target_transaction_size`). As a result, the oldest_map may lag behind the trim_lower_bound for a while. 
 When `skip_maps` is true, trimming proceeds unconditionally up to `min`, and the target transaction size is not taken into account. 
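
 The behaviour described above can be sketched with a toy model. This is not the Ceph implementation: the function name `trim_maps_model`, its parameters, and the use of a Python set as the map store are assumptions made purely for illustration.

```python
def trim_maps_model(store, oldest_map, cluster_trim_lower_bound,
                    cache_key_lower_bound, skip_maps, target_transaction_size):
    """Toy model of the trimming loop (hypothetical, not Ceph code).

    Removes epochs from `store` starting at `oldest_map` and returns the
    new oldest_map epoch.
    """
    # `min` in the description: the earlier of the two lower bounds.
    upper = min(cluster_trim_lower_bound, cache_key_lower_bound)
    epoch = oldest_map
    trimmed = 0
    while epoch < upper:
        store.discard(epoch)
        epoch += 1
        trimmed += 1
        # Without skip_maps, stop after one batch; the rest is left for
        # a later call, so oldest_map can lag behind the lower bound.
        if not skip_maps and trimmed >= target_transaction_size:
            break
    return epoch


store = set(range(1, 101))                      # epochs 1..100 on disk
oldest = trim_maps_model(store, 1, 50, 80, False, 10)
print(oldest)                                   # 11: one batch of 10 trimmed
oldest = trim_maps_model(store, oldest, 50, 80, True, 10)
print(oldest)                                   # 50: trimmed all the way to min
```

 In this model the batched call stops at epoch 11 while the lower bound is 50, which is the lag mentioned above.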

 *Leak*: 

 The leak can happen when `skip_maps` is true: the oldest_map is moved to `first` (the first epoch of the MOSDMap message being handled) without actually trimming all the osdmaps between the _current_ oldest_map epoch and `first`. 
 The oldest_map epoch records where the last trim finished, so that the next trim_maps call can continue from that epoch. Once oldest_map has jumped ahead, the untrimmed osdmaps below it are never revisited. 

 The faulty trimming may occur when the `min` epoch is taken from the osdmap cache lower bound (with `skip_maps`) rather than from the cluster trim lower bound. 
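
 A minimal sketch of the leak, under the same caveat: `handle_skipped_maps_model` and its parameters are hypothetical names for a toy model, not Ceph code. It trims up to `min` and then moves oldest_map to `first`, as described above.

```python
def handle_skipped_maps_model(store, oldest_map, first,
                              cluster_trim_lower_bound, cache_key_lower_bound):
    """Toy model of the faulty skip_maps path (hypothetical, not Ceph code).

    Returns the new oldest_map and the sorted list of leaked epochs.
    """
    upper = min(cluster_trim_lower_bound, cache_key_lower_bound)
    # skip_maps: trim unconditionally, but only up to `min`.
    for epoch in range(oldest_map, upper):
        store.discard(epoch)
    # Faulty step: oldest_map jumps to `first` even if trimming stopped
    # earlier because the cache lower bound was the smaller value.
    new_oldest = first
    # Anything still stored below the new oldest_map is never trimmed.
    leaked = sorted(e for e in store if e < new_oldest)
    return new_oldest, leaked


# Cache lower bound (40) is below the cluster trim lower bound (60).
store = set(range(1, 101))
new_oldest, leaked = handle_skipped_maps_model(store, 1, 60, 60, 40)
print(new_oldest, leaked[0], leaked[-1])        # 60 40 59
```

 When the cluster trim lower bound is the smaller of the two values, the same model leaves nothing behind, which matches the condition stated above.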


 *For affected clusters*: 

 The `trim_stale_maps` command was introduced for this case. See: https://github.com/ceph/ceph/pull/53227 
