Bug #61962

open

trim_maps - possible leak on `skip_maps`

Added by Matan Breizman 10 months ago. Updated 7 months ago.

Status: Pending Backport
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: quincy,reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Background:

OSD::trim_maps trims osdmaps from the superblock's oldest_map epoch up to `min`, the earlier of the cluster trim lower bound and the osdmap cache's key lower bound.
When `skip_maps` is false, trimming is done in small batches (`osd_target_transaction_size`). As a result, the oldest_map may lag behind the trim_lower_bound for a while.
If `skip_maps` is true, trimming proceeds unconditionally up to `min`, ignoring the target transaction size.
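
The behaviour described above can be sketched as a toy model. This is a simplified illustration, not the actual OSD::trim_maps code: `ToyOsd`, the free function `trim_maps`, and its parameters are hypothetical stand-ins, and a single batch is assumed per call when `skip_maps` is false.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

using epoch_t = std::uint32_t;

// Hypothetical stand-in for the OSD state relevant to trimming.
struct ToyOsd {
  std::set<epoch_t> stored;  // epochs of osdmaps still on disk
  epoch_t oldest_map = 0;    // models the superblock's oldest_map
};

// Trims from oldest_map up to (not including) the `min` bound.
// Returns the number of epochs processed in this call.
std::size_t trim_maps(ToyOsd& osd,
                      epoch_t cluster_trim_lower_bound,
                      epoch_t cache_lower_bound,
                      bool skip_maps,
                      std::size_t target_transaction_size) {
  assert(target_transaction_size > 0);
  // `min` in the description: the earlier of the two lower bounds.
  const epoch_t min_epoch =
      std::min(cluster_trim_lower_bound, cache_lower_bound);
  std::size_t processed = 0;
  while (osd.oldest_map < min_epoch) {
    // Without skip_maps, stop after one batch; oldest_map then lags
    // behind the bound until a later call.
    if (!skip_maps && processed >= target_transaction_size) break;
    osd.stored.erase(osd.oldest_map);
    ++osd.oldest_map;
    ++processed;
  }
  return processed;
}
```

With epochs 1..100 stored, bounds of 50 and 80, and a batch size of 10, a call with `skip_maps` false advances oldest_map only to 11, while a following call with `skip_maps` true advances it all the way to 50.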

Leak:

The leak can occur when `skip_maps` is true: the oldest_map is moved to `first` (the first epoch of the MOSDMap message being handled) without all the osdmaps between the current oldest_map epoch and `first` actually being trimmed.
The oldest_map epoch marks where the last trim finished, so that the next trim_maps call can continue from that epoch. Any osdmaps left below it are therefore not revisited by later trims.

The faulty trimming may occur when the `min` epoch is selected based on the osdmap cache lower bound (with `skip_maps`) and not based on the cluster trim lower bound.
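
A minimal sketch of the leak scenario, under the same toy-model assumptions: `ToyStore` and `handle_with_skip_maps` are hypothetical names, and the code only mirrors the two steps described above (trim up to `min`, then move oldest_map to `first`), not the real MOSDMap handling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <set>

using epoch_t = std::uint32_t;

// Hypothetical stand-in for the stored osdmaps and the superblock.
struct ToyStore {
  std::set<epoch_t> stored;  // epochs of osdmaps still on disk
  epoch_t oldest_map = 0;    // models the superblock's oldest_map
};

// Models handling a message whose first epoch is `first` while
// skip_maps is true. Returns how many stored epochs end up below the
// new oldest_map, i.e. maps that later trims would never visit.
std::size_t handle_with_skip_maps(ToyStore& s,
                                  epoch_t first,
                                  epoch_t cluster_trim_lower_bound,
                                  epoch_t cache_lower_bound) {
  assert(first >= s.oldest_map);
  const epoch_t min_epoch =
      std::min(cluster_trim_lower_bound, cache_lower_bound);
  // Step 1: trim unconditionally up to `min`.
  for (epoch_t e = s.oldest_map; e < min_epoch; ++e) {
    s.stored.erase(e);
  }
  // Step 2: oldest_map jumps to `first`, whether or not every epoch
  // below `first` was removed in step 1.
  s.oldest_map = first;
  return static_cast<std::size_t>(
      std::distance(s.stored.begin(), s.stored.lower_bound(first)));
}
```

In this model, with epochs 1..100 stored, `first` = 200 and a cluster trim lower bound of 150, a cache lower bound of 60 leaves 41 epochs (60..100) stranded below oldest_map, whereas a cache lower bound of 150 leaves none.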

For affected clusters:

A trim_stale_maps command has been introduced. See: https://github.com/ceph/ceph/pull/53227


Related issues 2 (2 open, 0 closed)

Copied to RADOS - Backport #63464: reef: trim_maps - possible leak on `skip_maps` (New, Matan Breizman)
Copied to RADOS - Backport #63465: quincy: trim_maps - possible leak on `skip_maps` (New, Matan Breizman)