Bug #62596

closed

osd: Remove leaked clone objects (SnapMapper malformed key)

Added by Matan Breizman 8 months ago. Updated 8 months ago.

Status: Closed
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific,quincy,reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Clusters affected by the SnapMapper malformed key conversion [1] (since fixed) may still suffer from a space leak caused by stale clone objects.
The leak may occur in the following scenario:

A cluster had snapshots taken and was then upgraded from N (or earlier) to O (up to 16.2.11, before the fix [2] was merged).
If one of the snapshots taken before the upgrade is removed, the clone objects of this snapshot become stale.
Note: Even if none of the snapshots have been removed yet, the key is still malformed, and any future removal of such a snapshot will have the same effect unless the SnapMapper key is fixed first.


The fix for affected clusters is a 2-step procedure:

1) Fixing the key

This can be achieved in 2 ways:
Q and later releases: Scrub removes the corrupted keys and recreates them with the correct structure [3].

Note: Currently, no Pacific backport is planned, since an alternative solution is available for this step.
This may change; a final decision will be made before the final P release.

P and later releases: Redeploy the affected OSDs. Once an OSD is redeployed, its keys are recreated correctly.

2) Removing the stale objects

In order to remove the stale clone objects, the removed (purged) snapshot must be re-removed once the SnapMapper key is valid.
A purged-snaps scrub runs in the background every deep scrub interval and handles the snapshot re-removal.
scrub_purged_snaps can also be triggered with an OSD admin socket (asock) command, without waiting for the next deep scrub interval:

ceph daemon osd.<id> scrub_purged_snaps
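
To run the command above for every OSD hosted on the current node, a loop over the local admin sockets may help. The following is a minimal sketch, assuming the sockets live under /var/run/ceph and are named ceph-osd.<id>.asock (containerized deployments may use a different path). ASOCK_DIR and DRY_RUN are hypothetical variables introduced for this example only.

```shell
# Trigger scrub_purged_snaps on every OSD whose admin socket is found locally.
# ASOCK_DIR (socket directory) and DRY_RUN (print instead of execute) are
# hypothetical knobs for this sketch, not Ceph options.
scrub_purged_snaps_local() {
    asock_dir="${ASOCK_DIR:-/var/run/ceph}"
    for sock in "$asock_dir"/ceph-osd.*.asock; do
        # Skip the literal pattern when the glob matched nothing.
        [ -e "$sock" ] || continue
        # Derive the OSD id from the socket file name.
        id="${sock##*/ceph-osd.}"
        id="${id%.asock}"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ceph daemon osd.$id scrub_purged_snaps"
        else
            ceph daemon "osd.$id" scrub_purged_snaps
        fi
    done
}
```

Running it first with DRY_RUN=1 prints the commands that would be issued, which makes it easy to confirm that the socket path assumption holds on a given host.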


The last_purged_snaps_scrub timestamp is part of the OSDSuperblock and can be obtained using ceph-objectstore-tool:

ceph-objectstore-tool --data-path <store_path> --op dump-super | grep last_purged_snaps_scrub
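
When comparing the timestamp across several OSDs, it can be convenient to strip the surrounding punctuation from the matching line. The helper below is only a sketch: it assumes the dump contains a line of the form "last_purged_snaps_scrub": "<timestamp>", a layout that is not shown in this report and should be checked against real output. The function name is hypothetical.

```shell
# Read `--op dump-super` output on stdin and print only the value of the
# last_purged_snaps_scrub field (first match), without quotes or commas.
# Assumes a `"key": "value",` style line; verify against real output.
last_purged_snaps_scrub_value() {
    grep 'last_purged_snaps_scrub' \
        | head -n 1 \
        | sed -e 's/^[^:]*:[[:space:]]*//' -e 's/[",]//g'
}

# Usage (placeholder path, as in the command above):
#   ceph-objectstore-tool --data-path <store_path> --op dump-super \
#       | last_purged_snaps_scrub_value
```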


To verify whether a cluster is affected, malformed keys can be identified using `ceph-kvstore-tool`.
The following command can only be run offline (with the OSD stopped):

ceph-kvstore-tool bluestore-kv <store-path> list p | grep 'SNA.*_$'
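
For checking many OSDs, a count per store is often easier to compare than the raw key list. The sketch below wraps the same grep pattern in a small function that reads the key listing from standard input. The function name is hypothetical, and the pattern is taken as-is from the command above.

```shell
# Count SnapMapper keys that match the malformed pattern (contain "SNA" and
# end with "_"). Reads the `list p` output on stdin and prints the count;
# prints 0 and still exits successfully when nothing matches.
count_malformed_snapmapper_keys() {
    grep -c 'SNA.*_$' || true
}

# Usage (offline only, placeholder path as in the command above):
#   ceph-kvstore-tool bluestore-kv <store-path> list p \
#       | count_malformed_snapmapper_keys
```

A non-zero count suggests that the OSD still holds malformed keys and needs step 1 of the procedure.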

[1] https://tracker.ceph.com/issues/56147
[2] https://github.com/ceph/ceph/pull/46908
[3] https://github.com/ceph/ceph/pull/47388


Related issues 2 (0 open, 2 closed)

Related to RADOS - Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific (Resolved, Matan Breizman)
Follows RADOS - Bug #59478: osd/scrub: verify SnapMapper consistency not backported (Closed, Ronen Friedman)

Actions #1

Updated by Matan Breizman 8 months ago

  • Description updated (diff)
Actions #2

Updated by Matan Breizman 8 months ago

  • Follows Bug #59478: osd/scrub: verify SnapMapper consistency not backported added
Actions #3

Updated by Matan Breizman 8 months ago

  • Related to Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific added
Actions #4

Updated by Matan Breizman 8 months ago

  • Description updated (diff)
Actions #5

Updated by Matan Breizman 8 months ago

  • Description updated (diff)
  • Status changed from New to Closed