Bug #62596
Closed
osd: Remove leaked clone objects (SnapMapper malformed key)
Description
Clusters affected by the SnapMapper malformed key conversion [1] (which has since been fixed) may still suffer from a space leak caused by stale clone objects.
The leak may occur in the following scenario:
A cluster had snapshots taken and was then updated from N (or earlier) to O (up to 16.2.11, before the fix [2] was merged).
If one of the snapshots taken before the update is removed, the clone objects of that snapshot become stale.
Note: Even if none of the snapshots has been removed yet, the key is still malformed, and any future removal of such a snapshot will have the same effect unless the SnapMapper key is fixed first.
The fix for the affected clusters includes a 2-step procedure:
1) Fixing the key
This can be achieved in 2 ways:
Q and later releases: Scrub will remove and fix the corrupted keys to the correct structure [3].
Note: No Pacific backport is currently planned, since an alternative solution is available for this step.
This may change; the final decision will be made before the final P release.
P and later releases: Redeploy the affected OSDs. Once an OSD is redeployed, its keys are recreated correctly.
2) Removing the stale objects
In order to remove the stale clone objects, the removed (purged) snapshot should be re-removed once the SnapMapper key is valid.
A purged snaps scrub runs in the background once every deep scrub interval and handles the snapshot re-removal.
scrub_purged_snaps can also be triggered through an OSD admin socket (asock) command, without waiting for the next deep scrub interval:
ceph daemon osd.<id> scrub_purged_snaps
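To trigger the scrub on several OSDs of one host, the command above can be wrapped in a small loop. This is only a minimal sketch: the function name scrub_purged_snaps_on is hypothetical, the OSD ids in the usage line are examples, and it assumes the script runs on the host where those OSD daemons live, since `ceph daemon` talks to the local admin socket.

```shell
#!/bin/sh
# Sketch: trigger scrub_purged_snaps on each OSD id passed as an argument.
# scrub_purged_snaps_on is a hypothetical helper name, not part of Ceph.
scrub_purged_snaps_on() {
    rc=0
    for id in "$@"; do
        echo "osd.$id: requesting scrub_purged_snaps"
        # Same admin socket command as in the description above.
        # Remember a failure but keep going with the remaining OSDs.
        ceph daemon "osd.$id" scrub_purged_snaps || rc=1
    done
    return "$rc"
}

# Usage (example ids): scrub_purged_snaps_on 0 1 2
```

The function returns non-zero if any of the requests failed, so it can be used as a step in a larger maintenance script.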
The last_purged_snaps_scrub timestamp is part of the OSDSuperblock and can be obtained using ceph-objectstore-tool:
ceph-objectstore-tool --data-path <store_path> --op dump-super | grep last_purged_snaps_scrub
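When several OSDs on a host need to be checked, the same dump-super query can be repeated per store path. The sketch below is an illustration only: show_last_purged_snaps_scrub and the COT_BIN override variable are hypothetical names, the store paths are whatever applies locally, and no particular output format of dump-super is assumed beyond the grep used above.

```shell
#!/bin/sh
# Sketch: print the last_purged_snaps_scrub line for each given store path.
# show_last_purged_snaps_scrub and COT_BIN are hypothetical names.
show_last_purged_snaps_scrub() {
    # COT_BIN lets the tool be swapped out (for example in a dry run).
    tool=${COT_BIN:-ceph-objectstore-tool}
    for store in "$@"; do
        # Same query as in the description; "|| true" keeps going
        # when grep finds nothing.
        line=$("$tool" --data-path "$store" --op dump-super \
            | grep last_purged_snaps_scrub || true)
        echo "$store: ${line:-last_purged_snaps_scrub not found}"
    done
}

# Usage: show_last_purged_snaps_scrub <store_path> [<store_path> ...]
```

Comparing the reported value before and after a scrub_purged_snaps run is one way to confirm that the scrub actually took place.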
To verify whether a cluster is affected, malformed keys can be identified using the `ceph-kvstore-tool`.
The following command can only be run offline, with the OSD stopped:
ceph-kvstore-tool bluestore-kv <store-path> list p | grep 'SNA.*_$'
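The check above can be turned into a yes/no answer per OSD by counting the matching keys. This is a hedged sketch rather than an official procedure: count_malformed_keys and check_osd_store are hypothetical helper names, and the only detection logic is the grep pattern already given in the description.

```shell
#!/bin/sh
# Sketch: count keys matching the malformed SnapMapper pattern.
# Both function names are hypothetical, not part of Ceph.

# Reads a key listing on stdin and prints the number of lines that
# contain "SNA" and end with "_" (the pattern from the description).
count_malformed_keys() {
    grep -c 'SNA.*_$'
}

# Runs the offline listing for one stopped OSD's store path and reports
# whether any malformed keys were found.
check_osd_store() {
    store=$1
    keys=$(ceph-kvstore-tool bluestore-kv "$store" list p) || {
        echo "$store: could not list keys (is the OSD stopped?)" >&2
        return 2
    }
    # grep -c exits non-zero on a count of 0, hence the "|| true".
    n=$(printf '%s\n' "$keys" | count_malformed_keys || true)
    if [ "$n" -gt 0 ]; then
        echo "$store: $n malformed SnapMapper key(s) found"
        return 1
    fi
    echo "$store: no malformed SnapMapper keys found"
}

# Usage: check_osd_store <store-path>
```

A return status of 1 means the OSD still carries malformed keys and needs step 1 before the stale clones can be removed.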
[1] https://tracker.ceph.com/issues/56147
[2] https://github.com/ceph/ceph/pull/46908
[3] https://github.com/ceph/ceph/pull/47388
Updated by Matan Breizman 8 months ago
- Follows Bug #59478: osd/scrub: verify SnapMapper consistency not backported added
Updated by Matan Breizman 8 months ago
- Related to Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific added
Updated by Matan Breizman 8 months ago
- Description updated (diff)
- Status changed from New to Closed