Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific
Status: Closed
Description
After upgrading from 14.2.22 to 16.2.9, snapshot deletion does not remove "clones" from the pool.
More precisely: this affects objects in snapshots that were created with Nautilus and then deleted with Pacific. Newly created snapshots work as expected.
The snapshot itself disappears from the snapshot list, but the number of "clones" reported by rados df is not decremented and the corresponding space is not freed up.
I could also observe this on an update from Nautilus to Octopus.
It doesn't matter whether the omap conversion was done during the update or not (bluestore_fsck_quick_fix_on_mount).
Updated by Radoslaw Zarzynski almost 2 years ago
- Status changed from New to Need More Info
Manuel Lausch wrote:
I could also observe this on an update from Nautilus to Octopus.
Just to make sure: am I correct that the issue is also visible on an Octopus cluster?
Also, having the logs with `debug_osd=20` would be really helpful.
Updated by Manuel Lausch almost 2 years ago
Yes. For the debug logs I tested this with an upgrade from Nautilus (14.2.22) to Octopus (15.2.16). The behavior is the same as described before for the update from Nautilus to Pacific.
The debug logs are uploaded:
ceph-post-file: 9a210755-bcc5-4b8b-bca5-45bf0071b142
The test cluster is a single node with 3 OSDs and a replicated pool with replication factor 3.
On Nautilus I created 4 objects, created a snapshot, and deleted one object.
Here is the output from rados df so far:
POOL_NAME  USED     OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS  RD     WR_OPS  WR     USED COMPR  UNDER COMPR
spielfeld  768 KiB  5        1       15      0                   0        0         1       1 KiB  7       5 KiB  0 B         0 B

total_objects    5
total_used       3.0 GiB
total_avail      11 TiB
total_space      11 TiB
Now I updated to Octopus and deleted the snapshot.
The rados df output is still the same:
POOL_NAME              USED     OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS  RD   WR_OPS  WR      USED COMPR  UNDER COMPR
device_health_metrics  57 KiB   3        0       9       0                   0        0         0       0 B  3       15 KiB  0 B         0 B
spielfeld              768 KiB  5        1       15      0                   0        0         0       0 B  7       5 KiB   0 B         0 B

total_objects    8
total_used       3.0 GiB
total_avail      11 TiB
total_space      11 TiB
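For reference, the steps above can be scripted roughly as follows. This is a minimal sketch using the rados CLI, assuming a running test cluster with the replicated pool "spielfeld" from the output above; the object names (obj1..obj4), the snapshot name (snap1), and the input file path are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch of the reproduction described above, driven via the rados CLI."""
import subprocess

POOL = "spielfeld"

def rados(*args: str) -> None:
    # Thin wrapper around the rados CLI used throughout this report.
    subprocess.run(["rados", *args], check=True)

# On Nautilus: create 4 objects, snapshot the pool, delete one object.
for i in range(1, 5):
    rados("-p", POOL, "put", f"obj{i}", "/etc/hosts")
rados("-p", POOL, "mksnap", "snap1")
rados("-p", POOL, "rm", "obj1")
subprocess.run(["rados", "df"], check=True)  # CLONES should show 1

# After upgrading the OSDs to Octopus/Pacific, remove the snapshot:
#   rados("-p", POOL, "rmsnap", "snap1")
# With the faulty key conversion, rados df keeps reporting the clone
# and the space is never freed.
```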
Updated by Manuel Lausch almost 2 years ago
It seems to be a failure in the key conversion after the upgrade.
In the omap dump taken before the update, with one deleted object in a snapshot, there are these two entries:
_USER_0000000000000065_USER_ MAP_0000000000000001_0000000000000001.F13BC082.1.d1..
_USER_0000000000000065_USER_ OBJ_0000000000000001.F13BC082.1.d1..
After the update to Octopus, the entries look like this:
_USER_0000000000000065_USER_ OBJ_0000000000000001.F13BC082.1.d1..
_USER_0000000000000065_USER_ SNA_1_0000000000000001_
For an object deleted on Octopus, the entry looks like this:
_USER_0000000000000065_USER_ SNA_1_0000000000000001_0000000000000001.A5224EFB.1.d2..
In the OSD debug output there is only one log line, which says that one key was converted. No more details.
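To make the difference explicit: the converted key is truncated, i.e. the object part that an Octopus-native key carries after the snap id is missing. The sketch below reconstructs the key strings quoted above for comparison; it is an illustration only (the field layout is inferred from the dump, not taken from the C++ SnapMapper code where the real conversion lives).

```python
# Illustration only: rebuilds the omap key strings quoted above.
SNAP = 1                                   # snapshot id
POOL_ID = 1                                # pool id of "spielfeld"
OBJ = "0000000000000001.F13BC082.1.d1.."   # encoded object part from the dump

# Nautilus-era snap -> object mapping key:
legacy_key = f"MAP_{SNAP:016X}_{OBJ}"
# -> MAP_0000000000000001_0000000000000001.F13BC082.1.d1..

# Octopus per-pool mapping key, as written for objects deleted on Octopus:
good_key = f"SNA_{POOL_ID}_{SNAP:016X}_{OBJ}"
# -> SNA_1_0000000000000001_0000000000000001.F13BC082.1.d1..

# What the faulty upgrade conversion actually produced:
bad_key = f"SNA_{POOL_ID}_{SNAP:016X}_"
# -> SNA_1_0000000000000001_
# The object part is dropped, so snap trimming can no longer find the
# clone belonging to this snapshot and its space is never reclaimed.
```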
Updated by Radoslaw Zarzynski almost 2 years ago
- Status changed from Need More Info to New
- Assignee set to Matan Breizman
Hello Matan! Does this snapshot issue ring a bell?
Updated by Manuel Lausch almost 2 years ago
Here is a PR which should fix the conversion on update:
https://github.com/ceph/ceph/pull/46908
But what about clusters that have already been updated?
Updated by Matan Breizman almost 2 years ago
Radoslaw Zarzynski wrote:
Hello Matan! Does this snapshot issue ring a bell?
Introduced here:
https://github.com/ceph/ceph/commit/94ebe0eab968068c29fdffa1bfe68c72122db633
Manuel Lausch wrote:
Here is a PR which should fix the conversion on update:
https://github.com/ceph/ceph/pull/46908
But what about clusters that have already been updated?
This patch may work when running Octopus for the first time.
As mentioned, we should also be able to convert already updated clusters.
Updated by Radoslaw Zarzynski almost 2 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 46908
Updated by Neha Ojha almost 2 years ago
- Priority changed from Normal to Urgent
- Backport set to octopus,pacific,quincy
Updated by Neha Ojha almost 2 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot almost 2 years ago
- Copied to Backport #56578: quincy: snapshots will not be deleted after upgrade from nautilus to pacific added
Updated by Backport Bot almost 2 years ago
- Copied to Backport #56579: pacific: snapshots will not be deleted after upgrade from nautilus to pacific added
Updated by Backport Bot almost 2 years ago
- Copied to Backport #56580: octopus: snapshots will not be deleted after upgrade from nautilus to pacific added
Updated by Matan Breizman almost 2 years ago
This issue is fixed (including a unit test) and the fix will be backported in order to prevent future cluster upgrades from Nautilus (or earlier) from using the faulty conversion.
For already-converted clusters: a separate PR will be issued to remove/update the malformed SnapMapper keys.
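Until such a PR lands, a rough way to spot affected entries might be to filter a dump of the SnapMapper omap keys for the truncated shape seen in this report. The sketch below is hypothetical: it assumes you already have the keys dumped one per line (e.g. with ceph-kvstore-tool against an offline OSD) and that malformed keys end right after the snap id, as in the dump above.

```python
# Hypothetical filter over a dump of SnapMapper omap keys (one key per
# line on stdin), flagging keys matching the truncated shape seen here:
# SNA_<pool>_<16-hex-digit snapid>_ with nothing after the final "_".
import re
import sys

MALFORMED = re.compile(r"SNA_\d+_[0-9A-F]{16}_$")

for line in sys.stdin:
    key = line.strip()
    if MALFORMED.search(key):
        print("possibly malformed:", key)
```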
Updated by Stefan Kooman over 1 year ago
Is this bug also affecting rbd snapshots / clones?
Updated by Matan Breizman over 1 year ago
Stefan Kooman wrote:
Is this bug also affecting rbd snapshots / clones?
Yes
Updated by Matan Breizman over 1 year ago
- Status changed from Pending Backport to Resolved
Updated by Matan Breizman over 1 year ago
For already-converted clusters: a separate PR will be issued to remove/update the malformed SnapMapper keys.
Updated by Wout van Heeswijk about 1 year ago
Matan Breizman wrote:
For already-converted clusters: a separate PR will be issued to remove/update the malformed SnapMapper keys.
This PR never got backported to any release. We suspect we may be suffering from corrupted snapshots due to this bug. I've created a backport request for the above PR. In parallel, we are trying to gather the information to either prove or disprove the relationship between the bug and our corruption case.
Updated by Konstantin Shalygin about 1 year ago
- Related to Bug #59478: osd/scrub: verify SnapMapper consistency not backported added
Updated by Matan Breizman 8 months ago
- Related to Bug #62596: osd: Remove leaked clone objects (SnapMapper malformed key) added