Bug #56147

snapshots will not be deleted after upgrade from nautilus to pacific

Added by Manuel Lausch almost 2 years ago. Updated 12 months ago.

Status:
Resolved
Priority:
Urgent
Category:
Snapshots
Target version:
-
% Done:

0%

Source:
Tags:
backport_processed
Backport:
octopus,pacific,quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After upgrading from 14.2.22 to 16.2.9, snapshot deletion does not remove "clones" from the pool.
More precisely: this affects objects in snapshots that were created under Nautilus and deleted under Pacific. Newly created snapshots work as expected.

The snapshot itself disappears from the snapshot list, but the number of "clones" listed in rados df is not decremented and the corresponding space is not freed.

I could also observe this on an update from Nautilus to Octopus.

It does not matter whether the omap conversion was done during the update or not (bluestore_fsck_quick_fix_on_mount).
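The omap conversion mentioned here is the quick-fix fsck that runs on OSD startup when the following option is set; a minimal ceph.conf fragment for reference (illustrative only):

```ini
[osd]
# Run the quick-fix fsck (including the omap format conversion)
# automatically when the OSD mounts its store after the upgrade.
bluestore_fsck_quick_fix_on_mount = true
```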


Related issues (5 total: 0 open, 5 closed)

Related to RADOS - Bug #59478: osd/scrub: verify SnapMapper consistency not backported (Closed, Ronen Friedman)

Related to RADOS - Bug #62596: osd: Remove leaked clone objects (SnapMapper malformed key) (Closed, Matan Breizman)

Copied to RADOS - Backport #56578: quincy: snapshots will not be deleted after upgrade from nautilus to pacific (Resolved, Matan Breizman)
Copied to RADOS - Backport #56579: pacific: snapshots will not be deleted after upgrade from nautilus to pacific (Resolved, Matan Breizman)
Copied to RADOS - Backport #56580: octopus: snapshots will not be deleted after upgrade from nautilus to pacific (Resolved, Matan Breizman)
Actions #1

Updated by Radoslaw Zarzynski almost 2 years ago

  • Status changed from New to Need More Info

I could also observe this on an update from Nautilus to Octopus.

Just to make sure: am I correct that the issue is also visible on an Octopus cluster?

Also, having the logs with `debug_osd=20` would be really helpful.

Actions #2

Updated by Manuel Lausch almost 2 years ago

Yes. For the debug logs I tested this going from Nautilus (14.2.22) to Octopus (15.2.16). The behavior is the same as described before for the update from Nautilus to Pacific.

The debug logs are uploaded:
ceph-post-file: 9a210755-bcc5-4b8b-bca5-45bf0071b142

The test cluster is a single node with 3 OSDs and a replicated pool with replication factor 3.

On Nautilus I:
  • created 4 objects
  • created a snapshot
  • deleted one object

Here is the output of rados df at that point:

POOL_NAME     USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS     RD  WR_OPS     WR  USED COMPR  UNDER COMPR
spielfeld  768 KiB        5       1      15                   0        0         0       1  1 KiB       7  5 KiB         0 B          0 B

total_objects    5
total_used       3.0 GiB
total_avail      11 TiB
total_space      11 TiB

Now I updated to Octopus and deleted the snapshot.
The rados df output is still the same:

POOL_NAME                 USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD  WR_OPS      WR  USED COMPR  UNDER COMPR
device_health_metrics   57 KiB        3       0       9                   0        0         0       0  0 B       3  15 KiB         0 B          0 B
spielfeld              768 KiB        5       1      15                   0        0         0       0  0 B       7   5 KiB         0 B          0 B

total_objects    8
total_used       3.0 GiB
total_avail      11 TiB
total_space      11 TiB

Actions #3

Updated by Manuel Lausch almost 2 years ago

It seems to be a failure in the conversion during the upgrade.

In the omap dump taken before the update, with one deleted object in a snapshot, there are these two entries:

_USER_0000000000000065_USER_    MAP_0000000000000001_0000000000000001.F13BC082.1.d1..
_USER_0000000000000065_USER_    OBJ_0000000000000001.F13BC082.1.d1..

After the update to Octopus, the entries look like this:

_USER_0000000000000065_USER_    OBJ_0000000000000001.F13BC082.1.d1..
_USER_0000000000000065_USER_    SNA_1_0000000000000001_

For an object deleted under Octopus, the entry looks like this:

_USER_0000000000000065_USER_    SNA_1_0000000000000001_0000000000000001.A5224EFB.1.d2..

In the debug output of the OSD there is only one log line, which says that one key was converted. No further details.
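The key layouts above can be sketched as follows (a minimal Python sketch; the key layouts are copied from the omap dumps above, while the helper names are hypothetical and not part of Ceph's SnapMapper code):

```python
# Sketch of the SnapMapper omap key layouts seen in the dumps above.
# Helper names are hypothetical; only the key layouts come from the dumps.

def legacy_keys(snap: int, obj: str) -> list[str]:
    """Nautilus-era per-object entries: a MAP_ and an OBJ_ key."""
    return [
        f"MAP_{snap:016X}_{obj}",  # snap -> object mapping
        f"OBJ_{obj}",              # object -> snaps mapping
    ]

def new_snap_key(pool: int, snap: int, obj: str) -> str:
    """Octopus-era entry: SNA_<pool>_<snap>_<object>."""
    return f"SNA_{pool}_{snap:016X}_{obj}"

# Correct key, as written for an object deleted under Octopus:
good = new_snap_key(1, 1, "0000000000000001.A5224EFB.1.d2..")
# "SNA_1_0000000000000001_0000000000000001.A5224EFB.1.d2.."

# Malformed key produced by the upgrade conversion: the object
# suffix is missing, so snap trimming cannot locate the clone.
bad = "SNA_1_0000000000000001_"
```

A key that stops right after the snap id names no object, which would explain the clones that are never trimmed.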

Actions #4

Updated by Radoslaw Zarzynski almost 2 years ago

  • Status changed from Need More Info to New
  • Assignee set to Matan Breizman

Hello Matan! Does this snapshot issue ring a bell?

Actions #5

Updated by Manuel Lausch almost 2 years ago

Here is a PR which should fix the conversion on update:
https://github.com/ceph/ceph/pull/46908

But what about clusters that have already been updated?

Actions #6

Updated by Matan Breizman almost 2 years ago

Radoslaw Zarzynski wrote:

Hello Matan! Does this snapshot issue ring a bell?

Introduced here:
https://github.com/ceph/ceph/commit/94ebe0eab968068c29fdffa1bfe68c72122db633

Here is a PR which should fix the conversion on update:
https://github.com/ceph/ceph/pull/46908

But what about clusters that have already been updated?

This patch may work when running Octopus for the first time.
As mentioned, we should also be able to convert clusters that have already been updated.

Actions #7

Updated by Radoslaw Zarzynski almost 2 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 46908
Actions #8

Updated by Neha Ojha almost 2 years ago

  • Priority changed from Normal to Urgent
  • Backport set to octopus,pacific,quincy
Actions #9

Updated by Neha Ojha over 1 year ago

  • Status changed from Fix Under Review to Pending Backport
Actions #10

Updated by Backport Bot over 1 year ago

  • Copied to Backport #56578: quincy: snapshots will not be deleted after upgrade from nautilus to pacific added
Actions #11

Updated by Backport Bot over 1 year ago

  • Copied to Backport #56579: pacific: snapshots will not be deleted after upgrade from nautilus to pacific added
Actions #12

Updated by Backport Bot over 1 year ago

  • Copied to Backport #56580: octopus: snapshots will not be deleted after upgrade from nautilus to pacific added
Actions #13

Updated by Matan Breizman over 1 year ago

This issue is fixed (including a unit test) and will be backported in order to prevent future cluster upgrades from Nautilus (or earlier) from using the faulty conversion.

For already-converted clusters: a separate PR will be issued to remove/update the malformed SnapMapper keys.
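Such a cleanup can be approximated like this (a hypothetical Python sketch over an exported list of omap keys; it is not the actual repair code):

```python
# Hypothetical sketch: find malformed SnapMapper keys in a list of
# omap keys. A well-formed Octopus-era key is SNA_<pool>_<snap>_<object>;
# the faulty conversion wrote keys that stop right after the snap id.

def is_malformed_sna_key(key: str) -> bool:
    if not key.startswith("SNA_"):
        return False
    parts = key.split("_", 3)  # ["SNA", pool, snap, object]
    return len(parts) < 4 or parts[3] == ""

keys = [
    "OBJ_0000000000000001.F13BC082.1.d1..",
    "SNA_1_0000000000000001_",                                  # malformed
    "SNA_1_0000000000000001_0000000000000001.A5224EFB.1.d2..",  # well-formed
]
malformed = [k for k in keys if is_malformed_sna_key(k)]
```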

Actions #14

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed
Actions #15

Updated by Stefan Kooman over 1 year ago

Is this bug also affecting rbd snapshots / clones?

Actions #16

Updated by Matan Breizman over 1 year ago

Stefan Kooman wrote:

Is this bug also affecting rbd snapshots / clones?

Yes

Actions #17

Updated by Matan Breizman over 1 year ago

  • Status changed from Pending Backport to Resolved
Actions #18

Updated by Matan Breizman over 1 year ago

For already-converted clusters: a separate PR will be issued to remove/update the malformed SnapMapper keys.

https://github.com/ceph/ceph/pull/47388

Actions #19

Updated by Wout van Heeswijk 12 months ago

Matan Breizman wrote:

For already-converted clusters: a separate PR will be issued to remove/update the malformed SnapMapper keys.

https://github.com/ceph/ceph/pull/47388

This PR never got backported to any release. We suspect we may be suffering from corrupted snapshots due to this bug. I've created a backport request for the above PR. In parallel, we are trying to gather the information to either prove or disprove the relationship between this bug and our corruption case.

https://tracker.ceph.com/issues/59478

Actions #20

Updated by Konstantin Shalygin 12 months ago

  • Related to Bug #59478: osd/scrub: verify SnapMapper consistency not backported added
Actions #21

Updated by Matan Breizman 8 months ago

  • Related to Bug #62596: osd: Remove leaked clone objects (SnapMapper malformed key) added