Bug #18352: discarding / unmapping / trimming blocks on an image that has snapshots INCREASES disk usage instead of reducing. - rbd - Ceph

Added by Марк Коренберг over 7 years ago. Updated over 6 years ago.

Status: Closed

Priority: Normal

Severity: 3 - minor

ceph-qa-suite: rbd

Description

How to reproduce:

1. Create an image, attach it to a VM, and install any OS. Ensure that KVM uses the discard feature and the virtio-scsi driver.
2. Run `fstrim --all` in the guest OS and confirm that disk usage is reduced in Ceph (e.g. via the `rbd du` command).
3. Create a snapshot of the RBD image.
4. Remove some files inside the guest image.
5. Run `fstrim --all` again.
6. Observe that the amount of used space for that image actually increases!

I suspect this happens because, when regions are discarded on an image that has a snapshot, the logic writes (allocates) zeroes instead of unmapping those regions.

There are three solutions for that (we should choose one of them):

1. Just decrement the reference count on that region and unmap the region from the RBD image, exactly as happens for an image without snapshots. I don't understand why this is not already done. (Preferred)
2. Just remove any records about that region, so that it refers to the original region of the base image (if it was allocated before snapshotting). In that case, reading from a discarded region will return some old data. This is allowed for SSDs, for example. (Very simple, but not friendly for some uses; see comments below.)
3. Introduce a `whiteout` flag in the RBD metadata for the case when a region [that was allocated in the base image] is discarded.

AFAIK, there is a flag for SCSI devices that specifies whether discarded regions return zeroes when read.

This happens with Kraken on the server and Jewel on the client.

Updated by Марк Коренберг over 7 years ago

About the flag: see https://bugs.launchpad.net/qemu/+bug/1652459

Updated by Марк Коренберг over 7 years ago

Also found this: https://www.spinics.net/lists/ceph-devel/msg30903.html

Updated by Jason Dillaman over 7 years ago

Status changed from New to Need More Info

@Марк: can you set "rbd skip partial discard = true" in your hypervisor host's ceph.conf, configure QEMU's discard granularity to the backing RBD image's object size via "discard_granularity=XYZ" [1], and retest? Your suggestions don't really map to how Ceph/RBD are actually architected, but if you discard a full backing object (defaults to 4MB), zeroes won't be written and the backing object will be deleted / unreferenced at the HEAD revision of the object.

[1] http://docs.ceph.com/docs/giant/rbd/qemu-rbd/#enabling-discard-trim
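
To illustrate why the discard granularity matters, here is a minimal sketch (a simplified model, not librbd code) of how a discard range lines up with backing objects. The function names `object_size_from_order` and `split_discard` are hypothetical, and the model assumes the default layout with no custom striping. It only separates the pieces of a discard that cover a whole backing object, which per the comment above can be deleted at HEAD, from the partial pieces, which would otherwise be zeroed or, with "rbd skip partial discard = true", presumably left alone.

```python
from typing import List, Tuple


def object_size_from_order(order: int = 22) -> int:
    """Object size in bytes for an RBD image order (order 22 -> 4 MiB)."""
    return 1 << order


def split_discard(
    offset: int, length: int, object_size: int
) -> Tuple[List[int], List[Tuple[int, int, int]]]:
    """Split a discard extent into whole-object and partial-object pieces.

    Returns (full, partial):
      full    -- object numbers that the extent covers entirely
      partial -- (object_number, offset_in_object, length) for the rest
    Assumption: objects are laid out back to back (no custom striping).
    """
    full: List[int] = []
    partial: List[Tuple[int, int, int]] = []
    end = offset + length
    pos = offset
    while pos < end:
        obj = pos // object_size
        obj_start = obj * object_size
        piece_end = min(end, obj_start + object_size)
        if pos == obj_start and piece_end == obj_start + object_size:
            full.append(obj)  # whole object covered: can be removed
        else:
            partial.append((obj, pos - obj_start, piece_end - pos))
        pos = piece_end
    return full, partial


MiB = 1024 * 1024
size = object_size_from_order(22)

# Unaligned 8 MiB discard starting at 1 MiB: only object 1 is fully covered.
unaligned = split_discard(1 * MiB, 8 * MiB, size)

# Object-aligned 8 MiB discard starting at 4 MiB: objects 1 and 2, no partials.
aligned = split_discard(4 * MiB, 8 * MiB, size)
```

In this model, setting the guest-visible discard granularity to the object size (4194304 bytes for the default order 22) means the guest only issues discards that fall into the `full` list.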

Updated by Jason Dillaman over 6 years ago

Status changed from Need More Info to Closed

Closing due to lack of feedback
