Bug #21567: rbd does not delete snaps in (ec) data pool - rbd - Ceph

Bug #21567

closed

rbd does not delete snaps in (ec) data pool

Added by Henrik Korkuc over 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Immediate
Assignee: Jason Dillaman
Backport: luminous
Severity: 3 - minor
Affected Versions: Ceph - v12.2.0

Description

After deleting RBD image snapshots, the space is not reclaimed. Reproduced with rbd-ec (id 1, EC 4+2) as the data pool and rbd-meta (id 2, replicated). Six 1 TB images are mounted via NBD. A large file is created on each image and snapshotted, then rewritten and snapshotted a few more times. After the snapshots are deleted, no space is reclaimed.

Posting one of the OSD logs, captured with debug_osd 20/20 and debug_rbd 20/20 (I also have rbd output at this debug level and can attach it if needed):

ceph-post-file: 295eab8b-2a06-40ac-be7f-bed2e8ef64d6

Ceph version: 12.2.0-178-gba746cd (ba746cd14ddd70a4f24a734f83ff9d276dd327d1)

The cluster is idle, and the PGs never enter the snaptrim state:

  data:
    pools:   2 pools, 2560 pgs
    objects: 7158k objects, 28571 GB
    usage:   43140 GB used, 218 TB / 260 TB avail
    pgs:     2560 active+clean
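
As a rough sanity check on the figures above (a sketch, not part of the original report), the raw-to-logical ratio can be compared with the nominal overhead of a 4+2 erasure code, (k+m)/k = 1.5. The helper name `ec_overhead_check` is hypothetical, and the small replicated rbd-meta pool is ignored on the assumption that its footprint is negligible.

```sh
#!/bin/sh
# Hypothetical helper: compare observed raw/logical usage with the
# nominal erasure-code overhead (k + m) / k.
# Args: logical_gb raw_gb k m
ec_overhead_check() {
  awk -v logical="$1" -v raw="$2" -v k="$3" -v m="$4" 'BEGIN {
    printf "observed=%.2f expected=%.2f\n", raw / logical, (k + m) / k
  }'
}

# Numbers from the status output above: 28571 GB logical, 43140 GB raw, EC 4+2.
ec_overhead_check 28571 43140 4 2
```

With these numbers it prints `observed=1.51 expected=1.50`, which suggests the 28571 GB of logical data is really stored on disk rather than being an accounting artifact.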

Related issues: 2 (0 open, 2 closed)

Updated by Henrik Korkuc over 6 years ago

Same issue on 12.2.1 too.

Updated by Sage Weil over 6 years ago

Project changed from Ceph to rbd
Subject changed from Luminous RBD does not trim PGs for EC data pool to rbd does not delete snaps in (ec) data pool
Category deleted (OSD)
Status changed from New to 12
Priority changed from Normal to Immediate

To reproduce:

bin/init-ceph stop ; MON=1 OSD=4 MDS=0 ../src/vstart.sh -d -n -x -l --bluestore -e ; bin/ceph osd pool create rbd 1
bin/ceph osd pool set ec allow_ec_overwrites true
bin/rbd import --path bin/ceph-mds --dest foo --data-pool ec
bin/rbd snap create --image foo --snap snap1
bin/rbd bench foo --io-size 1M --io-total 100M --io-type write --io-pattern rand
bin/rbd snap rm --image foo --snap snap1

The snap is only deleted in the rbd pool, not in the data pool; only pool 2 ('rbd') lists removed_snaps:

pool 1 'ec' erasure size 4 min_size 3 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 last_change 23 flags hashpspool,ec_overwrites stripe_width 8192
pool 2 'rbd' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 last_change 27 flags hashpspool stripe_width 0
        removed_snaps [1~9]
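
The symptom in this dump can be checked mechanically. The sketch below is illustrative only: both function names are hypothetical, it assumes the pool lines look like the excerpt above, and it reads `removed_snaps` as an interval set of `start~length` pairs. It also assumes snap ids are printed in hexadecimal; for the values shown here, hex and decimal give the same result.

```sh
#!/bin/sh
# Hypothetical helpers for checking whether snapshot removal reached every pool.

# expand_removed_snaps: print each snap id in a string such as "[1~9]"
# or "[1~9,c~2]", where each entry is start~length.
# Assumption: both fields are hexadecimal.
expand_removed_snaps() {
  body=${1#"["}
  body=${body%"]"}
  for pair in $(printf '%s' "$body" | tr ',' ' '); do
    start_hex=${pair%%"~"*}
    len_hex=${pair#*"~"}
    start=$((0x$start_hex))
    end=$(($start + 0x$len_hex))
    i=$start
    while [ "$i" -lt "$end" ]; do
      printf '%s\n' "$i"
      i=$(($i + 1))
    done
  done
}

# pools_missing_removed_snaps: read pool lines ("pool N 'name' ...") followed
# by optional "removed_snaps [...]" lines on stdin; print every pool name
# that has no removed_snaps entry.
pools_missing_removed_snaps() {
  awk '
    /^pool / { name = $3; gsub("\047", "", name); order[++n] = name; next }
    $1 == "removed_snaps" { has[name] = 1 }
    END { for (i = 1; i <= n; i++) if (!(order[i] in has)) print order[i] }
  '
}
```

For the dump above, `expand_removed_snaps '[1~9]'` yields snap ids 1 through 9, and piping the three pool lines into `pools_missing_removed_snaps` prints `ec`. On a vstart cluster the input could come from something like `bin/ceph osd dump | pools_missing_removed_snaps`, assuming its pool lines match this format.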

Updated by Jason Dillaman over 6 years ago

Status changed from 12 to In Progress
Assignee set to Jason Dillaman

Updated by Jason Dillaman over 6 years ago

Backport set to luminous

Updated by Jason Dillaman over 6 years ago

Status changed from In Progress to Fix Under Review

PR: https://github.com/ceph/ceph/pull/18043

Updated by Henrik Korkuc over 6 years ago

Not sure if it helps, but without knowing the internals, and just looking at that patch, I got the impression that data pool snapshots were not created at all. However, judging by pool size, the pool takes more space than the images could if there were no snapshots (27 TB vs 6 TB in my case). So it looks like some kind of snapshots were created but never cleaned up. The OSD logs also show "Active: kicking snap trim" for data pool PGs.
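
The size argument in this comment can be made concrete with a quick calculation (an illustrative sketch; the helper `retained_ratio` is hypothetical). Six 1 TB images can hold at most 6 TB of live data, so about 27 TB stored means the pool holds roughly 4.5 times that, which is consistent with several snapshot generations whose data was never trimmed.

```sh
#!/bin/sh
# Hypothetical helper: compare the data stored in a pool with the most the
# images could hold if no snapshot data were retained.
# Args: stored_tb image_count image_size_tb
retained_ratio() {
  awk -v stored="$1" -v images="$2" -v size="$3" 'BEGIN {
    live = images * size
    printf "max_live=%g stored=%g ratio=%.1f\n", live, stored, stored / live
  }'
}

# Figures from the comment above: about 27 TB stored, six 1 TB images.
retained_ratio 27 6 1
```

This prints `max_live=6 stored=27 ratio=4.5`.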
