Bug #52581

Dangling fs snapshots on data pool after change of directory layout

Added by Frank Schilder over 2 years ago. Updated 5 months ago.

Status: New
Priority: Normal
Category: Correctness/Safety
Target version: -
% Done: 0%
Source: Community (user)
Tags: -
Backport: quincy, reef
Regression: No
Severity: 2 - major
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(FS): MDS
Labels (FS): multimds, snapshots
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

# ceph version
ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)

After changing the data pool on the root directory of our CephFS, deleted snapshots appear to be stuck in the new data pool. We rotate daily snapshots. Our ceph fs status, excluding stand-bys, is:

# ceph fs status
con-fs2 - 1640 clients
=======
+------+--------+---------+---------------+-------+-------+
| Rank | State  |   MDS   |    Activity   |  dns  |  inos |
+------+--------+---------+---------------+-------+-------+
|  0   | active | ceph-23 | Reqs:    5 /s | 2399k | 2346k |
|  1   | active | ceph-12 | Reqs:   25 /s | 1225k | 1203k |
|  2   | active | ceph-08 | Reqs:   25 /s | 2148k | 2027k |
|  3   | active | ceph-15 | Reqs:   26 /s | 2088k | 2032k |
+------+--------+---------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
|    con-fs2-meta1    | metadata | 4040M | 1314G |
|    con-fs2-meta2    |   data   |    0  | 1314G |
|     con-fs2-data    |   data   | 1361T | 6023T |
| con-fs2-data-ec-ssd |   data   |  239G | 4205G |
|    con-fs2-data2    |   data   | 35.8T | 5475T |
+---------------------+----------+-------+-------+
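
The daily rotation itself uses the usual CephFS mechanism of creating and removing directories under .snap; roughly like this (the mount point and naming scheme here are only illustrative):

# mkdir /mnt/con-fs2/.snap/daily_$(date +%F)
# rmdir /mnt/con-fs2/.snap/daily_2021-09-01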

We changed the data pool on the root from the 8+2 EC pool con-fs2-data to the 8+3 EC pool con-fs2-data2. It looks like some deleted snapshots are not being purged on the new pool (snippet from ceph osd pool ls detail):

pool 12 'con-fs2-meta1' replicated size 4 min_size 2 ... application cephfs
pool 13 'con-fs2-meta2' replicated size 4 min_size 2 ... application cephfs
    removed_snaps [2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
pool 14 'con-fs2-data' erasure size 10 min_size 9 ... application cephfs
    removed_snaps [2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
pool 17 'con-fs2-data-ec-ssd' erasure size 10 min_size 9 ... application cephfs
    removed_snaps [2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
pool 19 'con-fs2-data2' erasure size 11 min_size 9 ... application cephfs
    removed_snaps [2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,2ea~18,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
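
For reference, a root-layout change like this is normally made by adding the new pool to the file system and re-pointing the layout of the root directory, roughly as follows (the exact commands and mount point are only illustrative; existing file data stays in the old pool, only newly created files go to con-fs2-data2):

# ceph fs add_data_pool con-fs2 con-fs2-data2
# setfattr -n ceph.dir.layout.pool -v con-fs2-data2 /mnt/con-fs2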

The problematic snapshots are the ones still present in pool con-fs2-data2, in the set [2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,2ea~18,303~3], which should not be there. They correspond to decimal snap IDs 727, 729, 731, 733, 735, 737, 739, 741, 743, 745 and 747. All MDS daemons report the following snap IDs:

# ceph daemon mds.ceph-23 dump snaps | grep snapid
            "snapid": 400,
            "snapid": 445,
            "snapid": 770,
            "snapid": 774,
            "snapid": 776,
            "snapid": 778,
            "snapid": 780,
            "snapid": 782,
            "snapid": 784,
            "snapid": 786,
            "snapid": 788,
            "snapid": 791,

These extra snapshots seem to cause performance issues and I would like to know how to get rid of them.
