Bug #52581
Dangling fs snapshots on data pool after change of directory layout
Status: open
Description
# ceph version
ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
After changing the data pool on the root directory of our ceph fs, deleted snapshots seem to be stuck in the new data pool. We rotate daily snapshots. Our ceph fs status (stand-bys excluded) is:
# ceph fs status
con-fs2 - 1640 clients
=======
+------+--------+---------+---------------+-------+-------+
| Rank | State  |   MDS   |    Activity   |  dns  |  inos |
+------+--------+---------+---------------+-------+-------+
|  0   | active | ceph-23 | Reqs:    5 /s | 2399k | 2346k |
|  1   | active | ceph-12 | Reqs:   25 /s | 1225k | 1203k |
|  2   | active | ceph-08 | Reqs:   25 /s | 2148k | 2027k |
|  3   | active | ceph-15 | Reqs:   26 /s | 2088k | 2032k |
+------+--------+---------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
|    con-fs2-meta1    | metadata | 4040M | 1314G |
|    con-fs2-meta2    |   data   |    0  | 1314G |
|     con-fs2-data    |   data   | 1361T | 6023T |
| con-fs2-data-ec-ssd |   data   |  239G | 4205G |
|    con-fs2-data2    |   data   | 35.8T | 5475T |
+---------------------+----------+-------+-------+
We changed the data pool on the root from the 8+2 EC pool con-fs2-data to the 8+3 EC pool con-fs2-data2. It looks like some deleted snapshots on the new pool have not been purged (snippet from ceph osd pool ls detail):
pool 12 'con-fs2-meta1' replicated size 4 min_size 2 ... application cephfs
pool 13 'con-fs2-meta2' replicated size 4 min_size 2 ... application cephfs
        removed_snaps [2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
pool 14 'con-fs2-data' erasure size 10 min_size 9 ... application cephfs
        removed_snaps [2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
pool 17 'con-fs2-data-ec-ssd' erasure size 10 min_size 9 ... application cephfs
        removed_snaps [2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
pool 19 'con-fs2-data2' erasure size 11 min_size 9 ... application cephfs
        removed_snaps [2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,2ea~18,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
The problematic snapshots are the ones still present in pool con-fs2-data2, covered by the set [2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,2ea~18,303~3]; they should no longer be present. They correspond to decimal snap IDs 727 729 731 733 735 737 739 741 743 745 747. All mds daemons report the same set of snap IDs, none of which match the problematic ones:
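The interval-set notation above can be decoded mechanically to see which snap IDs were purged everywhere except in con-fs2-data2. A minimal sketch, not a Ceph tool, assuming a hex interval a~b covers snap IDs a through a+b-1 (a different display convention would shift every resulting ID by one):

```python
# Decode Ceph removed_snaps interval-set notation (hex "start~length" pairs).
# Assumption: an interval "a~b" covers snap IDs a .. a+b-1; verify this
# convention against your own cluster before acting on the numbers.

def parse_removed_snaps(s):
    """Expand e.g. '[2d6~1,2ea~18]' into a list of decimal snap IDs."""
    ids = []
    for part in s.strip("[]").split(","):
        start, length = (int(x, 16) for x in part.split("~"))
        ids.extend(range(start, start + length))
    return ids

# removed_snaps of pool con-fs2-data2 (pool 19) from the listing above
data2 = parse_removed_snaps(
    "[2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,"
    "2ea~18,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]")

# removed_snaps shared by pools 13, 14 and 17
others = parse_removed_snaps(
    "[2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,"
    "311~1,313~1,315~2]")

# Snap IDs purged in the other pools but not in con-fs2-data2; IDs below
# min(data2) predate the pool's addition to the fs and are ignored.
dangling = sorted(i for i in set(others) - set(data2) if i >= min(data2))
print(dangling)  # [727, 729, 731, 733, 735, 737, 739, 741, 743, 745]
```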
# ceph daemon mds.ceph-23 dump snaps | grep snapid
    "snapid": 400,
    "snapid": 445,
    "snapid": 770,
    "snapid": 774,
    "snapid": 776,
    "snapid": 778,
    "snapid": 780,
    "snapid": 782,
    "snapid": 784,
    "snapid": 786,
    "snapid": 788,
    "snapid": 791,
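A quick cross-check, again only a sketch and assuming a hex interval a~b spans snap IDs a through a+b-1: the live snap IDs reported by the MDS should be exactly the gaps in the removed_snaps set of the three healthy pools.

```python
# Cross-check (sketch, not a Ceph tool): MDS-reported live snap IDs versus
# the removed_snaps set shared by pools 13, 14 and 17 in the listing above.
# Assumption: a hex interval "a~b" covers snap IDs a .. a+b-1.

def expand(s):
    ids = set()
    for part in s.strip("[]").split(","):
        start, length = (int(x, 16) for x in part.split("~"))
        ids.update(range(start, start + length))
    return ids

removed = expand("[2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,"
                 "30f~1,311~1,313~1,315~2]")

# snap IDs from `ceph daemon mds.ceph-23 dump snaps`
mds_live = {400, 445, 770, 774, 776, 778, 780, 782, 784, 786, 788, 791}

# No live snapshot may appear in the removed set ...
assert not (mds_live & removed)

# ... and the gaps inside the covered range are exactly the live snapshots
# (791 lies beyond the last removed interval, so it cannot appear as a gap).
gaps = set(range(2, max(removed) + 1)) - removed
print(sorted(gaps))  # [400, 445, 770, 774, 776, 778, 780, 782, 784, 786, 788]
```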
These leftover snapshots appear to cause performance issues, and I would like to know how to get rid of them.