Bug #52581

open

Dangling fs snapshots on data pool after change of directory layout

Added by Frank Schilder over 2 years ago. Updated 5 months ago.

Status:
New
Priority:
Normal
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
quincy,reef
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
multimds, snapshots
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

# ceph version
ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)

After changing the data pool on the root directory of our ceph fs, deleted snapshots seem to be stuck in the new data pool. We are rotating daily snapshots. Our ceph fs status, excluding stand-bys, is:

# ceph fs status
con-fs2 - 1640 clients
=======
+------+--------+---------+---------------+-------+-------+
| Rank | State  |   MDS   |    Activity   |  dns  |  inos |
+------+--------+---------+---------------+-------+-------+
|  0   | active | ceph-23 | Reqs:    5 /s | 2399k | 2346k |
|  1   | active | ceph-12 | Reqs:   25 /s | 1225k | 1203k |
|  2   | active | ceph-08 | Reqs:   25 /s | 2148k | 2027k |
|  3   | active | ceph-15 | Reqs:   26 /s | 2088k | 2032k |
+------+--------+---------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
|    con-fs2-meta1    | metadata | 4040M | 1314G |
|    con-fs2-meta2    |   data   |    0  | 1314G |
|     con-fs2-data    |   data   | 1361T | 6023T |
| con-fs2-data-ec-ssd |   data   |  239G | 4205G |
|    con-fs2-data2    |   data   | 35.8T | 5475T |
+---------------------+----------+-------+-------+
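
The daily rotation mentioned above uses the normal CephFS .snap mechanism; a minimal sketch, assuming a mount point and snapshot naming that are purely illustrative (not our exact scripts):

# mkdir /mnt/con-fs2/.snap/daily-$(date +%F)      # create today's snapshot; path and name are illustrative
# rmdir /mnt/con-fs2/.snap/daily-<oldest-date>    # drop the oldest snapshot to keep the rotation window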

We changed the data pool on the root from the 8+2 EC pool con-fs2-data to the 8+3 EC pool con-fs2-data2. It looks like some deleted snapshots are not being purged on the new pool (snippet from ceph osd pool ls detail):

pool 12 'con-fs2-meta1' replicated size 4 min_size 2 ... application cephfs
pool 13 'con-fs2-meta2' replicated size 4 min_size 2 ... application cephfs
    removed_snaps [2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
pool 14 'con-fs2-data' erasure size 10 min_size 9 ... application cephfs
    removed_snaps [2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
pool 17 'con-fs2-data-ec-ssd' erasure size 10 min_size 9 ... application cephfs
    removed_snaps [2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
pool 19 'con-fs2-data2' erasure size 11 min_size 9 ... application cephfs
    removed_snaps [2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,2ea~18,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]

The problematic snapshots are the ones still present in pool con-fs2-data2 in the set [2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,2ea~18,303~3], which should not be present. They correspond to decimal snap IDs 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747. All mds daemons report the following snap IDs:

# ceph daemon mds.ceph-23 dump snaps | grep snapid
            "snapid": 400,
            "snapid": 445,
            "snapid": 770,
            "snapid": 774,
            "snapid": 776,
            "snapid": 778,
            "snapid": 780,
            "snapid": 782,
            "snapid": 784,
            "snapid": 786,
            "snapid": 788,
            "snapid": 791,

These extra snapshots seem to cause performance issues and I would like to know how to get rid of them.
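
For completeness, a root data-pool change like the one described above is done via the standard layout mechanism, roughly as follows (the mount point is an assumption; the add_data_pool step also appears in the audit log quoted in a later comment):

# ceph fs add_data_pool con-fs2 con-fs2-data2
# setfattr -n ceph.dir.layout.pool -v con-fs2-data2 /mnt/con-fs2    # mount point is illustrative

Only files created after the layout change are written to the new pool; existing file data stays on the old pool, which is why con-fs2-data still holds most of the data in the status output above.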

Actions #2

Updated by Patrick Donnelly over 2 years ago

  • Status changed from New to Triaged
  • Assignee set to Kotresh Hiremath Ravishankar
  • Target version set to v17.0.0
  • Source set to Community (user)
Actions #3

Updated by Kotresh Hiremath Ravishankar over 2 years ago

Hi Frank,

I tried reproducing this issue on master by changing the root data pool to a new one, but couldn't reproduce it.
Could you share the ceph logs from around the time the (now dangling) snapshots were created and deleted?

Thanks,
Kotresh HR

Actions #4

Updated by Kotresh Hiremath Ravishankar over 2 years ago

The ceph version of this issue, 'ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)', is no longer supported. Please upgrade to a newer supported stable release and report back if the issue is still seen.

Actions #5

Updated by Kotresh Hiremath Ravishankar over 2 years ago

  • Status changed from Triaged to Can't reproduce

Closing this issue based on the previous comment. Please re-open it if the problem occurs on a supported version.

Actions #6

Updated by Frank Schilder over 2 years ago

Please don't close an issue without providing an actual fix; the fact that you can't reproduce it with a simple test doesn't mean it isn't there. The issue was observed with mimic and, unless there is a confirmed fix, the affected versions are "mimic and newer" and the status is "open". In the meantime, we have to wait until someone figures out how to reproduce the problem reliably, or observes it as well and can add relevant information.

Logs

I secured all logs, but I'm afraid they don't go back far enough. The logs I have seem to start shortly after the issue occurred. As it happens, the logs before the oldest available one must have been really large. Our log rotation is set up to roll over every week or if size >= 100M. Usually the size limit is large enough to cover a week, but in this particular period the cluster log covers only one day.
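
For reference, the rotation policy described above corresponds roughly to a logrotate rule like the following (illustrative, not our exact configuration); maxsize triggers an early roll-over when a log grows past the limit before the weekly rotation:

/var/log/ceph/*.log {
    weekly
    maxsize 100M
    rotate 7
    compress
    missingok
    notifempty
}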

There were no special messages shown in the dashboard, so all this volume must have been low-level messages that go to the log only. Since nothing special happened during this time (or at least seemed to), we didn't look at the raw logs. What I have is the following:

- New data pool added on Aug 25

2021-08-25 13:15:02.438990 mon.ceph-01 mon.0 192.168.32.65:6789/0 92502 : audit [INF] from='client.? 192.168.32.64:0/1689304540' entity='client.admin' cmd='[{"prefix": "fs add_data_pool", "fs_name": "con-fs2", "pool": "con-fs2-data2"}]': finished

- Oldest message in secured ceph.log on

2021-09-04 04:02:02.431988 mgr.ceph-01 mgr.45059212 192.168.32.65:0/63 222067 : cluster [DBG] pgmap v198863: 15326 pgs: 15308 active+clean, 18 active+clean+scrubbing+deep; 1.5 PiB data, 1.8 PiB used, 9.6 PiB / 11 PiB avail; 231 MiB/s rd, 173 MiB/s wr, 4.70 kop/s

There are 9 days in between for which we don't have the logs any more. 8 out of the 20 large omap objects were discovered during these 9 days. These objects belong to snapshots that have been removed from the file system.

I'm happy to provide any of these logs if it still makes sense.

Reproducibility

I don't think it is possible to reproduce this issue easily on a test cluster. We have several hundred users on the file system, and all the large omap objects belong to a single user who was running a very particular workload. The workload almost certainly caused dirfrag operations where part of a directory was still on the old data pool and another part was already on the new one. The issue seems to be confined to a transition period: this special workload started before the new data pool was added and finished after the addition. For jobs of the same type started after the pool change, no new orphaned snapshots have shown up (so far).

Possible way forward to work around the problem

I have large omap warnings pointing to directories in orphaned snapshots. It should be possible to figure out their location before the snapshot was deleted, or the user ID of the owner. Then we can find out what workload was executed during the critical period and try it on a test cluster. There might also be a manual way of deleting the orphaned snapshots. It must be possible to get rid of these.
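
One possible way to do that lookup, sketched under the assumption that the large omap objects are dirfrag objects in the metadata pool (object names of the form <inode-hex>.<frag>): such an object carries a backtrace in its 'parent' xattr, which can be decoded to the ancestor path, roughly like this (the object name is a placeholder taken from the warning):

# rados -p con-fs2-meta1 getxattr <object-name-from-warning> parent > parent.bin
# ceph-dencoder type inode_backtrace_t import parent.bin decode dump_json

For a directory that only exists in snapshots any more, the decoded path should at least point to its last live location and thereby to the owning user's tree.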

Actions #7

Updated by Frank Schilder over 2 years ago

Sorry, I just noticed that the status is set to "Can't reproduce". This is OK.

I would like to help build a reproducer. For this I need the information mentioned above: how to find the original full path and/or the user ID of the owner.
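
If the hex inode number can be recovered from the object name (the part before the dot), then on reasonably recent releases the MDS should be able to resolve it directly, including the owner's uid in the dump; a rough sketch (the hex value is a placeholder, and the mds is whichever one is authoritative for the inode):

# printf '%d\n' 0x<hex-inode-from-object-name>
# ceph daemon mds.ceph-23 dump inode <decimal-inode-number>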

Actions #8

Updated by Venky Shankar 5 months ago

  • Category set to Correctness/Safety
  • Status changed from Can't reproduce to New
  • Target version changed from v17.0.0 to v19.0.0
  • Backport set to quincy,reef
  • Severity changed from 3 - minor to 2 - major
  • Component(FS) MDS added
  • Labels (FS) multimds, snapshots added

Bumping the priority since a similar report has shown up on the ceph-users list - https://marc.info/?l=ceph-users&m=170151942902049&w=2

The ceph version there is 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable), which is fairly recent - we need an RCA.
