Bug #21412
closed
cephfs: too many cephfs snapshots chokes the system
Added by Wyllys Ingersoll over 6 years ago.
Updated about 6 years ago.
Category:
Correctness/Safety
Description
We have a cluster with /cephfs/.snap directory with over 4800 entries. Trying to delete older snapshots (some are over 6 months old on a pretty active file system) causes the "rmdir" command to hang, as well as any future operations on the .snap directory (such as 'ls'). Also, it is causing the number of blocked requests to grow indefinitely.
Ceph 10.2.7
Ubuntu 16.04.2
Kernel: 4.9.10
- Project changed from Ceph to CephFS
- Category changed from 129 to 89
Can you dump the ops in flight on both the MDS and the client issuing the snap rmdir when this happens? And the perfcounters on the MDS?
My blind guess is that what's blocking this is not snapshot trimming, but rather that the queue for deleting inodes (or one of the directory fragments, since you're on Jewel) is at its maximum size.
Greg Farnum wrote:
Can you dump the ops in flight on both the MDS and the client issuing the snap rmdir when this happens? And the perfcounters on the MDS?
My blind guess about what's blocking this is actually not snapshot trimming, but if the queue for deleting inodes (or one of the directory fragments, as you're on Jewel) is at its max size.
What command(s) should I use to capture that info?
- Assignee changed from Jos Collin to Zheng Yan
ceph-mds.mds01.log.gz does not include useful information. The log was generated while the MDS was replaying its log. Maybe the hang was caused by an MDS crash. Does restarting the MDS resolve the hang?
ceph daemon mds.<name> dump_ops_in_flight
ceph daemon mds.<name> perf dump
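For anyone who wants to capture both outputs in one go while the problem is happening, here is a minimal Python sketch. It only wraps the two commands quoted above; the helper names `build_commands` and `capture` are made up for illustration, and it assumes it runs on the MDS host, where the admin socket is reachable, and that both commands print JSON.

```python
import json
import subprocess


def build_commands(mds_name):
    """Return the two admin-socket commands quoted above for one MDS daemon."""
    daemon = f"mds.{mds_name}"
    return {
        "ops_in_flight": ["ceph", "daemon", daemon, "dump_ops_in_flight"],
        "perf": ["ceph", "daemon", daemon, "perf", "dump"],
    }


def capture(mds_name, run=subprocess.check_output):
    """Run both commands and return their parsed JSON output keyed by label.

    `run` is injectable so the helper can be exercised without a cluster.
    """
    results = {}
    for label, argv in build_commands(mds_name).items():
        results[label] = json.loads(run(argv))
    return results
```

Calling `capture("mds01")` on the MDS host (with `mds01` replaced by the real daemon name) would return a dict with the ops in flight and the perf counters, which can then be written to a file and attached to the ticket.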
Thanks. I'm hesitant to trigger the issue again; last time it threw my cluster into major chaos that took several days to recover from. Once I get the data off of it, I will trigger the issue again and capture the info that you need.
Here is data collected from a recent attempt to delete a very old and very large snapshot:
The snapshot's extended attributes look like:
- file: cephfs/.snap/snapshot.2017-02-24_22_17_01-1487992621
ceph.dir.entries="3"
ceph.dir.files="0"
ceph.dir.rbytes="30500769204664"
ceph.dir.rctime="1504695439.09966088000"
ceph.dir.rentries="7802785"
ceph.dir.rfiles="7758691"
ceph.dir.rsubdirs="44094"
ceph.dir.subdirs="3"
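To put those numbers in perspective, here is a rough Python sketch that converts the recursive byte count to TiB, decodes the `rctime` timestamp, and checks that the recursive counters agree with each other. The values are copied from the listing above; the `summarize` helper is only illustrative and not part of any Ceph tooling.

```python
from datetime import datetime, timezone

# Values copied from the xattr listing above.
XATTRS = {
    "ceph.dir.rbytes": "30500769204664",
    "ceph.dir.rctime": "1504695439.09966088000",
    "ceph.dir.rentries": "7802785",
    "ceph.dir.rfiles": "7758691",
    "ceph.dir.rsubdirs": "44094",
}


def summarize(xattrs):
    """Turn the raw recursive stats into human-readable figures."""
    rbytes = int(xattrs["ceph.dir.rbytes"])
    rfiles = int(xattrs["ceph.dir.rfiles"])
    rsubdirs = int(xattrs["ceph.dir.rsubdirs"])
    rentries = int(xattrs["ceph.dir.rentries"])
    # Only the whole-seconds part of rctime is used here.
    seconds = int(xattrs["ceph.dir.rctime"].split(".")[0])
    return {
        "tib": rbytes / 2**40,
        "rctime_utc": datetime.fromtimestamp(seconds, tz=timezone.utc),
        # Sanity check: files plus subdirectories should equal total entries.
        "entries_consistent": rfiles + rsubdirs == rentries,
    }


summary = summarize(XATTRS)
print(f"{summary['tib']:.1f} TiB, last recursive change {summary['rctime_utc']:%Y-%m-%d}")
```

So this one snapshot covers roughly 27.7 TiB across about 7.8 million entries, with the last recursive change recorded in early September 2017.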
The ops in flight during the deletion look like:
{
"ops": [],
"num_ops": 0
}
The problem is that it takes almost 24 hours to delete a single snapshot and it puts the cluster into a warning state whenever it is happening.
Is there a quicker "backdoor" way to purge our snapshots without blowing up the cluster? We really want to clean it up and get it back to a more usable state. At the current rate, it will literally take almost 13 YEARS to clean up the snapshots. Our only other alternative at this point is to destroy the entire filesystem and re-create it and then restore all of the data that was on it (we already backed it up, which took over a week).
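The 13-year figure follows from simple arithmetic, sketched below in Python. It assumes the roughly 4800 snapshots mentioned in the description are removed one at a time and that each takes about 24 hours; both are round numbers from this report, not measurements.

```python
# Round figures from the report: ~4800 snapshots, ~24 hours to trim each one.
snapshots = 4800
hours_per_snapshot = 24

total_days = snapshots * hours_per_snapshot / 24
total_years = total_days / 365.25

print(f"{total_days:.0f} days, about {total_years:.1f} years")
```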
Here is a dump of the cephfs 'dentry_lru' table, in case it is interesting.
Note, the bug says "10.2.7" but we have since upgraded to 10.2.9 and the same problem exists.
What do you mean by "it takes almost 24 hours to delete a single snapshot"? Does 'rmdir .snap/xxx' take 24 hours, or do the PGs stay in trimsnap states for 24 hours?
The trimsnap states. The rmdir actually completes quickly, but the resulting operations throw the entire cluster into a massive recovery storm that can take days to recover from.
- Subject changed from too many cephfs snapshots chokes the system to cephfs: too many cephfs snapshots chokes the system
- Category changed from 89 to Correctness/Safety
- Priority changed from Normal to Urgent
- Target version changed from v10.2.10 to v13.0.0
- Source set to Community (user)
- Tags set to snaps
- Release deleted (jewel)
- Affected Versions deleted (v10.2.7)
- Component(FS) MDS added
Zheng, is this issue resolved with the snapshot changes for Mimic?
This is actually an OSD issue. I talked to Josh at Cephalocon. He said it has already been fixed.
- Status changed from New to Closed