Bug #21412
cephfs: too many cephfs snapshots chokes the system
Description
We have a cluster whose /cephfs/.snap directory has over 4800 entries. Trying to delete older snapshots (some are over 6 months old on a pretty active file system) causes the "rmdir" command to hang, as well as any subsequent operations on the .snap directory (such as 'ls'). It is also causing the number of blocked requests to grow indefinitely.
Ceph 10.2.7
Ubuntu 16.04.2
Kernel: 4.9.10
History
#1 Updated by Greg Farnum over 5 years ago
- Project changed from Ceph to CephFS
- Category changed from 129 to 89
Can you dump the ops in flight on both the MDS and the client issuing the snap rmdir when this happens? And the perfcounters on the MDS?
My blind guess is that what's blocking this is actually not snapshot trimming, but that the queue for deleting inodes (or one of the directory fragments, since you're on Jewel) is at its max size.
#2 Updated by Wyllys Ingersoll over 5 years ago
Greg Farnum wrote:
Can you dump the ops in flight on both the MDS and the client issuing the snap rmdir when this happens? And the perfcounters on the MDS?
My blind guess is that what's blocking this is actually not snapshot trimming, but that the queue for deleting inodes (or one of the directory fragments, since you're on Jewel) is at its max size.
What command(s) should I use to capture that info?
#3 Updated by Zheng Yan over 5 years ago
- Assignee changed from Jos Collin to Zheng Yan
#4 Updated by Zheng Yan over 5 years ago
ceph-mds.mds01.log.gz does not include useful information. The log was generated while the MDS was replaying its log. Maybe the hang was caused by an MDS crash; does restarting the MDS resolve the hang?
#5 Updated by Greg Farnum over 5 years ago
ceph daemon mds.<name> dump_ops_in_flight
ceph daemon mds.<name> perf dump
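For example, pointing ceph daemon at the admin socket directly and saving the output to files; the socket path below is an assumption based on the default /var/run/ceph layout and the mds01 name from the attached log:
ceph daemon /var/run/ceph/ceph-mds.mds01.asok dump_ops_in_flight > dump_ops_in_flight.txt
ceph daemon /var/run/ceph/ceph-mds.mds01.asok perf dump > perf_dump.txt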
#6 Updated by Wyllys Ingersoll over 5 years ago
Thanks. I'm hesitant to trigger the issue again; last time it threw my cluster into major chaos that took several days to recover from. Once I get the data off of it, I will trigger the issue again and capture the info that you need.
#7 Updated by Wyllys Ingersoll over 5 years ago
- File perf_dump.after.txt added
Here is data collected from a recent attempt to delete a very old and very large snapshot:
The snapshot's extended attributes look like:
- file: cephfs/.snap/snapshot.2017-02-24_22_17_01-1487992621
ceph.dir.entries="3"
ceph.dir.files="0"
ceph.dir.rbytes="30500769204664"
ceph.dir.rctime="1504695439.09966088000"
ceph.dir.rentries="7802785"
ceph.dir.rfiles="7758691"
ceph.dir.rsubdirs="44094"
ceph.dir.subdirs="3"
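For reference, individual ceph.dir.* virtual xattrs like the ones above can be read by name with getfattr; a minimal sketch, assuming the same snapshot path:
getfattr -n ceph.dir.rentries /cephfs/.snap/snapshot.2017-02-24_22_17_01-1487992621
getfattr -n ceph.dir.rbytes /cephfs/.snap/snapshot.2017-02-24_22_17_01-1487992621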
The ops in flight during the deletion look like:
{
"ops": [],
"num_ops": 0
}
The problem is that it takes almost 24 hours to delete a single snapshot, and it puts the cluster into a warning state while that is happening.
Is there a quicker "backdoor" way to purge our snapshots without blowing up the cluster? We really want to clean it up and get it back to a more usable state. At the current rate, it will literally take almost 13 YEARS to clean up the snapshots. Our only other alternative at this point is to destroy the entire filesystem and re-create it and then restore all of the data that was on it (we already backed it up, which took over a week).
#8 Updated by Wyllys Ingersoll over 5 years ago
- File dentry_lru.txt added
Here is a dump of the cephfs 'dentry_lru' table, in case it is interesting.
#9 Updated by Wyllys Ingersoll over 5 years ago
Note: the bug says "10.2.7", but we have since upgraded to 10.2.9 and the same problem exists.
#10 Updated by Zheng Yan over 5 years ago
What do you mean by "it takes almost 24 hours to delete a single snapshot"? Does 'rmdir .snap/xxx' take 24 hours, or do the PGs stay in trimsnap states for 24 hours?
#11 Updated by Wyllys Ingersoll over 5 years ago
The trimsnap states. The rmdir actually completes quickly, but the resulting operations throw the entire cluster into a massive recovery storm that can take days to recover from.
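For context, a rough way to watch this from the cluster side is to count PGs that are still snap-trimming; the exact state string is an assumption and may differ by release (e.g. "snaptrim" vs. "trimsnap"):
ceph pg dump pgs_brief | grep -cE 'snaptrim|trimsnap'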
#12 Updated by Patrick Donnelly about 5 years ago
- Subject changed from too many cephfs snapshots chokes the system to cephfs: too many cephfs snapshots chokes the system
- Category changed from 89 to Correctness/Safety
- Priority changed from Normal to Urgent
- Target version changed from v10.2.10 to v13.0.0
- Source set to Community (user)
- Tags set to snaps
- Release deleted (jewel)
- Affected Versions deleted (v10.2.7)
- Component(FS) MDS added
Zheng, is this issue resolved with the snapshot changes for Mimic?
#13 Updated by Zheng Yan about 5 years ago
This is actually an OSD issue. I talked to Josh at Cephalocon; he said it has already been fixed.
#14 Updated by Zheng Yan about 5 years ago
- Status changed from New to Closed