Bug #21412 (closed)

cephfs: too many cephfs snapshots chokes the system

Added by Wyllys Ingersoll over 6 years ago. Updated about 6 years ago.

Status: Closed
Priority: Urgent
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Community (user)
Tags: snaps
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have a cluster with a /cephfs/.snap directory with over 4800 entries. Trying to delete older snapshots (some are over 6 months old on a pretty active file system) causes the "rmdir" command to hang, along with any subsequent operations on the .snap directory (such as 'ls'). It also causes the number of blocked requests to grow indefinitely.

Ceph 10.2.7
Ubuntu 16.04.2
Kernel: 4.9.10
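For context, CephFS exposes snapshots as entries in the virtual .snap directory, so snapshot management is ordinary directory operations. A minimal sketch using the /cephfs mount point from this report (the snapshot name is hypothetical):

  ls /cephfs/.snap | wc -l              # count existing snapshots (over 4800 here)
  mkdir /cephfs/.snap/example-snap      # create a snapshot (hypothetical name)
  rmdir /cephfs/.snap/example-snap      # delete a snapshot; this is the call that hangs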


Files

ceph-mds.mds01.log.gz (557 KB), Wyllys Ingersoll, 09/15/2017 09:52 PM
perf_dump.after.txt (5.73 KB), Wyllys Ingersoll, 10/09/2017 01:51 PM
dentry_lru.txt (347 KB), cephfs dentry_lru during a snapshot deletion, Wyllys Ingersoll, 10/09/2017 01:55 PM
Actions #1

Updated by Greg Farnum over 6 years ago

  • Project changed from Ceph to CephFS
  • Category changed from 129 to 89

Can you dump the ops in flight on both the MDS and the client issuing the snap rmdir when this happens? And the perfcounters on the MDS?

My blind guess about what's blocking this is actually not snapshot trimming, but if the queue for deleting inodes (or one of the directory fragments, as you're on Jewel) is at its max size.
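A sketch of how both sides can be inspected (the mds01 daemon name comes from the attached log, the stray counter names are Jewel-era mds_cache assumptions, and the client path assumes the kernel client with debugfs mounted):

  # MDS side: the stray/purge backlog shows up in the mds_cache perf counters
  # (e.g. num_strays, num_strays_purging -- Jewel-era names, an assumption here)
  ceph daemon mds.mds01 perf dump | python -m json.tool | grep -i stray
  # kernel client side: MDS requests currently in flight
  cat /sys/kernel/debug/ceph/*/mdsc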

Actions #2

Updated by Wyllys Ingersoll over 6 years ago

Greg Farnum wrote:

Can you dump the ops in flight on both the MDS and the client issuing the snap rmdir when this happens? And the perfcounters on the MDS?

My blind guess about what's blocking this is actually not snapshot trimming, but if the queue for deleting inodes (or one of the directory fragments, as you're on Jewel) is at its max size.

What command(s) should I use to capture that info?

Actions #3

Updated by Zheng Yan over 6 years ago

  • Assignee changed from Jos Collin to Zheng Yan
Actions #4

Updated by Zheng Yan over 6 years ago

ceph-mds.mds01.log.gz does not include useful information. The log was generated while the MDS was replaying its journal, so the hang may have been caused by an MDS crash. Does restarting the MDS resolve the hang?
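If it helps to test that, a restart sketch (assuming the systemd unit naming used by the Jewel packages on Ubuntu 16.04 and the mds01 daemon name from the attached log):

  sudo systemctl restart ceph-mds@mds01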

Actions #5

Updated by Greg Farnum over 6 years ago

ceph daemon mds.<name> dump_ops_in_flight
ceph daemon mds.<name> perf dump
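Filled in with the mds01 name from the attached log (a usage sketch; the socket path assumes the default /var/run/ceph location):

  ceph daemon mds.mds01 dump_ops_in_flight
  ceph daemon mds.mds01 perf dump
  # equivalent, pointing at the admin socket directly on the MDS host:
  ceph --admin-daemon /var/run/ceph/ceph-mds.mds01.asok dump_ops_in_flight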

Actions #6

Updated by Wyllys Ingersoll over 6 years ago

Thanks. I'm hesitant to trigger the issue again; last time it threw my cluster into major chaos that took several days to recover from. Once I get the data off of it, I will trigger the issue again and capture the info that you need.

Actions #7

Updated by Wyllys Ingersoll over 6 years ago

Here is data collected from a recent attempt to delete a very old and very large snapshot:

The snapshot's extended attributes look like:

  # file: cephfs/.snap/snapshot.2017-02-24_22_17_01-1487992621
  ceph.dir.entries="3"
  ceph.dir.files="0"
  ceph.dir.rbytes="30500769204664"
  ceph.dir.rctime="1504695439.09966088000"
  ceph.dir.rentries="7802785"
  ceph.dir.rfiles="7758691"
  ceph.dir.rsubdirs="44094"
  ceph.dir.subdirs="3"
Ops in flight during the deletion look like:

  {
      "ops": [],
      "num_ops": 0
  }

The problem is that it takes almost 24 hours to delete a single snapshot, and it puts the cluster into a warning state while it is happening.

Is there a quicker "backdoor" way to purge our snapshots without blowing up the cluster? We really want to clean it up and get it back to a more usable state. At the current rate, it will literally take almost 13 YEARS to clean up the snapshots. Our only other alternative at this point is to destroy the entire filesystem and re-create it and then restore all of the data that was on it (we already backed it up, which took over a week).

Actions #8

Updated by Wyllys Ingersoll over 6 years ago

Here is a dump of the cephfs 'dentry_lru' table, in case it is interesting.

Actions #9

Updated by Wyllys Ingersoll over 6 years ago

Note: the bug report says "10.2.7", but we have since upgraded to 10.2.9 and the same problem exists.

Actions #10

Updated by Zheng Yan over 6 years ago

What do you mean by "it takes almost 24 hours to delete a single snapshot"? Does 'rmdir .snap/xxx' take 24 hours, or do the PGs stay in trimsnap states for 24 hours?

Actions #11

Updated by Wyllys Ingersoll over 6 years ago

The trimsnap states. The rmdir itself completes quickly, but the resulting operations throw the entire cluster into a massive recovery storm that can take days to recover from.
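For reference, the OSD-side backlog can be watched, and the per-OSD trimming rate throttled, along these lines (a sketch, not a recommendation: osd_snap_trim_sleep exists in Jewel but its behavior changed in later releases, and the value shown is only illustrative):

  ceph -s                                    # cluster goes HEALTH_WARN while PGs are trimming
  ceph pg dump pgs_brief | grep -i trimsnap  # list PGs in the state mentioned above
  ceph tell 'osd.*' injectargs '--osd_snap_trim_sleep 0.05'   # slow down snap trimming per OSD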

Actions #12

Updated by Patrick Donnelly about 6 years ago

  • Subject changed from too many cephfs snapshots chokes the system to cephfs: too many cephfs snapshots chokes the system
  • Category changed from 89 to Correctness/Safety
  • Priority changed from Normal to Urgent
  • Target version changed from v10.2.10 to v13.0.0
  • Source set to Community (user)
  • Tags set to snaps
  • Release deleted (jewel)
  • Affected Versions deleted (v10.2.7)
  • Component(FS) MDS added

Zheng, is this issue resolved with the snapshot changes for Mimic?

Actions #13

Updated by Zheng Yan about 6 years ago

This is actually an OSD issue. I talked to Josh at Cephalocon; he said it has already been fixed.

Actions #14

Updated by Zheng Yan about 6 years ago

  • Status changed from New to Closed