Bug #21412: cephfs: too many cephfs snapshots chokes the system - CephFS - Ceph

Added by Wyllys Ingersoll over 6 years ago. Updated about 6 years ago.

Status: Closed
Priority: Urgent
Assignee: Zheng Yan
Category: Correctness/Safety
Target version: Ceph - v13.0.0
Source: Community (user)
Tags: snaps
Severity: 2 - major
Component(FS): MDS

Description

We have a cluster whose /cephfs/.snap directory has over 4800 entries. Trying to delete older snapshots (some are over 6 months old, on a fairly active file system) causes the "rmdir" command to hang, along with any subsequent operations on the .snap directory (such as 'ls'). It also causes the number of blocked requests to grow indefinitely.

Ceph 10.2.7
Ubuntu 16.04.2
Kernel: 4.9.10

Files

ceph-mds.mds01.log.gz (557 KB), Wyllys Ingersoll, 09/15/2017 09:52 PM
perf_dump.after.txt (5.73 KB), Wyllys Ingersoll, 10/09/2017 01:51 PM
dentry_lru.txt (347 KB): cephfs dentry_lru during a snapshot deletion; Wyllys Ingersoll, 10/09/2017 01:55 PM

Updated by Greg Farnum over 6 years ago

Project changed from Ceph to CephFS
Category changed from 129 to 89

Can you dump the ops in flight on both the MDS and the client issuing the snap rmdir when this happens? And the perfcounters on the MDS?

My blind guess is that what's blocking this is not actually snapshot trimming, but that the queue for deleting inodes (or one of the directory fragments, since you're on Jewel) has reached its maximum size.

Updated by Wyllys Ingersoll over 6 years ago

Greg Farnum wrote:

Can you dump the ops in flight on both the MDS and the client issuing the snap rmdir when this happens? And the perfcounters on the MDS?

My blind guess is that what's blocking this is not actually snapshot trimming, but that the queue for deleting inodes (or one of the directory fragments, since you're on Jewel) has reached its maximum size.

What command(s) should I use to capture that info?

Updated by Zheng Yan over 6 years ago

Assignee changed from Jos Collin to Zheng Yan

Updated by Zheng Yan over 6 years ago

ceph-mds.mds01.log.gz does not include useful information; the log was generated while the MDS was replaying its log. Maybe the hang was caused by an MDS crash. Does restarting the MDS resolve the hang?

Updated by Greg Farnum over 6 years ago

ceph daemon mds.<name> dump_ops_in_flight
ceph daemon mds.<name> perf dump
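
A minimal sketch of capturing both dumps in one step, assuming it runs on the MDS host (the admin socket used by `ceph daemon` is local to that host) and that the MDS is named mds01, as the attached log file name suggests. The function name capture_mds_state and the output file naming are hypothetical, not part of Ceph. The client side that Greg also asked about is not covered; on a kernel client, pending MDS requests can usually be read from debugfs (/sys/kernel/debug/ceph/*/mdsc).

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): save the MDS in-flight ops and perf
# counters to timestamped files so they can be attached to a tracker issue.
# Must run on the host that owns the MDS admin socket.
capture_mds_state() {
    mds_name=$1                  # e.g. mds01
    out_dir=${2:-./mds-debug}    # output directory (assumed default)
    ts=$(date +%Y%m%dT%H%M%S)

    mkdir -p "$out_dir" || return 1

    # The two admin-socket commands suggested above, one output file each.
    ceph daemon "mds.$mds_name" dump_ops_in_flight \
        > "$out_dir/ops_in_flight.$mds_name.$ts.json" || return 1
    ceph daemon "mds.$mds_name" perf dump \
        > "$out_dir/perf_dump.$mds_name.$ts.json" || return 1

    # Print where the files went.
    echo "$out_dir"
}
```

Calling `capture_mds_state mds01` before, during, and after a snapshot rmdir gives comparable snapshots of the MDS state, similar to the perf_dump.after.txt attachment.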

Updated by Wyllys Ingersoll over 6 years ago

Thanks. I'm hesitant to trigger the issue again; last time it threw my cluster into major chaos that took several days to recover from. Once I get the data off of it, I will trigger the issue again and capture the info that you need.

Updated by Wyllys Ingersoll over 6 years ago

File perf_dump.after.txt added

Here is data collected from a recent attempt to delete a very old and very large snapshot:

The snapshot's extended attributes look like:

file: cephfs/.snap/snapshot.2017-02-24_22_17_01-1487992621
ceph.dir.entries="3"
ceph.dir.files="0"
ceph.dir.rbytes="30500769204664"
ceph.dir.rctime="1504695439.09966088000"
ceph.dir.rentries="7802785"
ceph.dir.rfiles="7758691"
ceph.dir.rsubdirs="44094"
ceph.dir.subdirs="3"
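
The recursive statistics above can be sanity-checked and converted to more familiar units with plain shell arithmetic. This is a hedged sketch: the values are copied by hand from the attribute listing above (on a live system a single value can be read with, for example, `getfattr -n ceph.dir.rbytes` on the snapshot directory), the variable names are illustrative, and it assumes a shell with 64-bit arithmetic because rbytes exceeds 2^32.

```shell
#!/bin/sh
# Values copied from the snapshot's ceph.dir.* extended attributes above.
rbytes=30500769204664   # ceph.dir.rbytes: total bytes under the snapshot
rentries=7802785        # ceph.dir.rentries: files + directories, recursively
rfiles=7758691          # ceph.dir.rfiles
rsubdirs=44094          # ceph.dir.rsubdirs

# rentries should equal rfiles + rsubdirs.
if [ $((rfiles + rsubdirs)) -eq "$rentries" ]; then
    consistent=yes
else
    consistent=no
fi

# Average file size in bytes (integer division) and total size in TiB (2^40 bytes).
avg_file_bytes=$((rbytes / rfiles))
total_tib=$(awk -v b="$rbytes" 'BEGIN { printf "%.2f", b / 1099511627776 }')

echo "consistent=$consistent avg_file_bytes=$avg_file_bytes total_tib=$total_tib"
# prints: consistent=yes avg_file_bytes=3931174 total_tib=27.74
```

So this one snapshot covers roughly 27.7 TiB spread over about 7.76 million files, averaging just under 4 MB per file.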

The ops in flight during the deletion look like:

{
    "ops": [],
    "num_ops": 0
}

The problem is that it takes almost 24 hours to delete a single snapshot, and the cluster stays in a warning state the whole time this is happening.

Is there a quicker "backdoor" way to purge our snapshots without blowing up the cluster? We really want to clean it up and get it back to a more usable state. At the current rate, it will literally take almost 13 YEARS to clean up the snapshots. Our only other alternative at this point is to destroy the entire filesystem, re-create it, and then restore all of the data that was on it (we have already backed it up, which took over a week).
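
The 13-year figure follows from simple arithmetic, assuming snapshots are removed one at a time and every snapshot takes about as long to trim as the one measured (an assumption; older or larger snapshots may take longer, others less). A minimal shell sketch of the estimate, using the "over 4800" snapshots and "almost 24 hours" per snapshot reported in this issue:

```shell
#!/bin/sh
# Back-of-the-envelope cleanup estimate; assumes snapshots are trimmed one at
# a time at the single rate observed above.
snapshots=4800          # entries in /cephfs/.snap ("over 4800")
hours_per_snapshot=24   # observed trim time per snapshot ("almost 24 hours")

total_hours=$((snapshots * hours_per_snapshot))
total_days=$((total_hours / 24))
# One decimal place via awk, using an average year of 365.25 days.
years=$(awk -v d="$total_days" 'BEGIN { printf "%.1f", d / 365.25 }')

echo "total_hours=$total_hours total_days=$total_days years=$years"
# prints: total_hours=115200 total_days=4800 years=13.1
```

Both inputs are approximate, so the result (about 13 years) is only a rough check on the figure quoted above.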

Updated by Wyllys Ingersoll over 6 years ago

File dentry_lru.txt added

Here is a dump of the cephfs 'dentry_lru' table, in case it is interesting.

Updated by Wyllys Ingersoll over 6 years ago

Note: the bug says "10.2.7", but we have since upgraded to 10.2.9 and the same problem exists.

#10

Updated by Zheng Yan over 6 years ago

What do you mean by "it takes almost 24 hours to delete a single snapshot"? Does 'rmdir .snap/xxx' take 24 hours, or do the PGs stay in trimsnap states for 24 hours?

#11

Updated by Wyllys Ingersoll over 6 years ago

The trimsnap states. The rmdir actually completes quickly, but the resulting operations throw the entire cluster into a massive recovery storm that can take days to recover from.

#12

Updated by Patrick Donnelly about 6 years ago

Subject changed from too many cephfs snapshots chokes the system to cephfs: too many cephfs snapshots chokes the system
Category changed from 89 to Correctness/Safety
Priority changed from Normal to Urgent
Target version changed from v10.2.10 to v13.0.0
Source set to Community (user)
Tags set to snaps
Release deleted (jewel)
Affected Versions deleted (v10.2.7)
Component(FS) MDS added

Zheng, is this issue resolved with the snapshot changes for Mimic?

#13

Updated by Zheng Yan about 6 years ago

This is actually an OSD issue. I talked to Josh at Cephalocon; he said it has already been fixed.

#14

Updated by Zheng Yan about 6 years ago

Status changed from New to Closed
