
Bug #17731

MDS stuck in stopping with other rank's strays

Added by John Spray almost 3 years ago. Updated 6 months ago.

Status: Can't reproduce
Priority: High
Assignee:
Category: -
Target version:
Start date: 10/28/2016
Due date:
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): multimds
Pull request ID:

Description

Kraken v11.0.2

Seen on a max_mds=2 MDS cluster with a fuse client doing an rsync -av --delete on a dir that included hard links.

I ran a backup job overnight with two active MDS daemons, then set max_mds=1 and deactivated rank 1 (with the client still mounted).

Log and cache dump attached from mds.gravel1, which held rank 1. It got most of the way through stopping and then got stuck with 6 items in cache, all of them in ~mds0.

The log indicates that we're somehow not making it as far as trim_dentry on those items, but I can't see why.

I tried flushing the journals and killing and evicting the client, but made no progress. Interestingly, when I set mds_cache_size to 100 on rank 0, it also wouldn't trim past 500-something entries, so something was going wrong with the trimming there too.
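For reference, the deactivation and cache-trimming experiments above correspond roughly to the following commands, using the pre-Luminous (Kraken-era) CLI. The daemon name mds.gravel0 for rank 0 is an assumption by analogy with mds.gravel1; only gravel1 is named in this report.

```shell
# Shrink the cluster from two active ranks to one.
# (Pre-Luminous syntax; on Luminous and later this is
# `ceph fs set <fsname> max_mds 1`.)
ceph mds set max_mds 1
ceph mds deactivate 1    # ask rank 1 (mds.gravel1) to stop

# Dump rank 1's cache via the admin socket while it is stuck in
# "stopping" (this is how the attached cache dump was produced).
ceph daemon mds.gravel1 dump cache /tmp/gravel1.cache.txt

# Later experiment: force aggressive trimming on rank 0.
# (mds.gravel0 is a hypothetical name for the rank 0 daemon.)
ceph daemon mds.gravel0 config set mds_cache_size 100
```

These are cluster-administration commands, not a standalone script; they need a running Ceph cluster with admin-socket access on the MDS hosts.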

gravel1.stuck_stopping_cache.gz (5.64 KB) John Spray, 10/28/2016 10:32 AM

gravel1.stuck_in_stopping.log.gz (107 KB) John Spray, 10/28/2016 10:32 AM

History

#1 Updated by John Spray almost 3 years ago

  • Priority changed from Normal to High
  • Target version set to v12.0.0

#2 Updated by John Spray almost 3 years ago

  • Assignee set to John Spray

#3 Updated by John Spray about 2 years ago

  • Status changed from New to Can't reproduce

This code has all changed a lot since then.

#4 Updated by Patrick Donnelly 6 months ago

  • Category deleted (90)
  • Labels (FS) multimds added
