
Bug #17731

MDS stuck in stopping with other rank's strays

Added by John Spray almost 3 years ago. Updated 6 months ago.

Status: Can't reproduce
Priority: High
Assignee:
Category: -
Target version:
Start date: 10/28/2016
Due date:
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): multimds
Pull request ID:

Description

Kraken v11.0.2

Seen on a max_mds=2 MDS cluster with a fuse client doing an rsync -av --delete on a dir that included hard links.

I ran a backup job overnight with two active MDS daemons, then set max_mds=1 and deactivated rank 1 (with the client still mounted).

Log and cache dump attached from mds.gravel1, which held rank 1. It got most of the way through stopping and then got stuck with 6 items in cache, all of them in ~mds0.

The log indicates that we're somehow not making it as far as trim_dentry on those items, but I can't see why.

I tried flushing the journals and killing and evicting the client, but made no progress. Interestingly, when I set mds_cache_size to 100 on rank 0, it also wouldn't trim past 500-something entries, so something was going wrong with the trimming there too.
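For reference, the deactivation and cache-trimming experiments above correspond roughly to the following commands, using the pre-Luminous (Kraken-era) CLI. The daemon name mds.gravel0 for rank 0 is an assumption by analogy with mds.gravel1; only gravel1 is named in this report.

```shell
# Shrink the cluster from two active ranks to one.
# (Pre-Luminous syntax; on Luminous and later this is
# `ceph fs set <fsname> max_mds 1`.)
ceph mds set max_mds 1
ceph mds deactivate 1    # ask rank 1 (mds.gravel1) to stop

# Dump rank 1's cache via the admin socket while it is stuck in
# "stopping" (this is how the attached cache dump was produced).
ceph daemon mds.gravel1 dump cache /tmp/gravel1.cache.txt

# Later experiment: force aggressive trimming on rank 0.
# (mds.gravel0 is a hypothetical name for the rank 0 daemon.)
ceph daemon mds.gravel0 config set mds_cache_size 100
```

These are cluster-administration commands, not a standalone script; they need a running Ceph cluster with admin-socket access on the MDS hosts.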

gravel1.stuck_stopping_cache.gz (5.64 KB) John Spray, 10/28/2016 10:32 AM

gravel1.stuck_in_stopping.log.gz (107 KB) John Spray, 10/28/2016 10:32 AM

History

#1 Updated by John Spray almost 3 years ago

  • Priority changed from Normal to High
  • Target version set to v12.0.0

#2 Updated by John Spray almost 3 years ago

  • Assignee set to John Spray

#3 Updated by John Spray about 2 years ago

  • Status changed from New to Can't reproduce

This code has all changed a lot since then.

#4 Updated by Patrick Donnelly 6 months ago

  • Category deleted (90)
  • Labels (FS) multimds added
