Bug #10744

closed

MDS gets stuck in 'stopping' when strays exist

Added by John Spray about 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The migrate_stray part happens, but we end up stuck like this:

2015-02-04 11:12:46.697926 7f40e4de4700  7 mds.1.cache there's still stuff in the cache: 2
2015-02-04 11:12:46.697928 7f40e4de4700  7 mds.1.cache show_cache
2015-02-04 11:12:46.697931 7f40e4de4700  7 mds.1.cache   dirfrag [dir 60b ~mds1/stray1/ [2,head] auth{0=1} v=11 cv=11/11 state=1073741826|complete f(v0 m2015-02-04 11:06:01.896388) n(v0 rc2015-02-04 11:06:01.896388) hs=0+1,ss=0+0 | child=1 sticky=0 replicated=1 dirty=0 authpin=0 0x5e00770]  
2015-02-04 11:12:46.697946 7f40e4de4700  7 mds.1.cache    dentry [dentry #101/stray1/10000000001 [2,head] auth{0=1} NULL (dversion lock) v=10 inode=0 | request=0 lock=0 inodepin=0 replicated=1 dirty=0 waiter=0 authpin=0 0x5e09860]
2015-02-04 11:12:46.697967 7f40e4de4700  7 mds.1.cache  unlinked [inode 100 [...2,head] ~mds0/ rep@0.1 v2 snaprealm=0x5c80d80 f(v0 11=1+10) n(v1 rc2015-02-04 11:06:01.910230 13=1+12)/n(v0 12=1+11) (inest mix) (iversion lock) 0x5de0c28]
2015-02-04 11:12:46.697986 7f40e4de4700  7 mds.1.cache  unlinked [inode 101 [...2,head] ~mds1/ auth{0=1} v3 snaprealm=0x5c81d40 f(v0 11=1+10) n(v1 rc2015-02-04 11:06:01.896388 b119407 13=2+11)/n(v0 12=1+11) (inest mix) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 dirtyparent=0 replicated=1 dirty=0 authpin=0 0x5dd8000]
2015-02-04 11:12:46.698002 7f40e4de4700  7 mds.1.cache   dirfrag [dir 101 ~mds1/ [2,head] auth{0=1} v=7 cv=7/7 dir_auth=-2 state=1073741824 f(v0 11=1+10) n(v1 rc2015-02-04 11:06:01.896388 11=1+10)/n(v1 rc2015-02-04 11:06:01.896388 b119407 12=2+10) hs=1+0,ss=0+0 | child=1 subtree=0 replicated=1 dirty=0 authpin=0 0x5e00000]
2015-02-04 11:12:46.698016 7f40e4de4700  7 mds.1.cache    dentry [dentry #101/stray1 [2,head] auth{0=1} (dversion lock) v=6 inode=0x5dd92b0 | inodepin=1 replicated=1 dirty=0 0x5e081e0]
2015-02-04 11:12:46.698024 7f40e4de4700  7 mds.1.cache     inode [inode 60b [...2,head] ~mds1/stray1/ auth{0=1} v6 f(v0 m2015-02-04 11:06:01.896388) n(v0 rc2015-02-04 11:06:01.896388 1=0+1) (inest mix) (iversion lock) | lock=0 dirfrag=1 stickydirs=0 stray=0 dirtyrstat=0 dirtyparent=0 replicated=1 dirty=0 waiter=0 authpin=0 0x5dd92b0]  
2015-02-04 11:12:46.698040 7f40e4de4700  7 mds.1.1 shutdown_pass=false
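
When triaging a dump like the one above, it can help to list which leftover cache objects still report `replicated=1`. The following is a minimal sketch, not an official Ceph tool: the regular expressions are inferred only from the log lines quoted in this report, and the function names `parse_cache_line` and `still_replicated` are hypothetical.

```python
import re

# One "show_cache" entry: an optional label word, then "[<kind> <description>]".
# The layout is assumed from the log lines quoted in this report.
LINE_RE = re.compile(r"mds\.\d+\.cache\s+(?:\w+\s+)?\[(\w+)\s+(.*)\]\s*$")
# name=integer pairs such as "replicated=1" (the value must end at whitespace).
FLAG_RE = re.compile(r"(\w+)=(\d+)(?!\S)")


def parse_cache_line(line):
    """Return (kind, flags) for a show_cache entry, or None for other lines."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    kind, description = match.groups()
    # The flags of interest follow the " | " separator, when it is present.
    _, _, tail = description.partition(" | ")
    flags = {name: int(value) for name, value in FLAG_RE.findall(tail)}
    return kind, flags


def still_replicated(lines):
    """Yield the kind of every cache object whose dump shows replicated=1."""
    for line in lines:
        parsed = parse_cache_line(line)
        if parsed is not None and parsed[1].get("replicated") == 1:
            yield parsed[0]
```

Feeding the MDS log lines to `still_replicated` yields one entry per object that another rank still appears to replicate. Lines without a flag section after ` | `, such as the `inode 100` entry, produce an empty flag set and are skipped.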

Actions #1

Updated by Zheng Yan about 9 years ago

Looks like you are running multiple MDSs. When an MDS gets stuck in the 'stopping' state, restart the whole MDS cluster, then try stopping the same MDS again. The stop process should eventually succeed.

Actions #2

Updated by John Spray about 9 years ago

Yes, I know -- multiple MDSs is what I'm testing here :-)

Actions #3

Updated by Greg Farnum about 9 years ago

Is it possible your hack to avoid purging anything is busting more than you realize?

Actions #4

Updated by John Spray about 9 years ago

  • Status changed from New to In Progress
  • Assignee set to John Spray

have a working patch for this, PR in due course...

Actions #5

Updated by John Spray about 9 years ago

  • Status changed from In Progress to Fix Under Review

Actions #6

Updated by John Spray about 9 years ago

  • Status changed from Fix Under Review to Resolved

commit c0e6227519e8a867ea1296b4399415b5e8ab9509
Author: John Spray <john.spray@redhat.com>
Date:   Tue Mar 10 14:40:30 2015 +0000

    mds: give up replicas of a stopping mds's stuff

    In order for an MDS to make it through stopping when
    it had some strays, the other ranks that serviced
    the migrate_stray renames must ensure that they
    give up any cache objects that belonged to
    the stopping MDS, so that the stopping MDS
    can finish emptying its cache.

    Fixes: #10744
    Signed-off-by: John Spray <john.spray@redhat.com>
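
The idea in the commit message can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the actual MDS code: it assumes the stopping rank tracks, per cache object, which peer ranks hold a replica, and that an object can be trimmed only once that set is empty. `ToyStoppingRank` and `give_up_replicas` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStoppingRank:
    """Toy stand-in for a stopping MDS rank (not a Ceph class)."""
    rank: int
    # Object name -> set of peer ranks assumed to hold a replica of it.
    auth_objects: dict = field(default_factory=dict)

    def try_empty_cache(self):
        """Trim every unreplicated object; return True once the cache is empty."""
        self.auth_objects = {
            name: holders for name, holders in self.auth_objects.items() if holders
        }
        return not self.auth_objects


def give_up_replicas(stopping, peer_rank):
    """Model a peer rank dropping all replicas of the stopping rank's objects."""
    for holders in stopping.auth_objects.values():
        holders.discard(peer_rank)
```

In this model, a stopping rank whose stray directory is still replicated by rank 0 never sees `try_empty_cache()` return `True`. After `give_up_replicas(stopping, 0)` the next pass can trim everything, which mirrors the behaviour the commit describes.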