Tasks #973

closed

Ceph - Bug #910: Multi-MDS Ceph does not pass fsstress

Dir failing to freeze

Added by Greg Farnum about 13 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%

Description

Dir 10000000074 has an auth_pin that won't go away. I think the problem is here:

2011-03-31 22:03:08.134371 7ffebcbd6710 mds1.cache.ino(10000000075) auth_pin by 0xf74680 on [inode 10000000075 [...2,head] /p1/d1/da/dd/de/ auth{0=1} v667 ap=1 na=1 f(v0 m2011-03-31 22:01:31.435045 8=5+3) n(v16 rc2011-03-31 22:02:47.468250 b6143864 a1 44=34+10) (inest mix) (ifile excl) (iversion lock) caps={4104=pAsLsXsFsx/-@1},l=4104 | dirfrag caps replicated dirty authpin 0xe140d0] now 1+0
2011-03-31 22:03:08.134384 7ffebcbd6710 mds1.cache.dir(10000000074) adjust_nested_auth_pins 1/1 on [dir 10000000074 /p1/d1/da/dd/ [2,head] auth{0=1} v=674 cv=0/0 dir_auth=1 ap=0+1+1 na=6 state=1610612738|complete f(v0 m2011-03-31 22:02:46.635923 18=13+5) n(v36 rc2011-03-31 22:02:47.468250 b18344518 a6 80=64+16) hs=18+10,ss=0+0 dirty=24 | child subtree replicated dirty 0xd0b2f8] count now 0 + 1
2011-03-31 22:03:08.134402 7ffebcbd6710 mds1.locker  auth_pinning [inode 1000000013a [...2,head] /p1/d1/da/dd/de/d48/ auth{0=1} v330 na=1 f(v0 m2011-03-31 22:02:47.468250 8=6+2) n(v4 rc2011-03-31 22:02:47.468250 b256795 a1 18=15+3) (inest mix) (ifile excl) (ixattr excl) (iversion lock) caps={4104=pAsLsXsxFsx/-@2},l=4104 | dirfrag caps replicated dirty 0xe17430]
2011-03-31 22:03:08.134417 7ffebcbd6710 mds1.cache.ino(1000000013a) auth_pin by 0xf74680 on [inode 1000000013a [...2,head] /p1/d1/da/dd/de/d48/ auth{0=1} v330 ap=1 na=1 f(v0 m2011-03-31 22:02:47.468250 8=6+2) n(v4 rc2011-03-31 22:02:47.468250 b256795 a1 18=15+3) (inest mix) (ifile excl) (ixattr excl) (iversion lock) caps={4104=pAsLsXsxFsx/-@2},l=4104 | dirfrag caps replicated dirty authpin 0xe17430] now 1+0
2011-03-31 22:03:08.134431 7ffebcbd6710 mds1.cache.dir(10000000075) adjust_nested_auth_pins 1/1 on [dir 10000000075 /p1/d1/da/dd/de/ [2,head] auth{0=1} v=342 cv=0/0 ap=0+1+1 na=1 state=1610612738|complete f(v0 m2011-03-31 22:01:31.435045 8=5+3) n(v16 rc2011-03-31 22:02:47.468250 b6143864 a1 43=34+9) hs=8+14,ss=0+0 dirty=18 | child replicated dirty 0xd0bf28] count now 0 + 1
2011-03-31 22:03:08.134443 7ffebcbd6710 mds1.cache.dir(10000000074) adjust_nested_auth_pins 1/0 on [dir 10000000074 /p1/d1/da/dd/ [2,head] auth{0=1} v=674 cv=0/0 dir_auth=1 ap=0+1+2 na=6 state=1610612738|complete f(v0 m2011-03-31 22:02:46.635923 18=13+5) n(v36 rc2011-03-31 22:02:47.468250 b18344518 a6 80=64+16) hs=18+10,ss=0+0 dirty=24 | child subtree replicated dirty 0xd0b2f8] count now 0 + 2

As far as I can tell, when unlocking it only removes one of those:
2011-03-31 22:03:08.306566 7ffebcbd6710 mds1.cache.dir(10000000074) auth_unpin by 0xf74680 on [dir 10000000074 /p1/d1/da/dd/ [2,head] auth{0=1} v=676 cv=0/0 dir_auth=1 ap=0+1+2 na=6 state=1610612738|complete f(v0 m2011-03-31 22:02:46.635923 18=13+5) n(v36 rc2011-03-31 22:03:08.134997 b18344887 a6 81=65+16)/n(v36 rc2011-03-31 22:02:47.468250 b18344518 a6 80=64+16) hs=18+10,ss=0+0 dirty=24 | child subtree replicated dirty 0xd0b2f8] count now 0 + 2
2011-03-31 22:03:08.306597 7ffebcbd6710 mds1.cache.dir(10000000075) auth_unpin by 0xf74680 on [dir 10000000075 /p1/d1/da/dd/de/ [2,head] auth{0=1} pv=346 v=344 cv=0/0 ap=1+3+5 na=1 state=1610612738|complete f(v0 m2011-03-31 22:01:31.435045 8=5+3)->f(v0 m2011-03-31 22:01:31.435045 8=5+3) n(v16 rc2011-03-31 22:03:08.134997 b6144233 a1 44=35+9)->n(v16 rc2011-03-31 22:03:08.162375 b4238666 a1 36=29+7)/n(v16 rc2011-03-31 22:03:08.134997 b6144233 a1 44=35+9) hs=8+14,ss=0+0 dirty=18 | child replicated dirty authpin 0xd0bf28] count now 1 + 5
2011-03-31 22:03:08.306619 7ffebcbd6710 mds1.cache.dir(1000000013a) auth_unpin by 0xf74680 on [dir 1000000013a /p1/d1/da/dd/de/d48/ [2,head] auth{0=1} v=71 cv=0/0 ap=0+1+2 na=1 state=1610612738|complete f(v0 m2011-03-31 22:02:47.468250 8=6+2) n(v4 rc2011-03-31 22:03:08.134997 b257164 a1 18=16+2) hs=8+7,ss=0+0 dirty=10 | child replicated dirty 0xd0cb58] count now 0 + 2
2011-03-31 22:03:08.306646 7ffebcbd6710 mds1.cache.ino(10000000075) auth_unpin by 0xf74680 on [inode 10000000075 [...2,head] /p1/d1/da/dd/de/ auth{0=1} v675 na=1 f(v0 m2011-03-31 22:01:31.435045 8=5+3) n(v16 rc2011-03-31 22:03:08.134997 b6144233 a1 45=35+10) (inest mix w=1 dirty) (ifile excl) (iversion lock) caps={4104=pAsLsXsFsx/-@1},l=4104 | dirtyscattered lock dirfrag caps replicated dirty 0xe140d0] now 0+1
2011-03-31 22:03:08.306671 7ffebcbd6710 mds1.cache.dir(10000000074) adjust_nested_auth_pins -1/-1 on [dir 10000000074 /p1/d1/da/dd/ [2,head] auth{0=1} v=676 cv=0/0 dir_auth=1 ap=0+0+1 na=6 state=1610612738|complete f(v0 m2011-03-31 22:02:46.635923 18=13+5) n(v36 rc2011-03-31 22:03:08.134997 b18344887 a6 81=65+16)/n(v36 rc2011-03-31 22:02:47.468250 b18344518 a6 80=64+16) hs=18+10,ss=0+0 dirty=24 | child subtree replicated dirty 0xd0b2f8] count now 0 + 1

(That auth_unpin by 0xf74680 does match an auth_pin, not shown here.)
There are a bunch more adjust_nested_auth_pins on dir 10000000075, but none of them propagate up to 10000000074. Hmmm...
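
For reference, here is a deliberately simplified sketch of the bookkeeping those log lines show: an auth_pin taken below a dir bumps a nested count on every ancestor, and a dir can only freeze once both its own and its nested counts reach zero, so a single missing auth_unpin keeps every ancestor unfreezable. The Dir type and its methods are hypothetical stand-ins, not the actual CDir/CInode code.

  // Minimal model of auth_pin bookkeeping (hypothetical, not the real MDCache code).
  #include <iostream>

  struct Dir {
    Dir* parent = nullptr;        // parent directory, nullptr at the subtree root
    int auth_pins = 0;            // pins taken directly on this dir
    int nested_auth_pins = 0;     // pins held by anything underneath this dir

    // Propagate a nested-count change up the ancestry, roughly analogous to the
    // adjust_nested_auth_pins lines in the log above.
    void adjust_nested(int delta) {
      nested_auth_pins += delta;
      if (parent) parent->adjust_nested(delta);
    }
    void auth_pin()   { auth_pins++; if (parent) parent->adjust_nested(+1); }
    void auth_unpin() { auth_pins--; if (parent) parent->adjust_nested(-1); }

    bool can_freeze() const { return auth_pins == 0 && nested_auth_pins == 0; }
  };

  int main() {
    Dir dd, de, d48;              // stand-ins for dirs 10000000074, 10000000075, 1000000013a
    de.parent = &dd;
    d48.parent = &de;

    de.auth_pin();                // pins taken during the xlock phase
    d48.auth_pin();
    de.auth_unpin();              // ...but only one matching unpin arrives

    std::cout << "dd can freeze? " << dd.can_freeze() << "\n";  // prints 0: one nested pin is stuck
  }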

Actions #1

Updated by Greg Farnum about 13 years ago

Oh, I suppose I should update. It's removing the direct inode pin from 10000000075 but not the one from the dir. That's because dir 10000000075 still has auth_pins attached to it: another directory (1000000007e) has a stuck auth_pin. It looks like 1000000007e is the actual source of the problem, due to an issue with dentries getting auth_pinned by locks; one (or more?) of them isn't auth_unpinning.

Actions #2

Updated by Greg Farnum about 13 years ago

I am reasonably certain the problem is this, from Locker::simple_xlock:

  if (lock->is_stable())
    lock->get_parent()->auth_pin(lock);

The parent in question here is a dentry, for an inode that gets exported as part of a slave_rename. It looks like the dentry's auth_pin never gets cleaned up properly and so the parent dir holds onto a lost auth_pin.
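
To illustrate the suspected leak, here is a contrived sketch under the assumption that the migration path tears down the lock state without running the matching unpin. The Dentry type and export_away() are hypothetical, not the actual rename/export code.

  // Hypothetical illustration of the leak described above.
  #include <cassert>

  struct Dentry {
    int auth_pins = 0;
    bool lock_stable = true;

    void xlock() {
      if (lock_stable)          // loosely mirrors simple_xlock: pin only if the lock was stable
        auth_pins++;
      lock_stable = false;
    }
    void xlock_finish() {
      auth_pins--;              // the matching unpin on the normal unlock path
      lock_stable = true;
    }
    void export_away() {
      lock_stable = true;       // lock state dropped on migration, but no matching auth_unpin
    }
  };

  int main() {
    Dentry ok;
    ok.xlock();
    ok.xlock_finish();               // normal path: pin and unpin balance
    assert(ok.auth_pins == 0);

    Dentry leaked;
    leaked.xlock();
    leaked.export_away();            // inode migrated mid-xlock, as in the slave_rename
    assert(leaked.auth_pins == 1);   // the "lost" auth_pin the parent dir still counts
  }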

Actions #3

Updated by Greg Farnum about 13 years ago

Hmm, the locking surrounding this is definitely suspect. I'll try to hack something together, but the simple_xlock code auth_pins based on whether the lock is stable. The cleanup code (xlock_finish and eval_gather; in my checkout, lines 595 & 1152) uses a variety of criteria to decide whether to auth_unpin, but to my naive eye it has no way of knowing whether those criteria correspond to a lock state that was previously stable. Can you comment on this, Sage?

Actions #4

Updated by Sage Weil about 13 years ago

This is because XLOCKDONE is unstable, and LOCK, EXCL, etc. are stable. So, on xlock, we only add a new auth_pin if we're not already unstable. On dropping the xlock, we unpin if we move from XLOCKDONE (unstable) to EXCL or LOCK. If we stay in XLOCKDONE, we don't unpin, because we're still unstable. It looks to me like the invariant "auth_pin IFF unstable" is preserved (at least in the auth case; I didn't look closely at the slave/replica case).
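
Restated as a minimal sketch (the state names and the transition helper are hypothetical, not the real SimpleLock machinery): the pin is added only on the stable-to-unstable edge and removed only on the unstable-to-stable edge, so it stays balanced no matter how many times the lock passes through XLOCKDONE.

  // Sketch of the "auth_pin iff unstable" invariant (hypothetical, not the real SimpleLock code).
  #include <cassert>

  enum class State { LOCK, EXCL, XLOCK, XLOCKDONE };   // LOCK and EXCL stable, the rest unstable

  struct LockSketch {
    State state = State::LOCK;
    int auth_pins = 0;

    static bool stable(State s) { return s == State::LOCK || s == State::EXCL; }

    void transition(State next) {
      if (stable(state) && !stable(next)) auth_pins++;   // entering an unstable run: take the pin
      if (!stable(state) && stable(next)) auth_pins--;   // returning to stable: drop it
      state = next;
    }
  };

  int main() {
    LockSketch l;
    l.transition(State::XLOCK);       // xlock: pin taken (we were stable)
    l.transition(State::XLOCKDONE);   // still unstable: no extra pin
    l.transition(State::XLOCK);       // re-xlock from XLOCKDONE: no extra pin
    l.transition(State::XLOCKDONE);
    l.transition(State::LOCK);        // back to stable: the single pin is dropped
    assert(l.auth_pins == 0);         // invariant holds on the normal (auth) path
  }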

That doesn't explain the discrepancy, though... do you have a larger log segment surrounding the unlock?

Actions #5

Updated by Greg Farnum about 13 years ago

As I said, the inode in question was migrated, so the xlock cleanup isn't going through the normal unlock paths; that's why it doesn't match up. Your explanation helps, thanks!

Actions #6

Updated by Greg Farnum about 13 years ago

  • Status changed from In Progress to 7

Think we got this, but not sure... testing will help tell!

Actions #7

Updated by Greg Farnum about 13 years ago

  • Status changed from 7 to Resolved

Appears to be working.

Actions #8

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
  • Target version deleted (v0.28)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
