Tasks #973

closed

Ceph - Bug #910: Multi-MDS Ceph does not pass fsstress

Dir failing to freeze

Added by Greg Farnum about 13 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%


Description

Dir 10000000074 has an auth_pin that won't go away. I think the problem is here:

2011-03-31 22:03:08.134371 7ffebcbd6710 mds1.cache.ino(10000000075) auth_pin by 0xf74680 on [inode 10000000075 [...2,head] /p1/d1/da/dd/de/ auth{0=1} v667 ap=1 na=1 f(v0 m2011-03-31 22:01:31.435045 8=5+3) n(v16 rc2011-03-31 22:02:47.468250 b6143864 a1 44=34+10) (inest mix) (ifile excl) (iversion lock) caps={4104=pAsLsXsFsx/-@1},l=4104 | dirfrag caps replicated dirty authpin 0xe140d0] now 1+0
2011-03-31 22:03:08.134384 7ffebcbd6710 mds1.cache.dir(10000000074) adjust_nested_auth_pins 1/1 on [dir 10000000074 /p1/d1/da/dd/ [2,head] auth{0=1} v=674 cv=0/0 dir_auth=1 ap=0+1+1 na=6 state=1610612738|complete f(v0 m2011-03-31 22:02:46.635923 18=13+5) n(v36 rc2011-03-31 22:02:47.468250 b18344518 a6 80=64+16) hs=18+10,ss=0+0 dirty=24 | child subtree replicated dirty 0xd0b2f8] count now 0 + 1
2011-03-31 22:03:08.134402 7ffebcbd6710 mds1.locker  auth_pinning [inode 1000000013a [...2,head] /p1/d1/da/dd/de/d48/ auth{0=1} v330 na=1 f(v0 m2011-03-31 22:02:47.468250 8=6+2) n(v4 rc2011-03-31 22:02:47.468250 b256795 a1 18=15+3) (inest mix) (ifile excl) (ixattr excl) (iversion lock) caps={4104=pAsLsXsxFsx/-@2},l=4104 | dirfrag caps replicated dirty 0xe17430]
2011-03-31 22:03:08.134417 7ffebcbd6710 mds1.cache.ino(1000000013a) auth_pin by 0xf74680 on [inode 1000000013a [...2,head] /p1/d1/da/dd/de/d48/ auth{0=1} v330 ap=1 na=1 f(v0 m2011-03-31 22:02:47.468250 8=6+2) n(v4 rc2011-03-31 22:02:47.468250 b256795 a1 18=15+3) (inest mix) (ifile excl) (ixattr excl) (iversion lock) caps={4104=pAsLsXsxFsx/-@2},l=4104 | dirfrag caps replicated dirty authpin 0xe17430] now 1+0
2011-03-31 22:03:08.134431 7ffebcbd6710 mds1.cache.dir(10000000075) adjust_nested_auth_pins 1/1 on [dir 10000000075 /p1/d1/da/dd/de/ [2,head] auth{0=1} v=342 cv=0/0 ap=0+1+1 na=1 state=1610612738|complete f(v0 m2011-03-31 22:01:31.435045 8=5+3) n(v16 rc2011-03-31 22:02:47.468250 b6143864 a1 43=34+9) hs=8+14,ss=0+0 dirty=18 | child replicated dirty 0xd0bf28] count now 0 + 1
2011-03-31 22:03:08.134443 7ffebcbd6710 mds1.cache.dir(10000000074) adjust_nested_auth_pins 1/0 on [dir 10000000074 /p1/d1/da/dd/ [2,head] auth{0=1} v=674 cv=0/0 dir_auth=1 ap=0+1+2 na=6 state=1610612738|complete f(v0 m2011-03-31 22:02:46.635923 18=13+5) n(v36 rc2011-03-31 22:02:47.468250 b18344518 a6 80=64+16) hs=18+10,ss=0+0 dirty=24 | child subtree replicated dirty 0xd0b2f8] count now 0 + 2

As far as I can tell, unlocking only removes one of those pins:
2011-03-31 22:03:08.306566 7ffebcbd6710 mds1.cache.dir(10000000074) auth_unpin by 0xf74680 on [dir 10000000074 /p1/d1/da/dd/ [2,head] auth{0=1} v=676 cv=0/0 dir_auth=1 ap=0+1+2 na=6 state=1610612738|complete f(v0 m2011-03-31 22:02:46.635923 18=13+5) n(v36 rc2011-03-31 22:03:08.134997 b18344887 a6 81=65+16)/n(v36 rc2011-03-31 22:02:47.468250 b18344518 a6 80=64+16) hs=18+10,ss=0+0 dirty=24 | child subtree replicated dirty 0xd0b2f8] count now 0 + 2
2011-03-31 22:03:08.306597 7ffebcbd6710 mds1.cache.dir(10000000075) auth_unpin by 0xf74680 on [dir 10000000075 /p1/d1/da/dd/de/ [2,head] auth{0=1} pv=346 v=344 cv=0/0 ap=1+3+5 na=1 state=1610612738|complete f(v0 m2011-03-31 22:01:31.435045 8=5+3)->f(v0 m2011-03-31 22:01:31.435045 8=5+3) n(v16 rc2011-03-31 22:03:08.134997 b6144233 a1 44=35+9)->n(v16 rc2011-03-31 22:03:08.162375 b4238666 a1 36=29+7)/n(v16 rc2011-03-31 22:03:08.134997 b6144233 a1 44=35+9) hs=8+14,ss=0+0 dirty=18 | child replicated dirty authpin 0xd0bf28] count now 1 + 5
2011-03-31 22:03:08.306619 7ffebcbd6710 mds1.cache.dir(1000000013a) auth_unpin by 0xf74680 on [dir 1000000013a /p1/d1/da/dd/de/d48/ [2,head] auth{0=1} v=71 cv=0/0 ap=0+1+2 na=1 state=1610612738|complete f(v0 m2011-03-31 22:02:47.468250 8=6+2) n(v4 rc2011-03-31 22:03:08.134997 b257164 a1 18=16+2) hs=8+7,ss=0+0 dirty=10 | child replicated dirty 0xd0cb58] count now 0 + 2
2011-03-31 22:03:08.306646 7ffebcbd6710 mds1.cache.ino(10000000075) auth_unpin by 0xf74680 on [inode 10000000075 [...2,head] /p1/d1/da/dd/de/ auth{0=1} v675 na=1 f(v0 m2011-03-31 22:01:31.435045 8=5+3) n(v16 rc2011-03-31 22:03:08.134997 b6144233 a1 45=35+10) (inest mix w=1 dirty) (ifile excl) (iversion lock) caps={4104=pAsLsXsFsx/-@1},l=4104 | dirtyscattered lock dirfrag caps replicated dirty 0xe140d0] now 0+1
2011-03-31 22:03:08.306671 7ffebcbd6710 mds1.cache.dir(10000000074) adjust_nested_auth_pins -1/-1 on [dir 10000000074 /p1/d1/da/dd/ [2,head] auth{0=1} v=676 cv=0/0 dir_auth=1 ap=0+0+1 na=6 state=1610612738|complete f(v0 m2011-03-31 22:02:46.635923 18=13+5) n(v36 rc2011-03-31 22:03:08.134997 b18344887 a6 81=65+16)/n(v36 rc2011-03-31 22:02:47.468250 b18344518 a6 80=64+16) hs=18+10,ss=0+0 dirty=24 | child subtree replicated dirty 0xd0b2f8] count now 0 + 1

(That auth_unpin by 0xf74680 does match an auth_pin, not shown here.)
There are a bunch more adjust_nested_auth_pins on dir 10000000075, but none of them propagate up to 10000000074. Hmmm...
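For intuition, here is a toy Python model of the accounting the log above is showing. This is not Ceph's actual C++ code (the real logic lives in CDir/CInode and adjust_nested_auth_pins); the class, the propagate flag, and the can_freeze check are illustrative assumptions. It just demonstrates the invariant: every auth_pin must bump the nested count of every ancestor, and every auth_unpin must undo all of those bumps, or an ancestor dir like 10000000074 is left with a stale nested count and can never freeze.

```python
# Toy model of nested auth-pin accounting (NOT Ceph code).
# auth_pins counts pins on the node itself; nested_auth_pins counts
# pins anywhere beneath it. Freezing requires both to reach zero.

class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.auth_pins = 0         # pins on this node
        self.nested_auth_pins = 0  # pins somewhere below this node

    def auth_pin(self):
        self.auth_pins += 1
        self._adjust_nested(+1)

    def auth_unpin(self, propagate=True):
        self.auth_pins -= 1
        if propagate:
            self._adjust_nested(-1)

    def _adjust_nested(self, delta):
        # Walk up the hierarchy, adjusting every ancestor's nested count
        # (the role played by adjust_nested_auth_pins in the log).
        node = self.parent
        while node is not None:
            node.nested_auth_pins += delta
            node = node.parent

    def can_freeze(self):
        return self.auth_pins == 0 and self.nested_auth_pins == 0

# Replay the shape of the log: dd (10000000074) contains de (10000000075),
# which contains d48 (1000000013a).
dd = Node("10000000074")
de = Node("10000000075", parent=dd)
d48 = Node("1000000013a", parent=de)

de.auth_pin()
d48.auth_pin()                    # dd now has 2 nested pins
de.auth_unpin()                   # propagates correctly: dd drops to 1
d48.auth_unpin(propagate=False)   # the suspected bug: decrement never
                                  # reaches dd, leaving a stale count
```

After this sequence every auth_pin has been matched by an auth_unpin, yet dd still shows one nested pin, so it can never freeze. That matches the symptom: the adjust_nested_auth_pins calls on dir 10000000075 never propagate back up to 10000000074.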
