Bug #1538

closed

mds: all clients can end up becoming unresponsive, mds locker waiting for unfreeze

Added by Brandon Seibel over 12 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%


Description

The cluster MDS configuration is as follows: mds e72: 2/2/2 up {0=mds02=up:active,1=mds01=up:active}, 2 up:standby-replay

I tried to force the condition to occur again by running rsync --inplace (no renames) into client02 while walking the same directory tree on client01 and running md5sum on every file.
After an hour or so, both the md5sum and rsync processes started blocking.
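The workload described above could be scripted roughly as follows. This is only a sketch: the `-a` flag and the use of `find ... -exec md5sum` are assumptions (the report only states that rsync was run with --inplace and that md5sums were computed over all files), and the source and mount paths are whatever the caller passes in.

```python
import subprocess


def rsync_cmd(src, dst):
    """argv for the writer side: rsync updating files in place (no renames).

    The -a flag is an assumption; only --inplace is named in the report.
    """
    return ["rsync", "-a", "--inplace", src, dst]


def md5_cmd(tree):
    """argv for the reader side: md5sum over every regular file under tree."""
    return ["find", tree, "-type", "f", "-exec", "md5sum", "{}", "+"]


def run(cmd):
    """Run one side of the workload and return its exit status."""
    return subprocess.run(cmd, check=False).returncode
```

One client would call `run(rsync_cmd(...))` against its mount while the other loops over `run(md5_cmd(...))` on the same tree, both with paths chosen by the tester.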

While this was happening, the only activity logged on mds1 was:

2011-09-12 17:20:10.507592 7f5514360700 mds1.locker scatter_tick
2011-09-12 17:20:10.507641 7f5514360700 mds1.locker scatter_nudge waiting for unfreeze on [inode 20000007334 [...2,head] /xxx.com/www/htdocs/files/comp-photos/stills_white_0640/ auth{0=1} v11144 ap=0+1 f(v0 m2011-09-12 15:48:40.702360 117=0+117) n(v219 rc2011-09-12 16:32:58.682720 b1040975835 6965=6847+118) (inest mix dirty) (iversion lock) caps={4122=pAsLsXsFs/-@1,4163=pAsLsXsFs/-@2} | ptrwaiter dirtyscattered dirfrag caps replicated dirty 0xfe00d20]
2011-09-12 17:20:10.507663 7f5514360700 mds1.locker scatter_nudge waiting for unfreeze on [inode 10000000006 [...2,head] /xxx.com/www/htdocs/files/ auth{0=1} v67647 ap=0+1 f(v1 m2011-09-12 14:44:45.580215 4342=4331+11) n(v1095 rc2011-09-12 16:32:58.682720 b2010910079 49259=48532+727) (inest mix dirty) (iversion lock) caps={4122=pAsLsXsFs/-@1,4163=pAsLsXsFs/-@1} | ptrwaiter dirtyscattered dirfrag caps replicated dirty 0x6c2e140]
2011-09-12 17:20:10.507700 7f5514360700 mds1.locker scatter_nudge waiting for unfreeze on [inode 10000000002 [...2,head] /xxx.com/www/htdocs/ auth{0=1} v16660 ap=0+1 f(v1 m2011-09-12 11:01:39.000016 25=12+13) n(v1158 rc2011-09-12 16:32:58.682720 b2087649934 54786=53661+1125) (inest mix dirty) (iversion lock) caps={4122=pAsLsXsFs/-@1,4163=pAsLsXsFs/-@4} | ptrwaiter dirtyscattered dirfrag caps replicated dirty 0xfd9b460]
2011-09-12 17:20:10.507721 7f5514360700 mds1.locker scatter_nudge waiting for unfreeze on [inode 20000007330 [...2,head] /xxx.com/www/htdocs/files/comp-photos/ auth{0=1} v92743 ap=0+1 f(v0 m2011-09-12 14:44:48.726817 6=3+3) n(v528 rc2011-09-12 16:32:58.682720 b1379294006 20373=20030+343) (inest mix dirty) (iversion lock) caps={4122=pAsLsXsFs/-@1,4163=pAsLsXsFs/-@1} | ptrwaiter dirtyscattered dirfrag caps replicated dirty 0xfdfeba0]
2011-09-12 17:20:10.507739 7f5514360700 mds1.locker scatter_nudge waiting for unfreeze on [inode 10000000001 [...2,head] /xxx.com/www/ auth{0=1} v15172 ap=0+1 f(v1 m2011-09-12 11:01:38.811432 2=0+2) n(v1119 rc2011-09-12 16:36:39.869162 b2088488587 54828=53701+1127) (inest mix dirty) (iversion lock) caps={4122=pAsLsXsFs/-@1,4163=pAsLsXsFs/-@1} | ptrwaiter dirtyscattered dirfrag caps replicated dirty 0x13c6920]

The migrator also could not do anything, due to:
2011-09-12 17:20:27.858671 7f5515463700 mds1.migrator can't export, freezing|frozen. wait for other exports to finish first.

So it looks like part of the tree got frozen somewhere and was then never unfrozen.
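To see which directories stay stuck over a longer log, the quoted lines can be tallied with a short script. This is a minimal sketch: both regular expressions are derived only from the sample lines in this report, so other Ceph versions may format these messages differently, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the "scatter_nudge waiting for unfreeze" locker lines quoted above.
# Derived only from those samples; not a documented log format.
NUDGE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<thread>[0-9a-f]+) "
    r"(?P<daemon>mds\d+)\.locker scatter_nudge waiting for unfreeze on "
    r"\[inode (?P<ino>[0-9a-f]+) \[[^\]]*\] (?P<path>\S+)"
)

# Matches the migrator line quoted above.
EXPORT_RE = re.compile(r"\.migrator can't export, freezing\|frozen")


def summarize(lines):
    """Return (Counter of path -> nudge count, number of blocked exports)."""
    stuck = Counter()
    blocked_exports = 0
    for line in lines:
        m = NUDGE_RE.match(line)
        if m:
            stuck[m.group("path")] += 1
        elif EXPORT_RE.search(line):
            blocked_exports += 1
    return stuck, blocked_exports
```

Since the attached logs are bz2-compressed, a local copy can be passed in directly, for example `summarize(bz2.open("mds01.shortened.log.bz2", "rt"))` after `import bz2`. Paths whose count keeps growing across many scatter_tick intervals are the ones that never get unfrozen.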

The logs can be retrieved here:
http://evul.net/~xnevious/mds01.shortened.log.bz2
http://evul.net/~xnevious/mds02.shortend.log.bz2
http://evul.net/~xnevious/client01.shortened.log.bz2


Files

docapupdate-deadlock.patch (1.2 KB) Brandon Seibel, 09/29/2011 11:15 AM