Bug #1243 (closed)

inest lock blocks dir create for a long time

Added by Greg Farnum almost 13 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
% Done: 0%


Description

From the mailing list:
The steps are:
we mount ceph on /mnt/test/,
then create the directory "/mnt/test/a/b/".
1) in dir "b", use "seq 3000 | xargs -i mkdir {}" to create 3000 dirs
2) while step 1) is still running, make a directory "c" in "a" (see the sketch below)
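
For convenience, the same reproduction as one self-contained program. This is only a sketch: it assumes /mnt/test is a CephFS kernel-client mount; the paths and the 3000-directory count come from the steps above, while the one-second head start and the timing printout are added here to make the stall on the second mkdir visible.

// Reproduction sketch; compile with: g++ -std=c++17 -pthread repro.cc
#include <chrono>
#include <cstdio>
#include <filesystem>
#include <thread>

namespace fs = std::filesystem;

int main() {
    const fs::path b = "/mnt/test/a/b";   // assumes /mnt/test is the CephFS mount
    fs::create_directories(b);

    // Step 1: create 3000 directories under "b" (the seq|xargs loop).
    std::thread bulk([&b] {
        for (int i = 1; i <= 3000; ++i)
            fs::create_directory(b / std::to_string(i));
    });

    // Step 2: while step 1 is running, create "c" in "a" and time it.
    // With the bug, this mkdir stalls on the parent's inest lock.
    std::this_thread::sleep_for(std::chrono::seconds(1));
    auto t0 = std::chrono::steady_clock::now();
    fs::create_directory("/mnt/test/a/c");
    auto t1 = std::chrono::steady_clock::now();
    std::printf("mkdir a/c took %.1f s\n",
                std::chrono::duration<double>(t1 - t0).count());

    bulk.join();
    return 0;
}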
From the MDS debug log:

2011-06-29 05:44:19.368961 7f7a0b421700 mds0.locker wrlock_start
waiting on (inest lock->sync w=1 dirty flushing) on [inode 10000000000
[...2,head] /a/ auth v18 pv20 ap=312 f(v0 m2011-06-29 05:44:15.550665
2=0+2) n(v0 rc2011-06-29 05:44:15.550665 1934=0+1934) (iauth sync r=1)
(isnap sync r=1) (inest lock->sync w=1 dirty flushing) (ifile excl
w=1) (ixattr excl) (iversion lock) caps={4099=pAsLsXsxFsx/-@10},l=4099 | dirtyscattered lock dirfrag caps dirty authpin 0x14c97e0]

We find:
the dir "a" was locked while we were creating dirs below dir "b".
In the function predirty_journal_parents (in MDCache.cc), the flag "stop" was set to true, so we got the message "predirty_journal_parents stop. marking nestlock on".
Step 1) took a lock on dir "a"; its type is CEPH_LOCK_INEST, its state name is "sync", and the value of this lock is "inest lock->sync w=1 dirty flushing".
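
To make the reported failure mode concrete, here is a heavily simplified standalone sketch of the behavior described above. It is not Ceph's actual SimpleLock/ScatterLock code; the state names and fields are illustrative only. The point it shows: a wrlock cannot be granted while the scatter lock is mid-transition with dirty data still flushing, so the request is parked on a waiter list until the flush commits.

// Illustrative sketch only -- not the real MDS Locker/ScatterLock API.
#include <functional>
#include <iostream>
#include <vector>

struct ScatterLockSketch {
    enum class State { SYNC, LOCK_TO_SYNC };  // "lock->sync" in the log
    State state = State::SYNC;
    bool dirty = false;      // accumulated stat deltas not yet journaled
    bool flushing = false;   // a writebehind flush is in progress
    std::vector<std::function<void()>> waiters;

    // Analogue of the wrlock_start seen in the log: in the
    // "lock->sync ... dirty flushing" state no wrlock can be granted,
    // so the caller (the mkdir of "a/c") is parked until the flush ends.
    bool wrlock_start(std::function<void()> retry) {
        if (state == State::LOCK_TO_SYNC && flushing) {
            waiters.push_back(std::move(retry));
            return false;
        }
        return true;  // writable state: grant the wrlock (details elided)
    }

    // Analogue of the flush completing: finish the transition to SYNC
    // and wake everything that was parked on this lock.
    void flush_finish() {
        dirty = false;
        flushing = false;
        state = State::SYNC;
        for (auto& w : waiters) w();
        waiters.clear();
    }
};

int main() {
    ScatterLockSketch inest;
    inest.state = ScatterLockSketch::State::LOCK_TO_SYNC;
    inest.dirty = inest.flushing = true;   // the state quoted from the log

    auto retry = [] { std::cout << "mkdir a/c retried, proceeds\n"; };
    if (!inest.wrlock_start(retry))
        std::cout << "mkdir a/c blocked on inest lock\n";

    // If the 3000 mkdirs under "b" keep re-dirtying the parent before the
    // waiter runs, the flush can cycle and the wait stretches out.
    inest.flush_finish();
    return 0;
}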

I reproduced this locally on the kernel client but have been unable to get it on cfuse. The delay is on the MDS side, though; it takes some 53 seconds after the request comes in for the reply to go out, and the only recorded wait is on the inest lock while it is in the "lock->sync" state and flushing.

Caps problem? Order of wakeup problem? Flushing problem?

#1

Updated by Greg Farnum almost 13 years ago

  • Status changed from New to Resolved

commit:310032ee8128f6417ac302c0f5ecd27c691cbcc7

I haven't been able to figure out why this doesn't impact cfuse, though -- I can see the scatter_writebehind cycling, and the mkdir requests do block, but they get picked up within a few tenths of a second. This despite the fact that just the 3000 mkdirs take 2-3x longer on the kclient (also wtf).

#2

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
  • Target version deleted (v0.31)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
