Bug #15508: client: simultaneous readdirs are very racy - CephFS - Ceph



Added by Greg Farnum about 8 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: High
Assignee: Zheng Yan
Source: Development
Backport: jewel
Severity: 3 - minor
Component(FS): Client

Description

Imagine a ceph-fuse user running two readdirs, a and b, on a very large directory (one that requires multiple MDS round trips, and multiple local readdir syscalls for every MDS round trip).

a finishes first. Because the directory wasn't changed, it marks the directory COMPLETE|ORDERED.
b last received an MDS readdir reply for offsets x to y and is serving those results.

readdir c starts from offset 0.
b finishes up to y and sends an MDS request to readdir starting at y+1.
readdir c reaches offset y+1 from the cache.
b's response comes in. It pushes the range y+1 to z to the back of the directory's dentry xlist!
readdir c continues up to z before readdir b manages to read z+1 back from the MDS.
readdir c ends prematurely because xlist::iterator::end() returns true.
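The race can be reproduced in miniature with a std::list standing in for the dentry xlist. This is a hypothetical model, not Ceph code: each int stands for a dentry at its readdir offset, real entries start at offset 2, and y = 5, z = 8 are arbitrary. Like the intrusive xlist, std::list::splice keeps an iterator attached to its element when that element moves, so c's iterator follows y+1 to the back of the list.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Hypothetical model of the race: each int is a dentry at that readdir offset.
// std::list::splice keeps iterators valid, much like the intrusive xlist, so an
// iterator follows its element when the element is moved to the back.
std::vector<int> simulate_racy_readdir() {
  std::list<int> dentries;
  for (int off = 2; off <= 11; ++off) {
    dentries.push_back(off);  // readdir a left the directory COMPLETE|ORDERED
  }

  const int y = 5, z = 8;
  std::vector<int> seen_by_c;

  // readdir c walks the cache from the start and reaches offset y+1.
  auto it = dentries.begin();
  while (*it != y + 1) {
    seen_by_c.push_back(*it);
    ++it;
  }

  // readdir b's MDS reply for y+1..z arrives; each dentry is moved to the back.
  for (int off = y + 1; off <= z; ++off) {
    auto pos = std::find(dentries.begin(), dentries.end(), off);
    dentries.splice(dentries.end(), dentries, pos);
  }

  // readdir c resumes from its still-valid iterator and runs off the end early.
  for (; it != dentries.end(); ++it) {
    seen_by_c.push_back(*it);
  }
  return seen_by_c;  // {2, 3, 4, 5, 6, 7, 8}: entries 9..11 are never seen
}
```

Calling simulate_racy_readdir() returns offsets 2 through 8; the cached entries 9, 10 and 11 are never returned to c, even though the directory is still complete in cache.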

Related issues: 2 (0 open, 2 closed)


Updated by Greg Farnum about 8 years ago

Priority changed from Normal to High

Some obvious solutions are ruled out, both because we can't really track which directory listings are in progress (via dirps), and in particular because the client might just abandon a readdir sequence partway through or crash before finishing. So the solution needs to depend only on internal state tracking.

I'm working on it. So far the winning approach is:

keep track of the shared_gen when starting an MDS listing from offset 0 (well, 2, I guess);
when we get a response, if the shared_gen hasn't changed, set an "ordered_thru" to the latest offset;
when satisfying a readdir, reference that ordered_thru instead of the simple COMPLETE and ORDERED flags :/
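A minimal sketch of the ordered_thru bookkeeping, assuming a hypothetical DirOrderState struct (the field names follow the comment above; this is not the actual Client code). It assumes each listing starts at the first real entry and that replies arrive in offset order. Each listing captures shared_gen as a token when it starts, a reply only advances ordered_thru if that token still matches, and any directory change resets the frontier.

```cpp
#include <cstdint>

// Hypothetical per-directory state; field names follow the comment above,
// but this is an illustrative sketch, not the actual ceph-fuse Client code.
struct DirOrderState {
  uint64_t shared_gen = 0;    // bumped whenever the directory contents change
  int64_t ordered_thru = -1;  // highest offset known to be cached in order; -1 = none

  // Starting an MDS listing from the beginning: the caller keeps the generation.
  uint64_t start_listing() const { return shared_gen; }

  // An MDS reply for a listing started at generation listing_gen covers
  // offsets up to last_offset. Only advance if nothing changed meanwhile.
  void on_reply(uint64_t listing_gen, int64_t last_offset) {
    if (listing_gen == shared_gen && last_offset > ordered_thru) {
      ordered_thru = last_offset;
    }
  }

  // Any modification invalidates the ordering guarantee.
  void on_change() {
    ++shared_gen;
    ordered_thru = -1;
  }

  // A readdir consults ordered_thru instead of COMPLETE|ORDERED.
  bool can_serve_from_cache(int64_t offset) const { return offset <= ordered_thru; }
};
```

For example, after start_listing() and a reply covering offsets through 5, can_serve_from_cache(5) is true and can_serve_from_cache(6) is false; once on_change() runs, a late reply carrying the old token is ignored.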

There are plenty of missing parts to that, but I think the basic scheme should be sound. (It sounds just a little bit like PG backfilling...)


Updated by Greg Farnum about 8 years ago

Related to Bug #13271: Missing dentry in cache when doing readdirs under cache pressure (?????s in ls-l) added


Updated by Zheng Yan about 8 years ago

Another option is to assign each dentry a cache index and use an array to track the dentry list. If the shared_gen hasn't changed, a given dentry is always at the same position in the array. This is what the kernel client currently does.
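The array-based alternative can be sketched as follows. This is an illustrative model of the idea described above, not the kernel client's implementation: ReaddirCache is hypothetical, and std::string stands in for a dentry pointer. Because an entry's slot is fixed by its cache index while shared_gen is unchanged, a reply that refills a range writes to the same slots instead of moving entries to the back of a list.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical index-addressed readdir cache; std::string stands in for a
// dentry pointer. Slot i holds the dentry whose cache index is i.
struct ReaddirCache {
  uint64_t shared_gen = 0;
  std::vector<std::string> slots;

  // Record a dentry from an MDS reply at its assigned cache index.
  // Refilling an index overwrites the same slot, so nothing is reordered.
  void fill(std::size_t cache_index, const std::string& name) {
    if (slots.size() <= cache_index) slots.resize(cache_index + 1);
    slots[cache_index] = name;
  }

  // Serve cached entries from index pos until a hole or the end of the array.
  // A stale generation yields nothing, and the caller must ask the MDS.
  std::vector<std::string> read_from(std::size_t pos, uint64_t gen) const {
    std::vector<std::string> out;
    if (gen != shared_gen) return out;
    for (std::size_t i = pos; i < slots.size() && !slots[i].empty(); ++i) {
      out.push_back(slots[i]);
    }
    return out;
  }

  // A directory change bumps the generation and drops the array.
  void invalidate() {
    ++shared_gen;
    slots.clear();
  }
};
```

A reader that stops at an empty slot has reached a hole and would need to go back to the MDS for the rest; a concurrent refill of slots it has already passed cannot push later entries out of its way.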


Updated by Greg Farnum about 8 years ago

Hmm, I think the end result would be pretty much the same, although just having an array might be simpler. A pointer per dentry in an open frag isn't that expensive even if we are evicting stuff...*ponders*
