Bug #15508

client: simultaneous readdirs are very racy

Added by Greg Farnum almost 3 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
04/14/2016
Due date:
% Done:
0%
Source:
Development
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client
Labels (FS):
Pull request ID:

Description

Imagine a ceph-fuse user running two concurrent readdirs, a and b, on a very large directory (one that requires multiple MDS round trips, and multiple local readdir syscalls per MDS round trip).

a finishes first. Because the directory wasn't changed, it marks the directory COMPLETE|ORDERED.
b has last received an MDS readdir reply for offsets x to y and is serving those results.

readdir c starts from offset 0.
b finishes up to y and sends off an MDS request to readdir starting at y+1.
readdir c reaches offset y+1 from cache.
b's response comes in. It pushes the range y+1 to z to the back of the directory's dentry xlist!
readdir c continues up to z before readdir b manages to get z+1 read back from the MDS.
readdir c ends prematurely because xlist::iterator::end() returns true.
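
For illustration only, here is a minimal single-threaded model of that interleaving. It is not Ceph code: std::list stands in for the directory's dentry xlist, the strings stand in for dentries, and all names are made up. The point is that re-linking already-present entries at the back of the shared ordering list drags a concurrent reader's position to the back with them, so the reader hits the end of the list before it has seen every entry.

    // Illustrative model only, not Ceph client code.
    #include <iostream>
    #include <iterator>
    #include <list>
    #include <string>

    int main() {
      // Cached directory contents, in readdir order (hypothetical names).
      std::list<std::string> order = {"a1", "a2", "b1", "b2", "c1", "c2"};

      // readdir c starts at offset 0 and serves entries from the cache.
      auto rc = order.begin();
      std::cout << "c sees " << *rc++ << "\n";   // a1
      std::cout << "c sees " << *rc++ << "\n";   // a2; rc now refers to b1

      // readdir b's MDS reply for the range [b1, b2] arrives and those
      // entries are pushed to the back of the shared list (splice models
      // re-linking items that are already in the xlist).
      auto first = std::next(order.begin(), 2);        // b1
      auto last  = std::next(order.begin(), 4);        // one past b2
      order.splice(order.end(), order, first, last);   // list: a1 a2 c1 c2 b1 b2

      // rc still refers to b1, but b1 now sits at the back of the list, so c
      // visits only b1 and b2, hits end(), and finishes the directory without
      // ever seeing c1 or c2.
      for (; rc != order.end(); ++rc)
        std::cout << "c sees " << *rc << "\n";
      return 0;
    }

The real xlist iterator differs in detail, but the failure mode is the same: a cached listing position is only meaningful while the shared ordering stays put.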


Related issues

Related to fs - Bug #13271: Missing dentry in cache when doing readdirs under cache pressure (?????s in ls -l) Resolved 09/29/2015
Copied to fs - Backport #16251: jewel: client: simultaneous readdirs are very racy Resolved

History

#1 Updated by Greg Farnum almost 3 years ago

  • Priority changed from Normal to High

Some obvious solutions are disqualified, both because we can't really track which directory listings are in progress (via dirps), and in particular because the client might just drop a readdir set or crash before finishing. So the solution needs to depend only on internal state tracking.

I'm working on it. So far the winning approach is:
  • keep track of the shared_gen when starting an MDS listing from offset 0 (well, 2, I guess)
  • when we get a response, if the shared_gen hasn't changed, set an "ordered_thru" to the latest offset
  • when satisfying a readdir, reference that ordered_thru instead of the simple COMPLETE and ORDERED flags :/

There are plenty of missing parts to that, but I think the basic scheme should be sound. (It sounds just a little bit like PG backfilling...)
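
As a hedged, non-authoritative sketch of that bookkeeping (the names DirState, readdir_start, on_readdir_reply, and can_read_from_cache are hypothetical stand-ins, not actual client code):

    #include <cstdint>

    struct DirState {
      uint64_t shared_gen = 0;     // bumped whenever the dir's shared state changes
      uint64_t listing_gen = 0;    // shared_gen snapshot taken when a listing starts at offset 0 (well, 2)
      int64_t  ordered_thru = -1;  // highest readdir offset known to be cached in order
    };

    // Starting an MDS listing from the beginning: remember the generation we
    // started under and reset the high-water mark.
    void readdir_start(DirState &d) {
      d.listing_gen = d.shared_gen;
      d.ordered_thru = -1;
    }

    // An MDS readdir reply ending at 'last_off' came back; only advance the
    // mark if the directory hasn't changed underneath us.
    void on_readdir_reply(DirState &d, int64_t last_off) {
      if (d.listing_gen == d.shared_gen && last_off > d.ordered_thru)
        d.ordered_thru = last_off;
    }

    // When satisfying a readdir at 'offset' from cache, consult ordered_thru
    // instead of the directory-wide COMPLETE|ORDERED flags.
    bool can_read_from_cache(const DirState &d, int64_t offset) {
      return d.listing_gen == d.shared_gen && offset <= d.ordered_thru;
    }

In the scenario from the description, c would then stop trusting the cache past the last offset confirmed to be in order and go back to the MDS, instead of terminating early off the perturbed xlist.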

#2 Updated by Greg Farnum almost 3 years ago

  • Related to Bug #13271: Missing dentry in cache when doing readdirs under cache pressure (?????s in ls -l) added

#3 Updated by Zheng Yan almost 3 years ago

Another option is to assign each dentry a cache index and use an array to track the dentry list. If the shared_gen hasn't changed, a given dentry is always at the same position in the array. This is how the kernel client currently does it.
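
A hedged sketch of that array-indexed idea (Dentry, DirCache, and the method names are hypothetical stand-ins, not the actual kernel or ceph-fuse structures):

    #include <cstdint>
    #include <string>
    #include <vector>

    struct Dentry {
      std::string name;
      uint64_t cache_index = 0;   // slot assigned when the dentry was cached
    };

    struct DirCache {
      uint64_t shared_gen = 0;    // generation the slots were filled under
      std::vector<Dentry*> slots; // readdir position -> dentry

      // Record a dentry at the position the MDS reported it at.  As long as
      // shared_gen is unchanged, the same dentry always lands in the same
      // slot, so concurrent listings cannot perturb each other's ordering.
      void add(Dentry *dn, uint64_t pos, uint64_t gen) {
        if (gen != shared_gen) {  // directory changed: start over
          shared_gen = gen;
          slots.clear();
        }
        if (slots.size() <= pos)
          slots.resize(pos + 1, nullptr);
        slots[pos] = dn;
        dn->cache_index = pos;
      }

      // A cached readdir at 'pos' is valid only if the slot was filled under
      // the current generation and is still populated.
      Dentry *lookup(uint64_t pos, uint64_t gen) const {
        if (gen != shared_gen || pos >= slots.size())
          return nullptr;
        return slots[pos];
      }
    };

Because a dentry's slot is a function of its readdir position rather than of a shared linked list, concurrent listings can't reorder each other's view; a generation change simply invalidates all slots.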

#4 Updated by Greg Farnum almost 3 years ago

Hmm, I think the end result would be pretty much the same, although just having an array might be simpler. A pointer per dentry in an open frag isn't that expensive even if we are evicting stuff...*ponders*

#5 Updated by Zheng Yan almost 3 years ago

  • Assignee changed from Greg Farnum to Zheng Yan

I found that seekdir can also trigger this issue. I'm working on fixing it.

#6 Updated by Zheng Yan almost 3 years ago

  • Status changed from New to Need Review

#7 Updated by Greg Farnum almost 3 years ago

  • Status changed from Need Review to Pending Backport

#8 Updated by Nathan Cutler almost 3 years ago

  • Backport set to jewel

#9 Updated by Nathan Cutler almost 3 years ago

  • Copied to Backport #16251: jewel: client: simultaneous readdirs are very racy added

#10 Updated by Greg Farnum almost 3 years ago

  • Status changed from Pending Backport to Resolved

#11 Updated by Greg Farnum over 2 years ago

  • Component(FS) Client added
