Bug #15508

closed

client: simultaneous readdirs are very racy

Added by Greg Farnum about 8 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Imagine we have a ceph-fuse user doing two readdirs, a and b, on a very large directory (which requires multiple MDS round trips, and multiple local readdir syscalls for every MDS round trip).

  • a finishes first. Because the directory wasn't changed, it marks the directory COMPLETE|ORDERED.
  • b has most recently received an MDS readdir reply for offsets x to y and is serving those results.

  • A new readdir c starts from offset 0.
  • b finishes serving up to y, and sends off an MDS request to readdir starting at y+1.
  • readdir c reaches offset y+1 from the cache.
  • b's response comes in. It pushes the range y+1 to z to the back of the directory's dentry xlist!
  • readdir c continues up to z before readdir b manages to get z+1 read back from the MDS.
  • readdir c ends prematurely because xlist::iterator::end() returns true (sketched below).
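
To make the failure mode concrete, here is a minimal, self-contained sketch. It uses std::list as a stand-in for the dentry xlist; the names and the splice step are illustrative, not actual Client.cc code.

```cpp
#include <iostream>
#include <iterator>
#include <list>
#include <string>

// Simplified model of the race: 'entries' stands in for the directory's
// dentry xlist, and the final loop stands in for readdir c walking it.
int main() {
  std::list<std::string> entries = {"d0", "d1", "d2", "d3", "d4", "d5"};

  // readdir c has already served d0 and d1 from cache and sits at d2.
  auto c = entries.begin();
  std::advance(c, 2);

  // readdir b's MDS reply for the range [d2, d3] arrives; touching those
  // dentries moves them to the back of the list.
  auto first = entries.begin();
  std::advance(first, 2);
  auto last = first;
  std::advance(last, 2);
  entries.splice(entries.end(), entries, first, last);
  // The list is now d0, d1, d4, d5, d2, d3.

  // readdir c keeps walking from where it was and hits end() right after
  // d3, never seeing d4 or d5, which now sit *before* the moved range.
  for (; c != entries.end(); ++c)
    std::cout << *c << '\n';   // prints only d2 and d3
  return 0;
}
```

Because readdir a already marked the directory COMPLETE|ORDERED, the truncated walk looks like a finished listing to c, so the skipped entries are silently dropped from c's results.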


Related issues 2 (0 open, 2 closed)

Related to CephFS - Bug #13271: Missing dentry in cache when doing readdirs under cache pressure (?????s in ls-l) - Resolved - 09/29/2015

Copied to CephFS - Backport #16251: jewel: client: simultaneous readdirs are very racy - Resolved - Greg Farnum
#1

Updated by Greg Farnum about 8 years ago

  • Priority changed from Normal to High

Some obvious solutions are disqualified, both because we can't really track which directory listings are in progress (via dirps), and because the client might just drop a readdir set or crash before finishing. So the solution needs to depend only on internal state tracking.

I'm working on it. So far the winning approach is:
  • keep track of the shared_gen when starting an MDS listing from offset 0 (well, 2, I guess)
  • when we get a response, if the shared_gen hasn't changed, set an "ordered_thru" to the latest offset
  • when satisfying a readdir, reference that ordered_thru instead of the simple COMPLETE and ORDERED flags :/

There are plenty of missing parts to that, but I think the basic scheme should be sound. (It sounds just a little bit like PG backfilling...)
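
A minimal sketch of what that bookkeeping could look like, using invented names (DirListingState, listing_gen, ordered_thru) rather than the actual Client.cc fields:

```cpp
#include <cstdint>

// Hypothetical per-directory bookkeeping for the scheme above; the field
// and function names are illustrative only.
struct DirListingState {
  uint64_t shared_gen = 0;    // bumped whenever the directory's shared state changes
  uint64_t listing_gen = 0;   // shared_gen captured when the MDS listing started
  int64_t  ordered_thru = 1;  // highest offset known cached in order (0/1 are . and ..)
  bool     listing_from_start = false;
};

// Starting an MDS listing from the beginning of the directory (offset 2,
// since 0 and 1 are . and ..): remember the current shared_gen.
void start_listing(DirListingState& d, int64_t offset) {
  if (offset <= 2) {
    d.listing_from_start = true;
    d.listing_gen = d.shared_gen;
    d.ordered_thru = 1;
  }
}

// An MDS readdir reply covering offsets [first, last] arrived: only extend
// ordered_thru if nothing invalidated the directory since the listing began
// and this chunk continues the ordered prefix.
void handle_reply(DirListingState& d, int64_t first, int64_t last) {
  if (d.listing_from_start &&
      d.listing_gen == d.shared_gen &&
      first <= d.ordered_thru + 1) {
    d.ordered_thru = last;
  }
}

// A readdir at offset 'off' may be satisfied from cache only within the
// ordered prefix, instead of trusting the COMPLETE|ORDERED flags alone.
bool can_serve_from_cache(const DirListingState& d, int64_t off) {
  return d.listing_gen == d.shared_gen && off <= d.ordered_thru;
}
```

The point of the check in can_serve_from_cache() is that a concurrent reader never trusts cached ordering beyond the prefix that was observed under an unchanged shared_gen.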

#2

Updated by Greg Farnum about 8 years ago

  • Related to Bug #13271: Missing dentry in cache when doing readdirs under cache pressure (?????s in ls-l) added
#3

Updated by Zheng Yan about 8 years ago

Another option is to assign each dentry a cache index and use an array to track the dentry list. If the shared_gen hasn't changed, a given dentry is always at the same position in the array. This is how the kernel client currently does it.
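
A rough illustration of that alternative, with invented names (the kernel client's actual data structures differ):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Illustrative sketch of the cache-index alternative: each dentry gets a
// stable slot in a per-directory array, tagged with the shared_gen under
// which it was inserted.
struct CachedDentry {
  std::string name;
  uint64_t gen = 0;   // shared_gen when this slot was last filled
};

struct DirDentryCache {
  uint64_t shared_gen = 0;
  std::vector<CachedDentry> slots;   // slot i holds the dentry at readdir position i

  // Fill (or refresh) the slot for a dentry at its fixed position.
  void fill(size_t idx, const std::string& name) {
    if (slots.size() <= idx)
      slots.resize(idx + 1);
    slots[idx].name = name;
    slots[idx].gen = shared_gen;
  }

  // A reader can trust a cached slot only if it was filled under the
  // current generation; otherwise it must fall back to the MDS.
  const CachedDentry* lookup(size_t idx) const {
    if (idx < slots.size() && slots[idx].gen == shared_gen)
      return &slots[idx];
    return nullptr;
  }
};
```

With a stable slot per dentry, concurrent readdirs no longer depend on the mutable ordering of an xlist, at the cost of one array entry per dentry in the open directory.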

#4

Updated by Greg Farnum about 8 years ago

Hmm, I think the end result would be pretty much the same, although just having an array might be simpler. A pointer per dentry in an open frag isn't that expensive even if we are evicting stuff...*ponders*

#5

Updated by Zheng Yan almost 8 years ago

  • Assignee changed from Greg Farnum to Zheng Yan

I found that seekdir can also trigger this issue. I'm working on fixing it.

#6

Updated by Zheng Yan almost 8 years ago

  • Status changed from New to Fix Under Review
#7

Updated by Greg Farnum almost 8 years ago

  • Status changed from Fix Under Review to Pending Backport
#8

Updated by Nathan Cutler almost 8 years ago

  • Backport set to jewel
#9

Updated by Nathan Cutler almost 8 years ago

  • Copied to Backport #16251: jewel: client: simultaneous readdirs are very racy added
#10

Updated by Greg Farnum almost 8 years ago

  • Status changed from Pending Backport to Resolved
#11

Updated by Greg Farnum almost 8 years ago

  • Component(FS) Client added