Bug #15508
client: simultaneous readdirs are very racy
Status: Closed
Description
Imagine we have a ceph-fuse user doing readdirs a and b on a very large directory (one that requires multiple MDS round trips, and multiple local readdir syscalls for every MDS round trip).
1. a finishes first. Because the directory wasn't changed, it marks the directory COMPLETE|ORDERED.
2. b has last received an MDS readdir reply for offsets x to y and is serving those results.
3. readdir c starts from offset 0.
4. b finishes serving up to y, and sends off an MDS request to readdir starting at y+1.
5. readdir c reaches offset y+1 from cache.
6. b's response comes in. It pushes the range y+1 to z to the back of the directory's dentry xlist!
7. readdir c continues up to z before readdir b manages to get z+1 read back from the MDS.
8. readdir c ends prematurely because xlist::iterator::end() returns true.
Updated by Greg Farnum about 8 years ago
- Priority changed from Normal to High
Some obvious solutions are disqualified, both because we can't really track which directory listings are in progress (via dirps), and in particular because the client might just drop a readdir set or crash before finishing. So the solution needs to depend only on internal state tracking.
I'm working on it. So far the winning approach is:
- keep track of the shared_gen when starting an MDS listing from offset 0 (well, 2, I guess)
- when we get a response, if the shared_gen hasn't changed, set an "ordered_thru" to the latest offset
- when satisfying a readdir, reference that ordered_thru instead of the simple COMPLETE and ORDERED flags :/
There are plenty of missing parts to that, but I think the basic scheme should be sound. (It sounds just a little bit like PG backfilling...)
Updated by Greg Farnum about 8 years ago
- Related to Bug #13271: Missing dentry in cache when doing readdirs under cache pressure (?????s in ls-l) added
Updated by Zheng Yan about 8 years ago
Another option is to assign each dentry a cache index and use an array to track the dentry list. If the shared_gen hasn't changed, a given dentry is always at the same position in the array. This is how the kernel client currently does it.
Updated by Greg Farnum about 8 years ago
Hmm, I think the end result would be pretty much the same, although just having an array might be simpler. A pointer per dentry in an open frag isn't that expensive even if we are evicting stuff...*ponders*
Updated by Zheng Yan almost 8 years ago
- Assignee changed from Greg Farnum to Zheng Yan
I found that seekdir can also trigger this issue. I'm working on fixing it.
Updated by Zheng Yan almost 8 years ago
- Status changed from New to Fix Under Review
last commit of https://github.com/ceph/ceph/pull/8739
Updated by Greg Farnum almost 8 years ago
- Status changed from Fix Under Review to Pending Backport
Backport PR: https://github.com/ceph/ceph/pull/9655
Updated by Nathan Cutler almost 8 years ago
- Copied to Backport #16251: jewel: client: simultaneous readdirs are very racy added
Updated by Greg Farnum almost 8 years ago
- Status changed from Pending Backport to Resolved