Feature #11950 (closed)

Strays enqueued for purge cause MDCache to exceed size limit

Added by John Spray almost 9 years ago. Updated about 7 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Performance/Resource Usage
Target version:
% Done: 0%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS): MDS
Labels (FS):
Pull request ID:

Description

If your purge operations are going slowly (either because of the throttle or because of a slow data pool) and you do lots of deletes, then the stray directories can grow very large.

As each stray dentry is created, it gets eval_stray'd and potentially put onto the queue of things that are ready to be purged. But there is no guarantee that the queue progresses quickly, and everything in the queue is pinned in the cache. Even if it weren't, the queued order has no locality with respect to dirfrags, so we would thrash dirfrags in and out of cache while draining the queue.

The solution is to have a new data structure outside of MDCache, where we put dentries that are beyond the "point of no return" in the stray->purge process. This would essentially just be a work queue where each work item is an inode, and the work is to purge it.
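
A minimal C++ sketch of what that structure could look like (the names here are invented for illustration, not the actual MDS code): a work queue that holds only the small amount of per-inode state needed to issue the deletions, with a throttle on how many purges run at once, so the corresponding dentries no longer need to stay pinned in MDCache.

    // Hypothetical sketch, not the real Ceph implementation.
    #include <cstddef>
    #include <cstdint>
    #include <deque>
    #include <functional>

    using inodeno_t = uint64_t;   // assumption: plain inode number

    struct PurgeItem {
      inodeno_t ino;        // inode whose backing objects should be deleted
      uint64_t  size;       // file size, to work out how many objects to delete
      uint32_t  num_frags;  // dirfrags to remove, for directories
    };

    class PurgeWorkQueue {
    public:
      explicit PurgeWorkQueue(size_t max_in_flight)
        : max_in_flight_(max_in_flight) {}

      // Called once a stray dentry is past the "point of no return":
      // record the work, then let the cache drop the dentry.
      void enqueue(PurgeItem item) {
        pending_.push_back(item);
        maybe_start_more();
      }

      // Called when the data-pool deletions for one inode have completed.
      void finish_one() {
        --in_flight_;
        maybe_start_more();
      }

      // Hook that actually issues the object deletions (supplied by the caller).
      std::function<void(const PurgeItem&)> do_purge;

    private:
      void maybe_start_more() {
        while (in_flight_ < max_in_flight_ && !pending_.empty()) {
          PurgeItem item = pending_.front();
          pending_.pop_front();
          ++in_flight_;
          do_purge(item);   // asynchronous; its completion calls finish_one()
        }
      }

      std::deque<PurgeItem> pending_;  // items waiting for a purge slot
      size_t in_flight_ = 0;           // purges currently running
      size_t max_in_flight_;           // throttle on concurrent purges
    };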

Test code for playing with this case:
https://github.com/ceph/ceph-qa-suite/tree/wip-cache-full-strays

Actions #1

Updated by John Spray almost 9 years ago

  • Project changed from Ceph to CephFS
  • Category set to 47
Actions #2

Updated by John Spray almost 9 years ago

Related to FSCK: if adding a persistent structure containing references to inodes to be purged, ensure backward scrub tools are able to interrogate this structure so as to avoid incorrectly thinking the to-be-purged inodes are orphans and incorrectly linking them into lost+found.

Actions #3

Updated by Greg Farnum almost 9 years ago

  • Tracker changed from Bug to Feature
Actions #4

Updated by Greg Farnum about 8 years ago

  • Priority changed from Normal to High

This came up in #15379. I think we're going to start seeing it more often with the Manila use case...

Actions #5

Updated by John Spray about 8 years ago

Yep, this should be a fairly high priority to do something about.

The "real" solution (a scalable way of persisting the queue to purge) is not trivial, so maybe we need a stop-gap. It's an ugly hack, but we could do some crude back pressure by blocking unlinks (RetryRequest) when the purge queue is above a threshold and the cache is close to full, and putting a queue of contexts to kick into StrayManager. Actually that does sound quite sensible when I say it out loud, thoughts?

Actions #6

Updated by Greg Farnum about 8 years ago

Possibly. I'm concerned about exposing slow deletes to users via Manila, but it may be the best we can do in the short term.

Once upon a time I was hoping to use the journaling stuff Jason wrote for RBD, but that got pretty large and I'm not sure the library is suitable for us any more. I don't remember why we end up pinning the way we do; maybe we can do some kind of hack with the stray directories and the journal where we only keep a segment's (or some limit's) worth of stray inodes. That would mean slow trimming would just show up as an MDS whose log is longer than it should be (...and slow down restarts, I guess), but I'm not sure off-hand whether it's feasible in the code logic.

Actions #7

Updated by Zheng Yan about 8 years ago

  • Status changed from New to Fix Under Review
Actions #8

Updated by John Spray almost 8 years ago

  • Status changed from Fix Under Review to Verified

I'm reverting the state to Verified because, although we've merged the patch for this, it still needs more attention before we have a full solution.

Actions #9

Updated by Greg Farnum almost 8 years ago

  • Category changed from 47 to Performance/Resource Usage
  • Component(FS) MDS added
Actions #10

Updated by John Spray over 7 years ago

  • Assignee set to John Spray
  • Target version set to v12.0.0

Targeting for Luminous and assigning to me: we will use a single Journaler() instance per MDS to track a persistent purge queue.
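
A rough sketch of that shape (the interface below is invented to stand in for the real Journaler API; treat the names as assumptions): purge items are appended to a dedicated, persistent log before the unlink is acknowledged, and the log is trimmed as purges complete, so the items survive an MDS restart without pinning inodes in cache.

    // Hypothetical sketch of a journal-backed purge queue.
    #include <cstdint>
    #include <string>
    #include <vector>

    // Stand-in for a persistent, append-only log backed by RADOS objects.
    struct JournalWriter {
      virtual uint64_t append(const std::string& encoded_entry) = 0;  // returns entry offset
      virtual void     trim_to(uint64_t offset) = 0;                  // discard entries up to offset
      virtual ~JournalWriter() = default;
    };

    class PersistentPurgeQueue {
    public:
      explicit PersistentPurgeQueue(JournalWriter& journal) : journal_(journal) {}

      // Persist the purge item; after this the in-memory dentry can be dropped.
      void push(uint64_t ino, const std::string& encoded_item) {
        uint64_t offset = journal_.append(encoded_item);
        in_flight_.push_back(Entry{ino, offset, false});
      }

      // When the data-pool deletions for an inode finish, mark it done and
      // advance the journal's trim point past fully completed entries.
      void on_purged(uint64_t ino) {
        for (auto& e : in_flight_)
          if (e.ino == ino)
            e.done = true;
        while (!in_flight_.empty() && in_flight_.front().done) {
          journal_.trim_to(in_flight_.front().offset);
          in_flight_.erase(in_flight_.begin());
        }
      }

    private:
      struct Entry { uint64_t ino; uint64_t offset; bool done; };
      JournalWriter& journal_;
      std::vector<Entry> in_flight_;  // written to the journal but not yet purged
    };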

Actions #11

Updated by John Spray over 7 years ago

  • Status changed from Verified to In Progress
Actions #12

Updated by John Spray over 7 years ago

  • Status changed from In Progress to Fix Under Review
Actions #13

Updated by John Spray about 7 years ago

  • Status changed from Fix Under Review to Resolved

PurgeQueue has been merged to master and will be in Luminous.
