Feature #12671

open

Enforce cache limit during dirfrag load during open_ino (during rejoin)

Added by John Spray over 8 years ago. Updated almost 8 years ago.

Status: New
Priority: High
Assignee: -
Category: Performance/Resource Usage
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS): MDS
Labels (FS):
Pull request ID:

Description

When clients replay requests referring to inodes not found in cache, the inode numbers are stashed for loading later (in MDCache::cap_imports).

Later, in MDCache::process_imported_caps (i.e. during rejoin), MDCache calls open_ino for these.

open_ino (and subsequently open_ino_traverse_dir) loads the backtrace and traverses the parents, and each dirfrag it traverses is loaded in full if it is not already complete.

As a result, if you have many large dirfrags and some imported caps during rejoin, the MDS can go far beyond the usual cache size limit, because trim() is never called during rejoin.

We need to either do some trimming at some point during this phase, or we need to make the open_ino procedure not force directories to be completely opened (by improving the CDir::fetch path to allow selective loading of dentries).
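To make the two options concrete, here is a minimal toy model in Python. It is not Ceph code: ToyCache, open_inos and the selective/trim_each flags are hypothetical names invented for illustration, and "pinning" the wanted dentry is an assumption about what must survive a trim.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCache:
    """Toy stand-in for an inode cache; not the real MDCache."""
    limit: int                                   # soft cap on cached dentries
    entries: set = field(default_factory=set)
    pinned: set = field(default_factory=set)     # entries that trim() must keep
    peak: int = 0                                # largest size ever reached

    def insert(self, key, pin=False):
        self.entries.add(key)
        if pin:
            self.pinned.add(key)
        self.peak = max(self.peak, len(self.entries))

    def trim(self):
        """Evict unpinned entries until the cache is back under the limit."""
        for key in list(self.entries):
            if len(self.entries) <= self.limit:
                break
            if key not in self.pinned:
                self.entries.discard(key)


def open_inos(cache, dirfrags, wanted, selective=False, trim_each=False):
    """Resolve each wanted (dirfrag, name) pair and return the peak cache size.

    dirfrags maps a dirfrag id to the list of dentry names it holds.
    selective=True loads only the wanted dentry; otherwise the whole
    dirfrag is loaded, as described above for incomplete dirfrags.
    trim_each=True trims once after every dirfrag load.
    """
    for frag, name in wanted:
        names = [name] if selective else dirfrags[frag]
        for n in names:
            cache.insert((frag, n), pin=(n == name))
        if trim_each:
            cache.trim()
    return cache.peak
```

In this model, trimming after each dirfrag load bounds the overshoot to roughly one dirfrag above the limit, while selective loading keeps the cache at one entry per wanted inode. Whether either maps cleanly onto the real CDir::fetch path is exactly the open question here.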

Actions #1

Updated by John Spray over 8 years ago

  • Category set to 47
  • Priority changed from Normal to High

The source of this observation was https://www.mail-archive.com/ceph-users@lists.ceph.com/msg22235.html

In this instance the user has 64k files in each directory, and directory fragmentation is not enabled (as is our current default).

However, we could readily see this scenario even if fragmentation were enabled. For example, if there are 100 clients working in 100 dirs, each just below the default fragmentation threshold (10k dentries), we would try to ram a million inodes into memory during rejoin.
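The arithmetic behind that figure, as a small Python sketch. The function names are made up for illustration, and the cache limit of 100,000 inodes is an assumed value chosen only to show the scale of the overshoot; it is not taken from this ticket.

```python
def rejoin_load_estimate(dirs, dentries_per_dir):
    """Inodes pulled into cache if every touched directory is loaded in full."""
    return dirs * dentries_per_dir


def overshoot_factor(loaded, cache_limit):
    """How many times larger the load is than the configured cache limit."""
    return loaded / cache_limit


# 100 directories, each holding about 10k dentries (just under the threshold).
loaded = rejoin_load_estimate(dirs=100, dentries_per_dir=10_000)

# ASSUMPTION: a cache limit of 100,000 inodes, for illustration only.
factor = overshoot_factor(loaded, cache_limit=100_000)
```

With these inputs, loaded is 1,000,000 and factor is 10.0, i.e. ten times the assumed limit.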

Actions #2

Updated by Greg Farnum almost 8 years ago

The naive solution to this seems pretty bad as well. If we only load the needed dentries, in a serial fashion, we'll probably do a lot more disk accesses than necessary to load them, and disk access is already the limiting factor in replay speed. So we will want to be careful about batching disk IOs together.
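One possible shape for that batching, sketched in Python under stated assumptions: plan_reads and count_reads are hypothetical helpers, and the model assumes that a single read against a dirfrag can return several named dentries at once. It only counts reads and does not touch any real storage API.

```python
from collections import defaultdict


def plan_reads(wanted):
    """Group wanted (dirfrag, dentry) pairs into one read per dirfrag.

    Returns a dict mapping each dirfrag id to the sorted, de-duplicated
    list of dentry names to fetch from it.
    """
    by_frag = defaultdict(set)
    for frag, name in wanted:
        by_frag[frag].add(name)
    return {frag: sorted(names) for frag, names in by_frag.items()}


def count_reads(wanted, batched):
    """Number of reads issued: one per dirfrag if batched, else one per request."""
    return len(plan_reads(wanted)) if batched else len(wanted)
```

For example, four requests that touch two dirfrags cost four reads when issued serially and two when grouped, at the price of collecting the full list of wanted inodes before issuing any IO.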

Actions #3

Updated by Greg Farnum almost 8 years ago

  • Category changed from 47 to Performance/Resource Usage
  • Component(FS) MDS added

Actions #4

Updated by Greg Farnum almost 8 years ago

If we do #13688, we probably won't need this one or can put it off.
