Feature #12671: Enforce cache limit during dirfrag load during open_ino (during rejoin) - CephFS - Ceph

Added by John Spray over 8 years ago. Updated almost 8 years ago.

Status: New
Priority: High
Assignee:
Category: Performance/Resource Usage
Target version:
% Done:
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS): MDS
Labels (FS):
Pull request ID:
Description

When clients replay requests referring to inodes not found in cache, the inode numbers are stashed for loading later (in MDCache::cap_imports).

Later, in MDCache::process_imported_caps (i.e. during rejoin), MDCache calls open_ino for these.

open_ino (and subsequently open_ino_traverse_dir) loads the backtrace and traverses the parents, but each dirfrag it traverses is fetched in full if it is not already complete.

The result is that if you have many large dirfrags and some imported caps during rejoin, the MDS can go far beyond the usual cache size limit, because trim() is never called during rejoin.

We need to either do some trimming at some point during this phase, or we need to make the open_ino procedure not force directories to be completely opened (by improving the CDir::fetch path to allow selective loading of dentries).
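To make the first option concrete, here is a toy model of the trimming approach. It is only a sketch: `ToyCache`, `load_dirfrag`, and `rejoin` are hypothetical names invented for illustration, not MDCache code, and the model counts dentries only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCache:
    """Toy stand-in for the MDS cache (hypothetical, not MDCache)."""
    limit: int
    pinned: set = field(default_factory=set)    # dentries needed for imported caps
    unpinned: set = field(default_factory=set)  # dentries loaded as a side effect
    peak: int = 0

    def size(self):
        return len(self.pinned) + len(self.unpinned)

    def load_dirfrag(self, frag_id, dentries, wanted):
        """Load a complete dirfrag; only the wanted dentries are pinned."""
        for name in dentries:
            key = (frag_id, name)
            if name in wanted:
                self.pinned.add(key)
                self.unpinned.discard(key)
            elif key not in self.pinned:
                self.unpinned.add(key)
        self.peak = max(self.peak, self.size())

    def trim(self):
        """Drop unpinned entries until the cache is at or below the limit."""
        while self.size() > self.limit and self.unpinned:
            self.unpinned.pop()


def rejoin(cache, frags, wanted_by_frag, trim_between_loads):
    """Load every dirfrag in turn and return the peak cache size seen."""
    for frag_id, dentries in frags.items():
        cache.load_dirfrag(frag_id, dentries, wanted_by_frag.get(frag_id, set()))
        if trim_between_loads:
            cache.trim()
    return cache.peak


# Ten dirfrags of 1,000 dentries each, one wanted dentry per dirfrag.
frags = {i: [f"f{j}" for j in range(1000)] for i in range(10)}
wanted = {i: {"f0"} for i in range(10)}
print(rejoin(ToyCache(limit=1500), frags, wanted, trim_between_loads=False))
print(rejoin(ToyCache(limit=1500), frags, wanted, trim_between_loads=True))
```

In this model the untrimmed run peaks at 10,000 entries, while trimming after each load caps the peak at 2,500, i.e. the limit plus one full dirfrag. That remaining overshoot is what selective dentry loading would remove.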

Updated by John Spray over 8 years ago

Category set to 47
Priority changed from Normal to High

The source of this observation was https://www.mail-archive.com/ceph-users@lists.ceph.com/msg22235.html

In this instance the user has 64k files in each directory, and directory fragmentation is not enabled (as is our current default).

However, we could readily see this scenario even if fragmentation were enabled. For example, if there are 100 clients working in 100 dirs, each just below the default fragmentation threshold (10k dentries), we would try to ram a million inodes into memory during rejoin.
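The arithmetic behind that estimate, as a small sketch. The function name is made up, and the cache limit of 100,000 inodes is an assumed example value for comparison, not a figure from the report.

```python
def rejoin_overshoot(active_dirs, dentries_per_dir, cache_limit):
    """Return (inodes loaded, ratio to the cache limit) when every
    active directory is fetched in full during rejoin."""
    loaded = active_dirs * dentries_per_dir
    return loaded, loaded / cache_limit


# 100 dirs, each just under the 10k-dentry fragmentation threshold.
# cache_limit=100_000 is an assumed example limit.
loaded, ratio = rejoin_overshoot(100, 10_000, 100_000)
print(loaded, ratio)
```

Under that assumed limit, the million inodes loaded would be ten times the configured cache size.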

Updated by Greg Farnum almost 8 years ago

The naive solution to this seems pretty bad as well. If we load only the needed dentries, one at a time, we'll probably do far more disk accesses than necessary. Disk access is also the limiting factor in replay speed, so we will want to be careful about batching disk IOs together.
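One way to picture the batching concern: group the wanted dentries by dirfrag before issuing any reads, so each dirfrag is touched once and serves all pending lookups. This is only a sketch. `plan_fetches` and `count_reads` are hypothetical helpers, and a selective dentry fetch does not exist in CDir::fetch today.

```python
from collections import defaultdict


def plan_fetches(wanted):
    """Group (dirfrag_id, dentry_name) pairs into one request per dirfrag."""
    by_frag = defaultdict(set)
    for frag_id, name in wanted:
        by_frag[frag_id].add(name)
    return {frag: sorted(names) for frag, names in by_frag.items()}


def count_reads(wanted, batched):
    """Serial lookup costs one read per wanted entry; batched costs one per dirfrag."""
    return len(plan_fetches(wanted)) if batched else len(wanted)


# Four lookups (one repeated) spread over two dirfrags.
wanted = [(1, "a"), (1, "b"), (2, "c"), (1, "a")]
print(plan_fetches(wanted))
print(count_reads(wanted, batched=False), count_reads(wanted, batched=True))
```

In this example the serial approach issues four reads and the batched plan issues two. The model ignores read size, so it says nothing about whether a full-dirfrag read is cheaper than several small ones.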