Bug #18730

closed

mds: backtrace issues getxattr for every file with cap on rejoin

Added by John Spray about 7 years ago. Updated about 6 years ago.

Status:
Closed
Priority:
High
Assignee:
Category:
Performance/Resource Usage
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
luminous
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In Server::handle_client_reconnect, inode numbers that had client caps but were not in cache are passed into MDCache::rejoin_recovered_caps, which puts them into MDCache::cap_imports. Later, during rejoin, MDCache::process_imported_caps iterates over cap_imports and every item generates a call to MDCache::open_ino (i.e. a getxattr to the data pool to read the backtrace).
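
For illustration, here is a minimal stand-in sketch of that flow (not the actual MDS code: inodeno_t is reduced to a plain integer, the cache to a set, and open_ino() to a per-inode backtrace read; only the names cap_imports, rejoin_recovered_caps, process_imported_caps and open_ino correspond to the real MDCache members):

    // Toy model of the rejoin flow described above; the real types and
    // logic live in Server.cc / MDCache.cc and are far more involved.
    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <set>

    using inodeno_t = uint64_t;

    // Filled by rejoin_recovered_caps(): inodes that clients reported caps
    // for during reconnect but that are not in the MDS cache. The value
    // stands in for the reconnected cap state.
    std::map<inodeno_t, int> cap_imports;

    std::set<inodeno_t> in_cache;  // stand-in for the MDS inode cache

    // Stand-in for MDCache::open_ino(): an uncached inode costs one
    // getxattr to the data pool to read its backtrace.
    void open_ino(inodeno_t ino) {
      if (in_cache.count(ino))
        return;                    // already cached: nothing to do
      std::cout << "getxattr backtrace for inode " << ino << "\n";
      in_cache.insert(ino);
    }

    // Stand-in for MDCache::process_imported_caps(): every entry in
    // cap_imports triggers open_ino(), i.e. one backtrace lookup per
    // capability-holding inode that was missing from cache.
    void process_imported_caps() {
      for (const auto& it : cap_imports)
        open_ino(it.first);
    }

    int main() {
      for (inodeno_t ino = 1; ino <= 10; ++ino)
        cap_imports[ino] = 1;      // pretend 10 capped inodes were reconnected
      process_imported_caps();     // prints one backtrace lookup per inode
    }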

This is massively inefficient because, in almost any real workload, many of the files being resolved are in the same directory as one another; whichever one is resolved first fetches the whole dirfrag, rendering the backtrace lookups for all the other files in that fragment redundant.

In practice, this is causing a user to experience 15-minute-long rejoin phases on a system with ~5M files holding capabilities:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/015959.html ("[ceph-users] MDS flapping: how to increase MDS timeouts?")

One simple solution would be to throttle the number of calls to open_ino from process_imported_caps to some configurable threshold (e.g. 1000 by default), so that by the time the MDS came to resolve the next batch of inodes, the dirfrags fetched for the first batch would already be in cache and those open_ino calls would be no-ops (we would need to check that open_ino really handles the no-op case efficiently).
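
A hedged sketch of that throttle idea, reusing the same toy stand-ins as the sketch above (the batch-size constant, the dir_of() layout and the batch bookkeeping below are all hypothetical; in the real MDS the batching would have to wait on the asynchronous open_ino completions):

    // Toy model of the proposed throttle, not a patch: resolve cap_imports
    // in bounded batches and skip inodes that earlier batches already pulled
    // into cache via their dirfrag fetches.
    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <set>
    #include <vector>

    using inodeno_t = uint64_t;

    constexpr std::size_t OPEN_INO_BATCH = 1000;    // proposed default threshold
    constexpr inodeno_t   NUM_INOS = 100000;        // capped inodes in the toy
    constexpr inodeno_t   NUM_DIRS = 1000;          // toy directories

    std::map<inodeno_t, int> cap_imports;           // capped inodes not in cache
    std::set<inodeno_t> in_cache;                   // stand-in for the MDS cache
    std::size_t getxattrs = 0;                      // backtrace reads issued

    // Toy layout: siblings of an inode are scattered across the iteration
    // order of cap_imports, which is the case the throttle helps with.
    inodeno_t dir_of(inodeno_t ino) { return ino % NUM_DIRS; }

    // Issuing open_ino() for an uncached inode costs one backtrace getxattr.
    void issue_open_ino(inodeno_t ino, std::vector<inodeno_t>& batch) {
      ++getxattrs;
      batch.push_back(ino);
    }

    // When a batch completes, each looked-up inode's dirfrag has been
    // fetched, which brings all of its siblings into cache as well.
    void complete_batch(std::vector<inodeno_t>& batch) {
      for (inodeno_t ino : batch)
        for (inodeno_t sib = dir_of(ino); sib < NUM_INOS; sib += NUM_DIRS)
          in_cache.insert(sib);
      batch.clear();
    }

    void process_imported_caps_throttled() {
      std::vector<inodeno_t> batch;
      for (const auto& it : cap_imports) {
        if (in_cache.count(it.first))
          continue;                        // no-op: cached by an earlier batch
        issue_open_ino(it.first, batch);
        if (batch.size() == OPEN_INO_BATCH)
          complete_batch(batch);           // wait here before issuing more
      }
      complete_batch(batch);               // flush the final partial batch
    }

    int main() {
      for (inodeno_t ino = 0; ino < NUM_INOS; ++ino)
        cap_imports[ino] = 1;
      process_imported_caps_throttled();
      // Unthrottled, all 100000 inodes would cost a getxattr each; throttled,
      // the lookups collapse to roughly one per directory.
      std::cout << getxattrs << " backtrace getxattrs for "
                << NUM_INOS << " capped inodes\n";
    }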

It might even make sense to apply that throttling to open_ino in general: if a workload hit lots of hard links at the same time, it could hit a similar case and generate far more getxattrs than needed.

#1

Updated by Zheng Yan about 7 years ago

I think we should design a new mechanism to track in-use inodes (the current method isn't scalable because it journals all in-use inodes in each log segment)
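
To put rough numbers on that concern (all figures below are assumptions for illustration, not measurements; only the "journals all in-use inodes in each log segment" part is from the comment above, and the delta-based alternative is just one possible shape for a new mechanism):

    // Back-of-the-envelope toy comparing re-journaling the full in-use inode
    // set in every log segment with recording only the opens/closes that
    // happened during the segment. All inputs are assumed, not measured.
    #include <cstdint>
    #include <iostream>

    int main() {
      const uint64_t in_use_inodes = 5'000'000;  // ~5M capped files, as above
      const uint64_t segments      = 128;        // assumed journal segments
      const uint64_t churn_per_seg = 10'000;     // assumed opens+closes/segment

      // Current method: every segment journals every in-use inode again.
      const uint64_t full_set_records = in_use_inodes * segments;

      // Delta-based tracking: only the churn of each segment is recorded
      // (plus periodic compaction, ignored here).
      const uint64_t delta_records = churn_per_seg * segments;

      std::cout << "full-set journaling: " << full_set_records << " records\n"
                << "delta tracking:      " << delta_records    << " records\n";
    }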

#2

Updated by Xiaoxi Chen about 7 years ago

Zheng Yan wrote:

I think we should design a new mechanism to track in-use inodes (the current method isn't scalable because it journals all in-use inodes in each log segment)

Sorry, Zheng, one question: why do we need to fetch the backtrace from the default data pool first, and then retry on the real pool where the file resides?

Not sure if I understand correctly: it seems that when creating a file, the backtrace will reside in both default_pool and target_pool, but later, if we mv the file to another path, the update only goes to target_pool?

And in which case will the backtrace exist only in the metadata pool? I am trying to understand https://github.com/ceph/ceph/blob/master/src/mds/MDCache.cc#L8341-L8351

#3

Updated by Patrick Donnelly about 6 years ago

  • Subject changed from MDS issues backtrace getxattr for every file with cap on rejoin to mds: backtrace issues getxattr for every file with cap on rejoin
  • Assignee set to Zheng Yan
  • Priority changed from Normal to High
  • Target version changed from v12.0.0 to v13.0.0
  • Source set to Development
  • Backport set to luminous
#4

Updated by Zheng Yan about 6 years ago

  • Status changed from New to Closed

Should be resolved by the open file table: https://github.com/ceph/ceph/pull/20132
