Bug #16164

mds: enforce a dirfrag limit on entries

Added by Sage Weil 7 months ago. Updated 5 months ago.

Status: Resolved
Priority: High
Category: Performance/Resource Usage
Target version: -
Start date: 06/06/2016
Due date:
% Done: 0%
Source: other
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Component(FS):
Needs Doc: No

Description

- add a new config option to cap the number of entries in a dirfrag
- set the limit an order of magnitude higher than the fragment tunable that triggers creating new fragments
- return ENOSPC if we hit the limit on mknod, openc, mkdir, symlink, rename, link

This will prevent users with fragmentation turned off from creating huge omap directories.
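
A rough sketch of the check described above, not the actual MDS code. The names (SPLIT_SIZE, FRAGMENT_SIZE_MAX, DirFrag, check_fragment_space, handle_op) and the numeric values are hypothetical and only illustrate the idea: entry-creating operations are refused with ENOSPC once the target dirfrag holds the configured maximum, and the maximum sits an order of magnitude above the split threshold.

```python
import errno
from dataclasses import dataclass, field

# Hypothetical values for illustration only; not taken from the MDS source.
SPLIT_SIZE = 10_000                    # assumed size that triggers a fragment split
FRAGMENT_SIZE_MAX = 10 * SPLIT_SIZE    # cap set an order of magnitude higher

# Operations that add an entry to a dirfrag, as listed in the description.
ENTRY_CREATING_OPS = {"mknod", "openc", "mkdir", "symlink", "rename", "link"}


@dataclass
class DirFrag:
    """Toy stand-in for a directory fragment: just a set of entry names."""
    entries: set = field(default_factory=set)


def check_fragment_space(frag, limit=FRAGMENT_SIZE_MAX):
    """Return 0 if the fragment can take one more entry, else -ENOSPC."""
    if len(frag.entries) >= limit:
        return -errno.ENOSPC
    return 0


def handle_op(op, frag, name, limit=FRAGMENT_SIZE_MAX):
    """Apply a simplified operation to `frag`.

    For rename and link, `frag` is assumed to be the destination fragment,
    since that is the one gaining an entry.
    """
    if op in ENTRY_CREATING_OPS:
        rc = check_fragment_space(frag, limit)
        if rc != 0:
            return rc
        frag.entries.add(name)
        return 0
    if op == "unlink":
        # Removals are never blocked by the cap in this sketch.
        frag.entries.discard(name)
        return 0
    return -errno.EINVAL
```

With a small limit the behaviour is easy to see: handle_op("mkdir", frag, "a", limit=2) and a second create succeed, a third create returns -errno.ENOSPC, and an unlink frees a slot again. With fragmentation turned off nothing ever splits the fragment, so this cap is the only thing bounding its size.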


Related issues

Copied to Backport #16560: jewel: mds: enforce a dirfrag limit on entries Resolved

History

#1 Updated by Sage Weil 7 months ago

  • Project changed from Ceph to fs

#2 Updated by Patrick Donnelly 7 months ago

  • Assignee set to Patrick Donnelly

I'm taking a look at this one.

#3 Updated by Patrick Donnelly 7 months ago

  • Status changed from New to In Progress

#4 Updated by Greg Farnum 7 months ago

Hmm, I was talking to m0zes (whose situation kicked off this bug) and it turns out the objects actually causing the issue are the stray directories rather than the large user directories.

He says he'd actually be okay with disallowing deletes as well, if it prevents the issue from recurring in his cluster. I wouldn't want to do that by default, but it might be pretty simple to set up with a default-off config option. How do others feel about that? I imagine it can hook in in pretty much the same ways as the rename logic will.

#5 Updated by Xiaoxi Chen 7 months ago

Greg Farnum wrote:

Hmm, I was talking to m0zes (whose situation kicked off this bug) and it turns out the objects actually causing the issue are the stray directories rather than the large user directories.

He says he'd actually be okay with disallowing deletes as well, if it prevents the issue from recurring in his cluster. I wouldn't want to do that by default, but it might be pretty simple to set up with a default-off config option. How do others feel about that? I imagine it can hook in in pretty much the same ways as the rename logic will.

Could I have more background on m0zes's issue? This ticket came out of our previous bug report: http://tracker.ceph.com/issues/16010#change-71438

#6 Updated by Greg Farnum 7 months ago

He's got more info in http://tracker.ceph.com/issues/16177.

Basically, a CephFS user created directories large enough to start causing problems with reading the omap entries fast enough for OSD recovery to work (just reading the entries out of leveldb took too long); and then deleted the entries so quickly that the stray directories got really large.

#7 Updated by Patrick Donnelly 7 months ago

  • Status changed from In Progress to Need Test

#8 Updated by John Spray 7 months ago

  • Status changed from Need Test to Pending Backport
  • Backport set to jewel

#9 Updated by Nathan Cutler 7 months ago

  • Copied to Backport #16560: jewel: mds: enforce a dirfrag limit on entries added

#10 Updated by Greg Farnum 6 months ago

  • Category set to Performance/Resource Usage

#11 Updated by Loic Dachary 5 months ago

  • Status changed from Pending Backport to Resolved
