Cleanup #4176

closed

poor use of DIR_? subdirs in osds

Added by Alexandre Oliva about 11 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Since the lowest bits of the hash are used to decide in which PG to place an object, using the same lowest bits within the PG to decide in which DIR_? subdir the object should go leads to all objects being placed in the same subdir. E.g., when there are 256 PGs in a pool, PG 0.ac will have all files placed inside DIR_C/DIR_A.
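A minimal sketch of the collision described above. It assumes a power-of-two PG count, PG selection by masking the low bits of a 32-bit hash, and DIR_ names taken one hex nibble at a time starting from the low end of the hash. The function names are hypothetical and are not Ceph's actual code.

```python
PG_NUM = 256  # assumption: power-of-two PG count, as in the example above


def pg_of(h: int, pg_num: int = PG_NUM) -> int:
    """Pick the PG from the lowest bits of the hash (simplified model)."""
    return h & (pg_num - 1)


def subdir_path(h: int, depth: int = 2) -> str:
    """Build the DIR_ chain from the lowest hex nibbles of the hash."""
    nibbles = [(h >> (4 * i)) & 0xF for i in range(depth)]
    return "/".join(f"DIR_{n:X}" for n in nibbles)


# 1000 distinct hashes that all land in PG 0xac (upper bits vary freely).
hashes = [(i << 8) | 0xAC for i in range(1000)]
assert all(pg_of(h) == 0xAC for h in hashes)

# Every one of them ends up in the same two-level subdir.
paths = {subdir_path(h) for h in hashes}
print(paths)
```

Because the two lowest nibbles are fixed by PG membership, the first two DIR_ levels carry no information inside the PG.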

I suggest that, when the next major on-disk layout kicks in (or perhaps as a default option in newly-created filesystems), the other end of the hash string be used for subdir naming/placement, or that the bits of the hash already used to select the PG be shifted out so as to not use them to select subDIR_s.
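The second option (shifting out the bits already consumed by PG selection) could look roughly like the sketch below. It assumes a power-of-two PG count and the same nibble-per-level naming; `subdir_path_shifted` is a hypothetical name used only for illustration.

```python
def subdir_path_shifted(h: int, pg_num: int, depth: int = 2) -> str:
    """Drop the bits that chose the PG, then name subdirs from what is left."""
    pg_bits = (pg_num - 1).bit_length()  # 8 bits for 256 PGs
    rest = h >> pg_bits
    return "/".join(f"DIR_{(rest >> (4 * i)) & 0xF:X}" for i in range(depth))


# 256 hashes that all belong to PG 0xac of a 256-PG pool.
hashes_in_pg = [(i << 8) | 0xAC for i in range(256)]

# With the PG bits shifted out, the first DIR_ level fans out over all 16 names.
first_level = {subdir_path_shifted(h, 256, depth=1) for h in hashes_in_pg}
print(sorted(first_level))
```

Under these assumptions every directory level actually divides the PG's objects, instead of repeating information the PG name already encodes.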

Actions #1

Updated by Sage Weil about 11 years ago

  • Project changed from CephFS to Ceph
Actions #2

Updated by Greg Farnum about 11 years ago

Have you seen this when examining OSD stores? The use of the same hash as they're split with is quite deliberate — it means we can split PGs just by moving a few folders around instead of moving all the object inodes!
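A rough illustration of why sharing the hash bits makes splits cheap, under the same simplified model (power-of-two PG counts, low-nibble-first DIR_ naming). It is an assumption-laden sketch rather than the actual split code: when a 256-PG pool grows to 4096 PGs, each child of PG 0xac corresponds to exactly one third-level subdirectory of the parent.

```python
from collections import defaultdict


def nibble_path(h: int, depth: int) -> str:
    """DIR_ chain built from the lowest hex nibbles of the hash."""
    return "/".join(f"DIR_{(h >> (4 * i)) & 0xF:X}" for i in range(depth))


OLD_PG_NUM, NEW_PG_NUM = 256, 4096  # assumed split: one extra nibble of PG bits

# Hashes that all live in old PG 0xac.
hashes = [(i << 8) | 0xAC for i in range(4096)]

# Group the parent's third-level subdirs by the child PG each object moves to.
children = defaultdict(set)
for h in hashes:
    child_pg = h & (NEW_PG_NUM - 1)
    children[child_pg].add(nibble_path(h, 3))

for pg, dirs in sorted(children.items()):
    print(f"{pg:03x}: {sorted(dirs)}")
```

In this model each child PG receives a whole directory, so the split can be done by renaming directories rather than moving individual object files.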

Actions #3

Updated by Alexandre Oliva about 11 years ago

I first noticed it while stracing ceph-osds. Each access in my cluster is preceded by 2-3 useless stats (besides the ones that are useful). Nearly all of my 1500 PGs have one extra layer of DIR, each with a single entry. Most have two extra layers, both with a single entry too. None of my pools has fewer than 16 PGs, so any one that grows enough to get subdirs gets a useless dir layer; the two pools with more than 256 PGs all get two such layers.

Even if moving dirs is the goal (it makes sense now!), the redundant hash nibbles could be safely dropped from DIR_ chains, no?

Actions #4

Updated by Greg Farnum about 11 years ago

  • Assignee set to Samuel Just

Sam groaned about more complicated code when I mentioned this to him, so I'm sending it his way to adjudicate. ;)

Actions #5

Updated by Samuel Just about 11 years ago

Yeah, that bit could be made more efficient. Actually though, with leveldb, we could drop pg collections altogether...

Actions #6

Updated by Greg Farnum about 11 years ago

You mean by dropping any use of the filesystem at all?

Actions #7

Updated by Samuel Just over 7 years ago

  • Status changed from New to Closed

bluestore.
