Cleanup #4176
closed
poor use of DIR_? subdirs in osds
Added by Alexandre Oliva about 11 years ago.
Updated over 7 years ago.
Description
Since the lowest bits of the hash are used to decide in which PG to place an object, using the same lowest bits within the PG to decide in which DIR_? subdir the object should go leads to all objects being placed in the same subdir. E.g., when there are 256 PGs in a pool, PG 0.ac will have all files placed inside DIR_C/DIR_A.
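To make the collapse concrete, here is a small Python sketch (not Ceph code; the helper names and the fixed 8-hex-digit hash width are illustrative assumptions) of how a FileStore-style layout that names subdirs by the hash's low hex digits behaves when PG selection has already consumed those same low bits:

```python
# Illustrative sketch, NOT Ceph source: shows why hashed subdirs collapse
# when the PG is chosen from the same low bits of the object hash.

def pg_of(obj_hash, pg_num):
    # PG id from the low bits of the hash (assumes pg_num is a power of 2)
    return obj_hash & (pg_num - 1)

def dir_chain(obj_hash, depth):
    # Subdir names taken from the hash's hex digits, least-significant first
    hex_digits = format(obj_hash, '08x')
    return "/".join("DIR_" + hex_digits[-(i + 1)].upper() for i in range(depth))

# With 256 PGs, every object in PG 0xac has a hash ending in ...ac,
# so the first two subdir levels are always DIR_C/DIR_A:
for h in (0x123456ac, 0xdeadbeac, 0x000000ac):
    assert pg_of(h, 256) == 0xac
    print(dir_chain(h, 2))   # prints DIR_C/DIR_A every time
```

Every object in the PG shares the same first `log16(pg_num)` subdir levels, so those levels each hold exactly one entry.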
I suggest that, when the next major on-disk layout change kicks in (or perhaps as a default option in newly created filesystems), the other end of the hash string be used for subdir naming/placement, or that the bits of the hash already used to select the PG be shifted out so they are not reused to select subdirs.
- Project changed from CephFS to Ceph
Have you seen this when examining OSD stores? Using the same hash bits the PGs are split on is quite deliberate: it means we can split PGs just by moving a few folders around instead of moving all the object inodes!
I first noticed it while stracing ceph-osds: each object access in my cluster is preceded by 2-3 useless stats (besides the useful ones). Nearly all of my 1500 PGs have one extra layer of DIR_?, each with a single entry; most have two such extra layers, both with a single entry too. None of my pools has fewer than 16 PGs, so any PG that grows enough to get subdirs gets a useless dir layer, and in the two pools with more than 256 PGs they all get two such layers.
Even if moving dirs is the goal (it makes sense now!), the redundant hash nibbles could be safely dropped from DIR_ chains, no?
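A minimal sketch of what dropping the redundant nibbles could look like (again illustrative, not Ceph code): shift out the bits already consumed by PG selection before building the DIR_ chain, so the subdir levels use hash bits that actually vary within the PG.

```python
# Sketch of the suggested fix: discard the PG-selection bits before
# deriving subdir names. Assumes pg_bits PG-selection bits and an
# 8-hex-digit hash, both illustrative choices.

def dir_chain_shifted(obj_hash, pg_bits, depth):
    useful = obj_hash >> pg_bits          # drop bits already used for the PG
    hex_digits = format(useful, '08x')
    return "/".join("DIR_" + hex_digits[-(i + 1)].upper() for i in range(depth))

# With 256 PGs (8 PG bits), two objects in PG 0xac now fan out into
# different subdirs instead of both landing in DIR_C/DIR_A:
print(dir_chain_shifted(0x123456ac, 8, 2))   # DIR_6/DIR_5
print(dir_chain_shifted(0xdeadbeac, 8, 2))   # DIR_E/DIR_B
```

The trade-off is the one raised above: with shifted bits, splitting a PG no longer lines up with a whole-directory move.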
- Assignee set to Samuel Just
Sam groaned about more complicated code when I mentioned this to him, so I'm sending it his way to adjudicate. ;)
Yeah, that bit could be made more efficient. Actually though, with leveldb, we could drop pg collections altogether...
You mean by dropping any use of the filesystem at all?
- Status changed from New to Closed