Jessica Mack, 06/07/2015 01:11 AM

h1. Mds - reduce memory consumption

h3. Summary

The MDS's internal cache structs are very large, which limits how much metadata ceph-mds can cache at a time.  Most of their fields are used only while the metadata is dirty.

h3. Owners

* Name (Affiliation)
* Name (Affiliation)
* Name

h3. Interested Parties

* Sage Weil (Inktank)
* Danny Al-Gaaf

h3. Current Status

The CInode struct is larger than 1 KB, and CDir and CDentry are also quite large.  Most of their fields are used only for dirty metadata.  On startup, ceph-mds writes the struct sizes to its log.
The cache size is currently controlled by a simple count of inodes (mds cache size), not by the amount of memory used.
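
To make the effect of a count-based limit concrete, the arithmetic can be sketched as follows. The helper names are hypothetical and not part of ceph-mds, and the 1024-byte figure is only the lower bound quoted above, not a measured size.

```cpp
#include <cstddef>

// Illustrative helpers only; neither function exists in ceph-mds.

// Memory consumed when the cache holds `inode_count` entries of
// `per_inode_bytes` each. A count-based limit bounds the first factor only.
constexpr std::size_t cache_bytes(std::size_t inode_count,
                                  std::size_t per_inode_bytes) {
  return inode_count * per_inode_bytes;
}

// Number of inodes that fit in a fixed memory budget for a given
// per-inode footprint.
constexpr std::size_t inodes_that_fit(std::size_t budget_bytes,
                                      std::size_t per_inode_bytes) {
  return per_inode_bytes == 0 ? 0 : budget_bytes / per_inode_bytes;
}

// With a 1 GiB budget, a 1024-byte struct allows 2^20 cached inodes;
// halving the struct size doubles that.
static_assert(inodes_that_fit(std::size_t{1} << 30, 1024) == 1048576,
              "1 GiB / 1 KiB");
static_assert(inodes_that_fit(std::size_t{1} << 30, 512) == 2097152,
              "1 GiB / 512 B");
```

Because the limit counts inodes rather than bytes, any reduction in the per-inode footprint turns directly into either lower memory use at the same limit or room for a higher limit.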

h3. Detailed Description

Since most of the fields are only used when metadata is dirtied, they can be moved into an auxiliary structure that is allocated on the heap when necessary.  For example, CInode could have a member dirty_state_t *ds; that is allocated when it is dirtied and freed when the changes fully commit and flush.
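
A minimal sketch of that idea, under stated assumptions: InodeSketch is a simplified stand-in for CInode, the fields inside dirty_state_t are invented placeholders, and mark_dirty() and mark_clean() are hypothetical names. It uses std::unique_ptr rather than the raw pointer mentioned above, purely to keep the example short and leak-free.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical contents: placeholders for state that is only needed
// while an inode is dirty.
struct dirty_state_t {
  std::uint64_t dirty_version = 0;
  char scratch[256] = {};
};

// Simplified stand-in for CInode, not the real class.
class InodeSketch {
 public:
  explicit InodeSketch(std::uint64_t ino) : ino_(ino) {}

  std::uint64_t ino() const { return ino_; }
  bool is_dirty() const { return ds_ != nullptr; }

  // Allocate the auxiliary state the first time the inode is dirtied;
  // later calls reuse the existing allocation.
  dirty_state_t& mark_dirty(std::uint64_t version) {
    if (!ds_) ds_ = std::make_unique<dirty_state_t>();
    ds_->dirty_version = version;
    return *ds_;
  }

  // Free the auxiliary state once the changes have committed and flushed.
  void mark_clean() { ds_.reset(); }

 private:
  std::uint64_t ino_;                  // always-needed field
  std::unique_ptr<dirty_state_t> ds_;  // null while the inode is clean
};

// A clean inode pays for one pointer instead of the whole dirty state.
static_assert(sizeof(InodeSketch) < sizeof(dirty_state_t),
              "clean inode should be smaller than its dirty state");
```

The trade-off is one extra pointer per object plus an indirection on every access to a dirty-only field, in exchange for not carrying those fields in the (presumably much more common) clean case.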
Dirty (modified) metadata goes through two phases.  The first is the "pre-dirty" or "projected" phase, in which changes exist only in memory and track state while we wait for the modification to reach the journal.  The second is the (much longer) period in which the metadata is durable and committed but still pinned in memory because the change has not yet been written to the per-directory metadata object.
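
The two phases could be modeled roughly as below. This is a sketch that assumes one auxiliary structure per phase, which is only one of the options under consideration. All type and method names here (projected_state_t, journaled_state_t, project(), journal_committed(), dirfrag_written()) are hypothetical and do not correspond to existing ceph-mds functions.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical per-phase state; the real field split is still to be decided.
struct projected_state_t { std::uint64_t projected_version = 0; };
struct journaled_state_t { std::uint64_t journaled_version = 0; };

// Simplified stand-in for CInode covering only the lifecycle transitions.
class InodeLifecycleSketch {
 public:
  // Phase 1 begins: the change exists only in memory while the journal
  // write is in flight.
  void project(std::uint64_t version) {
    if (!projected_) projected_ = std::make_unique<projected_state_t>();
    projected_->projected_version = version;
  }

  // The journal write completed: drop the projected state and enter
  // phase 2, where the metadata is durable but still pinned.
  void journal_committed() {
    if (!projected_) return;
    if (!journaled_) journaled_ = std::make_unique<journaled_state_t>();
    journaled_->journaled_version = projected_->projected_version;
    projected_.reset();
  }

  // The per-directory metadata object was written: nothing is pinned.
  void dirfrag_written() { journaled_.reset(); }

  bool has_projected() const { return projected_ != nullptr; }
  bool has_journaled() const { return journaled_ != nullptr; }
  std::uint64_t journaled_version() const {
    return journaled_ ? journaled_->journaled_version : 0;
  }

 private:
  std::unique_ptr<projected_state_t> projected_;  // phase 1 only
  std::unique_ptr<journaled_state_t> journaled_;  // phase 2 only
};
```

Keeping the two structures separate means the short-lived projected state never has to stay allocated through the much longer second phase.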

h3. Work items

h3. Coding tasks

# CInode: classify which fields are needed for projected changes and which are needed for dirty (journaled) metadata.  Decide whether we want two auxiliary structures, one per phase, or only one for projected changes.
# CInode: create the CInode substructure(s) and any helpers for access and allocation/deallocation.
# CInode: wire allocation/deallocation into the projected/predirty lifecycle (allocation in project_...(), deallocation in pop_dirty_projected()).
# CInode: wire allocation/deallocation into the dirty/journaled lifecycle (allocation in the predirty or dirty methods, deallocation when the metadata is finally written to the directory fragment object).
# CInode: move fields into the substructure(s).  This can be iterative, probably one patch per field or related group of fields.
# CDentry: repeat.
# CDir: repeat.
# Create a boost memory pool for the substructures for better allocator efficiency.
# Consider whether any inode_t or dirfrag_t fields should be dynamically allocated.
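
To illustrate what a memory pool for the substructures buys, here is a small standard-library-only free-list pool. It is a stand-in written for this page, not Boost's implementation; the real work item would more likely use something like boost::object_pool. FreeListPool and the contents of dirty_state_t are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical auxiliary structure to be pooled.
struct dirty_state_t {
  std::uint64_t dirty_version = 0;
  char scratch[256] = {};
};

// Fixed-size object pool: memory is obtained in chunks and freed slots are
// recycled through an intrusive free list. Objects still live when the pool
// is destroyed are not destructed, so callers must destroy() them first.
template <typename T, std::size_t ChunkObjects = 64>
class FreeListPool {
 public:
  FreeListPool() = default;
  FreeListPool(const FreeListPool&) = delete;
  FreeListPool& operator=(const FreeListPool&) = delete;

  template <typename... Args>
  T* construct(Args&&... args) {
    if (!free_head_) grow();
    Slot* slot = free_head_;
    Slot* next = slot->next;  // read before the slot is overwritten
    T* obj = ::new (static_cast<void*>(slot)) T(std::forward<Args>(args)...);
    free_head_ = next;        // only unlink once construction succeeded
    ++live_;
    return obj;
  }

  void destroy(T* obj) {
    obj->~T();
    Slot* slot = reinterpret_cast<Slot*>(obj);
    slot->next = free_head_;  // push the slot back onto the free list
    free_head_ = slot;
    --live_;
  }

  std::size_t live() const { return live_; }
  std::size_t capacity() const { return chunks_.size() * ChunkObjects; }

 private:
  // A slot holds either a free-list link or the storage for one T.
  union Slot {
    Slot* next;
    alignas(T) unsigned char storage[sizeof(T)];
  };

  void grow() {
    chunks_.push_back(std::make_unique<Slot[]>(ChunkObjects));
    Slot* chunk = chunks_.back().get();
    for (std::size_t i = 0; i < ChunkObjects; ++i) {
      chunk[i].next = free_head_;
      free_head_ = &chunk[i];
    }
  }

  std::vector<std::unique_ptr<Slot[]>> chunks_;
  Slot* free_head_ = nullptr;
  std::size_t live_ = 0;
};
```

Since the auxiliary structures would be allocated and freed every time metadata is dirtied and flushed, recycling same-sized slots this way avoids a general-purpose allocator call per transition and keeps the objects densely packed.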