Project

General

Profile

OSD - add flexible cache control of object data

Summary

By default, an OSD that uses a filesystem (FileStore) as its backend leaves all object data in the kernel page cache after each write. When that page cache is released depends on kernel settings (/proc/sys/vm/*).
However, in a typical large cluster this behavior brings little benefit:
1) Most objects are not accessed again soon after they are written.
2) Keeping large objects in memory consumes too much memory.
3) If there is a cache tier, data in the base tier is mostly not accessed again for a long time once it has been promoted.
4) The page cache can be dropped by writing to /proc/sys/vm/drop_caches, but this works at the system level and offers no per-pool or per-object flexibility.

We propose a new feature that allows the OSD to drop the page cache for data that will not be accessed in the near future.
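The blueprint does not say which kernel interface the OSD would use to drop the cache. One plausible Linux mechanism, assumed here, is posix_fadvise(POSIX_FADV_DONTNEED) on the byte range of the object file. The kernel does not discard dirty pages on this advice, so the range has to be written back first; the sketch calls fdatasync for that, a role FileStore::sync_entry would play in the proposal. The helper name drop_range_cache is hypothetical and not an existing Ceph function.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not an existing Ceph function): drop the kernel page
// cache for the byte range [offset, offset + len) of an open object file.
// A len of 0 means "to the end of the file", as defined by posix_fadvise.
// Returns 0 on success or a negative errno value.
int drop_range_cache(int fd, off_t offset, off_t len) {
  assert(offset >= 0 && len >= 0);
  // POSIX_FADV_DONTNEED does not discard dirty pages, so write the data back
  // first. In the proposal this is the job of FileStore::sync_entry.
  if (::fdatasync(fd) < 0)
    return -errno;
  // posix_fadvise reports errors through its return value, not errno.
  int r = ::posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
  return -r;
}
```

Unlike writing to /proc/sys/vm/drop_caches, this affects only the pages of one file range, which is what makes per-pool and per-object control possible.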

The advantages of this feature:
1: It saves memory that can be used for other purposes, such as the inode and dentry caches.
2: Hosts with less memory can be used as storage nodes.

Owners

  • Jianpeng Ma(intel)
  • Yuan Zhou(intel)
  • Jian Zhang(intel)
  • Jiangang Duan(intel)

Interested Parties

  • Name (Affiliation)
  • Name (Affiliation)
  • Name

Current Status

Detailed Description

1: Dropping the data page cache can be controlled at three levels of granularity:
A: the whole Ceph cluster
B: a pool, for example an erasure-coded pool
C: an individual object that has a flag set indicating that its data page cache should be dropped

2: For write operations
A: For client ops (osd-op)
In FileStore::_do_transaction, record the cid, oid, offset and len of each object write.
After FileStore::sync_entry has synced the data, start dropping the page cache for those objects.
B: For subops
In handle_message, record the same write information (cid, oid, offset, len).
After FileStore::sync_entry has synced the data, start dropping the page cache for those objects.
3: For read operations
In FileStore::read, add a bool or flag that indicates whether to drop the page cache after the read.
4: Recovery, scrub, repair and similar operations use the same method.
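The steps above leave open how the granularity settings and the recorded write extents fit together. A minimal sketch, assuming a single shared list protected by a mutex: record() is called once a write has been applied (from _do_transaction for client ops, from handle_message for subops), the sync thread takes the list just before it syncs, and after the sync finishes it drops the cache for each returned extent (for example with posix_fadvise). All type and method names here (WriteExtent, DropCachePolicy, PendingDrops) are hypothetical, and plain strings stand in for Ceph's collection and object ids.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one write; real FileStore uses coll_t / ghobject_t.
struct WriteExtent {
  std::string cid;  // collection id
  std::string oid;  // object id
  uint64_t offset;
  uint64_t len;
};

// The three granularities from item 1.
struct DropCachePolicy {
  bool cluster_wide = false;  // A: every object in the cluster
  std::set<int64_t> pools;    // B: selected pools, e.g. an erasure-coded pool
  bool should_drop(int64_t pool, bool object_flag) const {
    // C: the per-object flag also enables dropping.
    return cluster_wide || pools.count(pool) > 0 || object_flag;
  }
};

// Extents written since the last sync that are eligible for dropping.
class PendingDrops {
 public:
  explicit PendingDrops(DropCachePolicy p) : policy_(std::move(p)) {}

  // Called after a write is applied (client op or subop).
  void record(int64_t pool, bool object_flag, WriteExtent e) {
    if (!policy_.should_drop(pool, object_flag))
      return;
    std::lock_guard<std::mutex> l(lock_);
    pending_.push_back(std::move(e));
  }

  // Called by the sync thread just before it syncs the filesystem. Every
  // returned extent is covered by that sync, so once the sync has finished
  // its pages are clean and their cache can be dropped.
  std::vector<WriteExtent> take_for_sync() {
    std::lock_guard<std::mutex> l(lock_);
    std::vector<WriteExtent> out;
    out.swap(pending_);
    return out;
  }

  // Read path (item 3): whether to drop the cache right after a read.
  bool drop_after_read(int64_t pool, bool object_flag) const {
    return policy_.should_drop(pool, object_flag);
  }

 private:
  const DropCachePolicy policy_;
  std::mutex lock_;
  std::vector<WriteExtent> pending_;
};
```

Taking the list before the sync starts, rather than after it finishes, ensures every extent handed to the drop step was covered by that sync; extents recorded while a sync is running wait for the next one.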

Work items

Coding tasks

  1. Task 1
  2. Task 2
  3. Task 3

Build / release tasks

  1. Task 1
  2. Task 2
  3. Task 3

Documentation tasks

  1. Task 1
  2. Task 2
  3. Task 3

Deprecation tasks

  1. Task 1
  2. Task 2
  3. Task 3