Fixed memory layout for MessageOp passing

Summary

Currently, encoding an ObjectStore transaction consumes too much time in the PG thread and accounts for 20%-30% of total OSD op latency. The ObjectStore transaction is a central object that collects the ObjectStore ops corresponding to a client op.
To strike a balance between a flexible ObjectStore API and performance, we want to introduce a fixed-position memory layout for the op structure, which avoids the encode/decode work.
Furthermore, if we have a good implementation of an ObjectStore transaction with a fixed member layout, we can extend it to the Message implementation, which has the same problem. If both are applied, the encode/decode performance pain points (client ops, messages, PG ops, and subops) can be eliminated.

Owners

  • Haomai Wang (UnitedStack)
  • Dong Yuan (UnitedStack)

Interested Parties

  • Name (Affiliation)
  • Name (Affiliation)
  • Name

Current Status

Sage has a blueprint for the ObjectStore transaction design (https://wiki.ceph.com/Planning/Blueprints/Hammer/osd%3A_update_Transaction_encoding) and a corresponding branch (https://github.com/ceph/ceph/tree/wip-transaction).

Detailed Description

In general, there are two ways to achieve the goal:
1. Implement a new uniform memory layout model that can hold the different ObjectStore ops. For example, a simple model looks like this:

| op_num | op1 (type, offset and len) | op2 (type, offset and len) | op3 (type, offset and len) | ... | op1 arguments (opaque) | op2 arguments (opaque) | op3 arguments (opaque) | bufferlist ... |

The core of this method, I think, is weeding STL containers and other complex structures out of the ObjectStore API, so that each op's arguments are simple elements and the memory layout can be fixed. If we mainly use simple elements, Message can follow the same model and be refactored in the same way.
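The layout in method 1 can be illustrated with a minimal sketch. Everything named here (`TxHeader`, `OpDesc`, `WriteArgs`, `FixedTxBuilder`, the op codes) is hypothetical and is not taken from the Ceph tree; the sketch also assumes host byte order on both ends and leaves out the trailing bufferlist payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical op codes; not the real ObjectStore::Transaction values.
enum OpType : uint32_t { OP_TOUCH = 1, OP_WRITE = 2 };

// Fixed-size header at the start of the buffer.
struct TxHeader {
  uint32_t op_num;
};

// Fixed-size descriptor per op: its type plus where its arguments live.
struct OpDesc {
  uint32_t type;
  uint32_t arg_off;  // byte offset of the arguments from the buffer start
  uint32_t arg_len;  // length of the arguments in bytes
};

// Example argument block made only of simple elements (no STL members).
struct WriteArgs {
  uint64_t object_id;  // stand-in for a real object name
  uint64_t offset;
  uint64_t length;
};

static_assert(std::is_trivially_copyable<OpDesc>::value, "must be memcpy-safe");
static_assert(std::is_trivially_copyable<WriteArgs>::value, "must be memcpy-safe");

// Collects ops, then lays them out as: header | descriptors | arguments.
class FixedTxBuilder {
 public:
  void add_op(OpType type, const void* args, uint32_t len) {
    const char* p = static_cast<const char*>(args);
    pending_.push_back(Pending{type, std::vector<char>(p, p + len)});
  }

  std::vector<char> finish() const {
    const uint32_t n = static_cast<uint32_t>(pending_.size());
    uint32_t arg_off = static_cast<uint32_t>(sizeof(TxHeader) + n * sizeof(OpDesc));
    std::size_t total = arg_off;
    for (const Pending& p : pending_) total += p.args.size();

    std::vector<char> buf(total);
    TxHeader h{n};
    std::memcpy(buf.data(), &h, sizeof h);
    for (uint32_t i = 0; i < n; ++i) {
      const Pending& p = pending_[i];
      OpDesc d{p.type, arg_off, static_cast<uint32_t>(p.args.size())};
      std::memcpy(buf.data() + sizeof(TxHeader) + i * sizeof(OpDesc), &d, sizeof d);
      if (!p.args.empty())
        std::memcpy(buf.data() + arg_off, p.args.data(), p.args.size());
      arg_off += d.arg_len;
    }
    return buf;
  }

 private:
  struct Pending {
    uint32_t type;
    std::vector<char> args;
  };
  std::vector<Pending> pending_;
};

// Reader side: every field sits at a known position, so no decode pass is needed.
inline uint32_t op_count(const std::vector<char>& buf) {
  TxHeader h;
  std::memcpy(&h, buf.data(), sizeof h);
  return h.op_num;
}

inline OpDesc op_at(const std::vector<char>& buf, uint32_t i) {
  assert(i < op_count(buf));
  OpDesc d;
  std::memcpy(&d, buf.data() + sizeof(TxHeader) + i * sizeof(OpDesc), sizeof d);
  return d;
}

template <typename T>
T read_args(const std::vector<char>& buf, const OpDesc& d) {
  assert(d.arg_len == sizeof(T));
  assert(static_cast<std::size_t>(d.arg_off) + d.arg_len <= buf.size());
  T out;
  std::memcpy(&out, buf.data() + d.arg_off, sizeof out);
  return out;
}
```

A consumer would call `op_count`, walk the descriptors with `op_at`, and dispatch on `type`, reading each argument block with `read_args`. The builder here still copies arguments once while collecting them; a real implementation would presumably append into a preallocated buffer instead.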

2. Discard the ObjectStore transaction entirely and let the backends handle it. For example, ObjectStore's write method would call FileStore::_write directly, and FileStore would be responsible for buffering the op and deciding how to execute it. We can then pass the existing STL vectors and maps directly to FileStore, which can reference them without any encode/decode.

This method also needs additional information added to the subop message, which requires the replica OSD to calculate the op again.
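As a rough illustration of method 2, the sketch below has the caller invoke the backend directly and lets the backend buffer the ops itself. `BufferingStore`, `Object`, and `AttrMap` are hypothetical stand-ins, not FileStore's real interface; sharing the caller's data through `std::shared_ptr` is one assumed way to let the backend reference existing STL structures without encoding them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

using AttrMap = std::map<std::string, std::string>;

// Toy object model: a byte string plus attributes.
struct Object {
  std::string data;
  AttrMap attrs;
};

// Hypothetical backend that receives ops directly and decides when to apply them.
class BufferingStore {
 public:
  // Queue a write; the payload is shared by pointer, not encoded or copied.
  void write(const std::string& oid, uint64_t off,
             std::shared_ptr<const std::string> data) {
    assert(data && "write needs a payload");
    pending_.push_back([this, oid, off, data] {
      std::string& d = objects_[oid].data;
      if (d.size() < off + data->size()) d.resize(off + data->size());
      d.replace(off, data->size(), *data);
    });
  }

  // Queue an attribute update; the caller's map is referenced as-is.
  void setattrs(const std::string& oid, std::shared_ptr<const AttrMap> attrs) {
    assert(attrs && "setattrs needs a map");
    pending_.push_back([this, oid, attrs] {
      for (const auto& kv : *attrs) objects_[oid].attrs[kv.first] = kv.second;
    });
  }

  // Apply all buffered ops in submission order.
  void commit() {
    for (auto& op : pending_) op();
    pending_.clear();
  }

  std::size_t pending() const { return pending_.size(); }

  // Returns nullptr if the object does not exist yet.
  const Object* get(const std::string& oid) const {
    auto it = objects_.find(oid);
    return it == objects_.end() ? nullptr : &it->second;
  }

 private:
  std::vector<std::function<void()>> pending_;
  std::map<std::string, Object> objects_;
};
```

The sketch covers only the local path. It does not show what a replica would need in the subop message to recompute the same ops.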

In short, method 1 requires us to redesign ObjectStore's API and use simple elements for each op's arguments. The problem with method 2 is that the replica OSD has to calculate the op again, which may break some existing rules.

Work items

Coding tasks

  1. Task 1
  2. Task 2
  3. Task 3

Build / release tasks

  1. Task 1
  2. Task 2
  3. Task 3

Documentation tasks

  1. Task 1
  2. Task 2
  3. Task 3

Deprecation tasks

  1. Task 1
  2. Task 2
  3. Task 3