Documentation #65537
openRDMA support
Description
Hi guys,
I needed to set up Ceph over RDMA, but I ran into many issues because the documentation does not contain enough information about RDMA. I searched the web and the mailing list extensively to figure out what to do.
The documentation mentions only ms_type = async+rdma, but there are other options that it does not cover. I found them by showing the OSD config:
ceph config show-with-defaults osd.0 | grep rdma
ms_async_rdma_buffer_size 131072
ms_async_rdma_cm false
ms_async_rdma_device_name
ms_async_rdma_dscp 96
ms_async_rdma_enable_hugepage false
ms_async_rdma_gid_idx 0
ms_async_rdma_local_gid
ms_async_rdma_polling_us 1000
ms_async_rdma_port_num 1
ms_async_rdma_receive_buffers 32768
ms_async_rdma_receive_queue_len 4096
ms_async_rdma_roce_ver 1
ms_async_rdma_send_buffers 1024
ms_async_rdma_sl 3
ms_async_rdma_support_srq true
ms_async_rdma_type ib
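To compare these values across daemons, or to spot options that have no value set, the listing can be parsed into a dictionary. The following is a minimal Python sketch, not part of Ceph. It assumes the output has the shape pasted above (an option name followed by an optional value); real output may carry extra columns. The helper names `parse_options` and `unset_options` are hypothetical.

```python
# Sample lines copied from the listing above: option name, then an optional value.
SAMPLE = """\
ms_async_rdma_buffer_size 131072
ms_async_rdma_cm false
ms_async_rdma_device_name
ms_async_rdma_dscp 96
ms_async_rdma_roce_ver 1
ms_async_rdma_type ib
"""


def parse_options(text):
    """Map each option name to its value string ('' when no value is shown).

    Assumption: one option per line, name first, value (if any) after whitespace.
    """
    options = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        options[name] = value
    return options


def unset_options(options):
    """Return the sorted names of options whose value is empty."""
    return sorted(name for name, value in options.items() if value == "")


if __name__ == "__main__":
    opts = parse_options(SAMPLE)
    print(unset_options(opts))
```

In the sample, the only option without a value is ms_async_rdma_device_name, which is one of the settings the documentation would need to explain.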
When I checked the Ceph GitHub repository, I found that these options are marked with_legacy: true.
https://github.com/ceph/ceph/blob/main/src/common/options/global.yaml.in
type: str
level: advanced
with_legacy: true
- name: ms_async_rdma_enable_hugepage
type: bool
level: advanced
default: false
with_legacy: true
- name: ms_async_rdma_buffer_size
type: size
level: advanced
default: 128_K
with_legacy: true
- name: ms_async_rdma_send_buffers
type: uint
level: advanced
default: 1_K
with_legacy: true
# size of the receive buffer pool, 0 is unlimited
- name: ms_async_rdma_receive_buffers
type: uint
level: advanced
default: 32_K
with_legacy: true
# max number of wr in srq
- name: ms_async_rdma_receive_queue_len
type: uint
level: advanced
default: 4_K
with_legacy: true
# support srq
- name: ms_async_rdma_support_srq
type: bool
level: advanced
default: true
with_legacy: true
- name: ms_async_rdma_port_num
type: uint
level: advanced
default: 1
with_legacy: true
- name: ms_async_rdma_polling_us
type: uint
level: advanced
default: 1000
with_legacy: true
- name: ms_async_rdma_gid_idx
type: int
level: advanced
desc: use gid_idx to select GID for choosing RoCEv1 or RoCEv2
default: 0
with_legacy: true
# GID format: "fe80:0000:0000:0000:7efe:90ff:fe72:6efe", no zero folding
- name: ms_async_rdma_local_gid
type: str
level: advanced
with_legacy: true
# 0=RoCEv1, 1=RoCEv2, 2=RoCEv1.5
- name: ms_async_rdma_roce_ver
type: int
level: advanced
default: 1
with_legacy: true
# in RoCE, this means PCP
- name: ms_async_rdma_sl
type: int
level: advanced
default: 3
with_legacy: true
# in RoCE, this means DSCP
- name: ms_async_rdma_dscp
type: int
level: advanced
default: 96
with_legacy: true
# when there are enough accept failures, indicating there are unrecoverable failures,
# just do ceph_abort(). Here we make it configurable.
- name: ms_max_accept_failures
type: int
level: advanced
desc: The maximum number of consecutive failed accept() calls before considering
the daemon is misconfigured and abort it.
default: 4
with_legacy: true
# rdma connection management
- name: ms_async_rdma_cm
type: bool
level: advanced
default: false
with_legacy: true
- name: ms_async_rdma_type
type: str
level: advanced
default: ib
with_legacy: true
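The defaults in this YAML excerpt use a size suffix (128_K, 1_K, 32_K, 4_K), while the config listing earlier in this report shows plain integers. A minimal Python sketch of the correspondence follows. It assumes that a trailing _K means "times 1024", which is consistent with the values pasted in this report (128_K and 131072) but is not taken from the Ceph option parser. The helper names `expand_default` and `mismatches` are hypothetical.

```python
# Assumption: a trailing _K on a YAML default means "multiply by 1024".
SUFFIXES = {"_K": 1024}


def expand_default(text):
    """Convert a YAML default such as '128_K' or '1000' to an integer."""
    for suffix, factor in SUFFIXES.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)


# Option name -> (default from the YAML excerpt, value from the config listing).
PAIRS = {
    "ms_async_rdma_buffer_size": ("128_K", 131072),
    "ms_async_rdma_send_buffers": ("1_K", 1024),
    "ms_async_rdma_receive_buffers": ("32_K", 32768),
    "ms_async_rdma_receive_queue_len": ("4_K", 4096),
    "ms_async_rdma_polling_us": ("1000", 1000),
}


def mismatches(pairs):
    """Return the names whose expanded YAML default differs from the reported value."""
    return [
        name
        for name, (default, reported) in pairs.items()
        if expand_default(default) != reported
    ]


if __name__ == "__main__":
    print(mismatches(PAIRS))
```

An empty result means every reported value in the listing is simply the built-in default, so none of these options had been changed on osd.0.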
This causes confusion, and the RDMA setup needs more detail in the documentation.
Regards