Bug #57966

open

Ceph cluster osds failed when ms_cluster_type=async+rdma is used

Added by guoguo jie over 1 year ago. Updated 12 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
common
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):

2ccdd1f47aacf4a8f55c21837c3d39a6f36fa33824ba49578bb6cab9c3598254
3306ab83ccd165a6cccc4c91250879bb7262941666bb0d9472b38382a5c62a3d
33fdcea66d9494023cab7cacb0df26c3001b94d44d5c95bab5239fb11d66b2c3
4f70e5a6707159820b2b8da3fcca4d263f45a3f9eb1bdef54bb1545558ee1cb3
766be26bfe831e0682d2048a8dc960defeda12d09e3023e77c277e519784b73b
a594420536d5797db898deac34687a997b2ef5e8914abdb809f37169e5214cb7
aeeb799c9ee05a227eb75f0cb7663cda11a7b7f979e8b6bd736fd7c967d3fd0c
e9a587500cd0d3a0ca3003144680586ba4656cdec011bfd2a91e3f3334bfa213
f318bb7dd5ed05f869f475d1e04f0f440e8297d0dbc2b4f00d44d37b5b941c71
f514f2bcfff5e613764a936508a85d6b0f61441266b5bc22f710f2a95eabe04d
f665a1ed57f0db8ce90b2de73edc1fcabb16d36f4af7dbb424d1f9ebacfcd2d0


Description

With IPoIB, the cluster currently runs normally.
The steps are as follows:
Check cluster health:
ceph health detail
ceph config set global mon_clock_drift_allowed 3
ceph config set global osd_pool_default_size 2
Add an internal IB cluster network:
ceph config set osd cluster_network 10.10.20.0/24
ceph config get osd cluster_network
After rebooting the system, check that the internal network is actually in use:
ceph osd metadata 0 | grep addr
ceph osd metadata 1 | grep addr
ceph osd dump | grep 10.10.
After the above steps, the Ceph cluster runs normally.
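
The address check above can be scripted. The following is a minimal sketch, not part of the original report: `on_cluster_network` is a hypothetical helper name, and it assumes the `ceph osd metadata` output contains a `"back_addr"` line holding the OSD's cluster-facing address. It does a plain substring match on the prefix, so it is only a rough check.

```shell
# Hypothetical helper: reads `ceph osd metadata <id>` output on stdin and
# succeeds if the "back_addr" line contains the given IPv4 prefix.
# Assumption: the cluster-facing address is reported under "back_addr".
# This is a substring match, not a real CIDR membership test.
on_cluster_network() {
    prefix="$1"
    grep '"back_addr"' | grep -F -- "$prefix" >/dev/null
}

# Intended use on a live cluster (not executed here):
#   ceph osd metadata 0 | on_cluster_network "10.10.20." \
#       && echo "osd.0 is using the IB cluster network"
```

If the helper fails for an OSD, that OSD is probably still bound to the public network and the `cluster_network` setting has not taken effect for it.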

Next, try to enable the RDMA parameters:

Proceed in the following order. Only the OSD section is configured. On each node, run show_gids to obtain the values for ms_async_rdma_device_name and ms_async_rdma_local_gid.

ceph config set osd ms_async_rdma_device_name mlx5_0
ceph config set osd.0 ms_async_rdma_local_gid fe80:0000:0000:0000:480f:cfff:fff3:9974
ceph config set osd.1 ms_async_rdma_local_gid fe80:0000:0000:0000:7010:6fff:ffa2:1430
ceph config set osd ms_cluster_type async+rdma
ceph osd metadata 0 | grep addr
ceph osd metadata 1 | grep addr
ceph osd dump | grep 10.10.
Check IB NIC traffic:
sar -n DEV 1 | grep ib
After restarting the osd and mon services, ceph -s fails.
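
As a sanity check before restarting the daemons, the GID strings passed to ms_async_rdma_local_gid can be validated for format. This is a minimal sketch, not taken from the report: `is_expanded_gid` is a hypothetical helper, and it only checks that a value matches the fully expanded form used in the commands above (eight colon-separated groups of four hex digits). Whether Ceph also accepts abbreviated GIDs is not established here.

```shell
# Hypothetical helper: succeeds if the argument is written as eight
# colon-separated groups of exactly four hex digits, matching the form
# of the GIDs used in the report.
is_expanded_gid() {
    printf '%s\n' "$1" | grep -E '^([0-9a-fA-F]{4}:){7}[0-9a-fA-F]{4}$' >/dev/null
}

# Example with a GID from the report:
if is_expanded_gid "fe80:0000:0000:0000:480f:cfff:fff3:9974"; then
    echo "GID format looks consistent"
fi
```

A passing check only rules out a malformed value. It does not confirm that the GID belongs to the configured device on that node.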


Files

rdma.jpg (230 KB) rdma.jpg guoguo jie, 11/03/2022 02:32 AM