Bug #63749

open

rdma: when ms_async_rdma_receive_queue_len is set low, ceph health detail can hang.

Added by ren weiguo 5 months ago. Updated 5 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In my RoCE environment:
ubuntu22.04 Linux node194 5.15.0-60-generic #66-Ubuntu SMP Fri Jan 20 14:29:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
root@node194:/var/log/ceph/renweiguo# ibv_devices
device         node GUID
------         ----------------
mlx5_bond_0    08c0eb03006f1f82
mlx5_bond_1    08c0eb03006e5c2a
Steps to reproduce:
1. Use vstart to run a single monitor:
MON=1 OSD=0 MDS=0 MGR=0 RGW=0 NFS=0 ../src/vstart.sh --debug --new -x --localhost --bluestore
2. Edit ceph.conf and add the following under [global]:
ms type = async+rdma
ms_async_rdma_device_name = mlx5_bond_0
ms_async_rdma_gid_idx = 3
ms_async_rdma_polling_us = 0
ms_async_rdma_receive_queue_len = 2
3. Restart ceph-mon.
4. Create ceph-health.sh with the following contents:
./bin/ceph health detail&
./bin/ceph health detail&
5. watch 'sh ceph-health.sh'
6. Wait about 10 minutes; ceph -s hangs.
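
To avoid watching the terminal for the 10 minutes in step 6, the hang could be detected automatically. This is a minimal sketch and not part of the original reproduction: the helper name check_hang and the 30-second limit are assumptions, and it relies on the coreutils timeout command, which exits with status 124 when the limit is reached.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): run a command under a
# time limit and report whether it finished or hung.
# Usage: check_hang LIMIT_SECONDS COMMAND [ARGS...]
check_hang() {
    limit="$1"
    shift
    # Discard the command's own output; only the exit status matters here.
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        # timeout(1) returns 124 when the command was still running at the limit.
        echo "HANG"
        return 1
    fi
    echo "DONE"
    return "$status"
}

# Example loop (commented out so the file can be sourced safely).
# The 30-second limit is an assumed value; the loop ends on the first
# hang or the first failing command:
#   while check_hang 30 ./bin/ceph health detail; do sleep 2; done
```

Unlike ceph-health.sh, this loop runs one command at a time. If the hang only shows up with two concurrent clients, as in step 4, start the loop in two terminals.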

Actions #1

Updated by Casey Bodley 5 months ago

  • Project changed from rgw to Messengers