Bug #22944

Infiniband send_msg send returned error 32: (32) Broken pipe

Added by Radosław Piliszek over 2 years ago. Updated over 2 years ago.

Target version:
% Done:


Community (user)
rdma, infiniband
2 - major
Affected Versions:
Pull request ID:
Crash signature:


Using CentOS 7.4, Mellanox OFED 4.2 on ConnectX-3 in InfiniBand mode, and Ceph 12.2.2 compiled with RDMA support.

I've set:

ms type = async+rdma

I've fixed systemd units to allow RDMA device usage.

But I cannot get RDMA to work. Connections time out.

I sometimes get:

Infiniband send_msg send returned error 32: (32) Broken pipe

in monitor service journal.

RDMA by itself works just fine (verified with rping).

Ceph by itself also works just fine (verified without RDMA).

This happens even on one-node cluster when trying to access it from the very same node.

Please let me know how I can get you more info to debug it.

ceph-client.smp-016.log - client log (46.4 KB) Radosław Piliszek, 02/13/2018 08:01 AM

ceph-mon.smp-016.log - mon log (54 KB) Radosław Piliszek, 02/13/2018 08:01 AM

ibdump.smp-016.pcap - ibdump capture (3.46 KB) Radosław Piliszek, 02/13/2018 08:01 AM


#1 Updated by Greg Farnum over 2 years ago

The RDMA support in AsyncMessenger is experimental and I think the guys building it are planning to rip it apart. I would just use normal ethernet.

#2 Updated by Greg Farnum over 2 years ago

  • Assignee set to Haomai Wang

Haomai, can you follow up if this interests you, or else close it? :)

#3 Updated by Haomai Wang over 2 years ago

Hmm, I don't think the logged error is the root cause. Could you set debug_ms=20/20 to get more output?

#4 Updated by Radosław Piliszek over 2 years ago

Hi Greg, Hi Haomai,

I increased debug to 20. Please find mon and client logs attached.

I also ran ibdump and discovered that the packets are malformed. There is no LID 4 in the fabric (yet it is used as DLID), SGID is invalid, and DGID looks like a repeated random sequence (also invalid). DQPN is invalid as well (it does not agree with the logs). SL/TC also look random to me, but they are valid and irrelevant when considering the same node. I attach the capture. A recent Wireshark opens it just fine.

DGID seems to change from run to run. DLID, SGID, TC, SL do not change ever. DQPN and PSN seem to change between retries but DQPN incorrectly and PSN effectively unnecessarily (because QP is changed anyway).

Correct DLID would be equal to SLID (79).
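
The address check described above can be sketched in Python. This is a minimal illustration rather than anything taken from Ceph or ibdump: the `LrhAddresses` type, the `check_same_node_packet` helper, and the contents of `fabric_lids` are hypothetical. Only the SLID (79) and DLID (4) values come from the capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LrhAddresses:
    """Source and destination LIDs as read from a captured packet (hypothetical type)."""
    slid: int
    dlid: int


def check_same_node_packet(pkt, fabric_lids):
    """Return a list of problems for a packet that should stay on one node."""
    problems = []
    # A DLID that no port in the fabric owns cannot be routed at all.
    if pkt.dlid not in fabric_lids:
        problems.append(f"DLID {pkt.dlid} is not assigned to any port in the fabric")
    # For traffic from a node to itself, the destination LID should match the source LID.
    if pkt.dlid != pkt.slid:
        problems.append(f"DLID {pkt.dlid} differs from SLID {pkt.slid} on same-node traffic")
    return problems


# SLID 79 and DLID 4 are the values seen in the capture.
# The LID set is an assumption: only LID 79 is known to exist, LID 4 is known not to.
captured = LrhAddresses(slid=79, dlid=4)
fabric_lids = {79}

for problem in check_same_node_packet(captured, fabric_lids):
    print(problem)
```

A packet with DLID equal to SLID (79) would pass both checks.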

As for the logs: the client gets an error CQE and retries again and again. This is understandable given the capture entries, since the packets are simply unroutable. It looks like the addresses are populated incorrectly.

I saw articles mentioning people running Ceph on RDMA, so I thought it was in a working state (maybe with some bugs to squash), but it looks like either the build I use (12.2.2) is broken or my environment is unsupported (for some unknown reason, because it is rather standard stuff).

#5 Updated by Haomai Wang over 2 years ago

Actually, I have always tested with ConnectX-3 in IB mode.

From the log, it seems the client had a good handshake with the mon. But when the client posts a write request, the monitor doesn't receive any message, so the client waits until the timeout and then closes the connection.

What's your ceph.conf?

#6 Updated by Radosław Piliszek over 2 years ago

At the moment the ceph.conf is:

mon host =
fsid = 6de5466c-d11f-4e59-857b-b11cb0bc4d9b
public network =
cephx require signatures = true
ms type = async+rdma
ms_async_rdma_device_name = mlx4_0
ms_async_rdma_polling_us = 0
debug ms = 20/20

mon allow pool delete = true
osd pool default size = 2
osd pool default min size = 2


[root@smp-016 ~]# ibdev2netdev
mlx4_0 port 1 ==> ib0 (Up)
[root@smp-016 ~]# ip -4 a s ib0
4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP qlen 1024
    inet brd scope global ib0
       valid_lft forever preferred_lft forever
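
As a sanity check, the RDMA settings and the ibdev2netdev output above can be cross-checked with a short Python sketch. It rests on a few assumptions: the configuration excerpt is wrapped in a `[global]` header (the paste has none), only the RDMA-related keys are included, and the helper names `parse_ceph_conf` and `parse_ibdev2netdev` are hypothetical, not part of any Ceph tool.

```python
import configparser

# Excerpt of the settings above. The [global] header is an assumption added
# so that configparser accepts the text; the paste itself has no section header.
CONF_TEXT = """
[global]
ms type = async+rdma
ms_async_rdma_device_name = mlx4_0
ms_async_rdma_polling_us = 0
debug ms = 20/20
"""

IBDEV2NETDEV_OUTPUT = "mlx4_0 port 1 ==> ib0 (Up)"


def parse_ceph_conf(text, section="global"):
    """Return the options of one section with spaces in key names replaced by underscores."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {key.replace(" ", "_"): value for key, value in parser[section].items()}


def parse_ibdev2netdev(output):
    """Map each RDMA device name to a (netdev, state) tuple."""
    devices = {}
    for line in output.splitlines():
        parts = line.split()
        # Expected shape: <device> port <n> ==> <netdev> (<state>)
        if len(parts) == 6 and parts[3] == "==>":
            devices[parts[0]] = (parts[4], parts[5].strip("()"))
    return devices


options = parse_ceph_conf(CONF_TEXT)
devices = parse_ibdev2netdev(IBDEV2NETDEV_OUTPUT)

device = options["ms_async_rdma_device_name"]
print("messenger type:", options["ms_type"])
print("device known and up:", devices.get(device, (None, None))[1] == "Up")
```

For the values posted here, both checks pass, which suggests the device selection in ceph.conf is not the problem.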

#7 Updated by Jay Munsterman over 2 years ago

Just adding to the conversation: We appear to be experiencing the same thing here with the same configuration...
