Bug #38056

RDMA has an error in the handle_connection function

Added by yuanli zhu about 5 years ago. Updated 7 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When I use RDMA to deploy a Ceph cluster, an assertion fails while the handle_connection function is running.

My Ceph version is ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable).

When I run ceph -s, I get:

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.8/rpm/el7/BUILD/ceph-12.2.8/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: In function 'void RDMAConnectedSocketImpl::handle_connection()' thread 7fd0a5840700 time 2019-01-25 14:51:52.397273
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.8/rpm/el7/BUILD/ceph-12.2.8/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: 221: FAILED assert(!r)
2019-01-25 14:51:52.397212 7fd0a5840700 -1 RDMAConnectedSocketImpl activate failed to transition to RTR state: (22) Invalid argument
ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fd0aed78a30]
2: (RDMAConnectedSocketImpl::handle_connection()+0xb1b) [0x7fd0aef1759b]
3: (EventCenter::process_events(int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x359) [0x7fd0aeefe3d9]
4: (()+0x455b0e) [0x7fd0aef01b0e]
5: (()+0xb5070) [0x7fd0acdaf070]
6: (()+0x7e25) [0x7fd0c0371e25]
7: (clone()+0x6d) [0x7fd0bf992bad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Aborted
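
For anyone triaging this, the number in parentheses on the "failed to transition to RTR state" line is an errno value. Below is a minimal sketch for pulling it out of the log and mapping it to its symbolic name. The regular expression and the helper names (parse_rtr_failure, errno_name) are my own, not part of Ceph, and the symbolic names depend on the platform where the script runs (Linux numbering is assumed here).

```python
import errno
import re

# Hypothetical pattern: matches the "(NN) message" suffix of the RTR failure line.
RTR_FAILURE = re.compile(r"failed to transition to RTR state: \((\d+)\) (.+)$")


def parse_rtr_failure(line):
    """Return (errno_number, message) from an RTR failure log line, or None."""
    match = RTR_FAILURE.search(line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def errno_name(number):
    """Map an errno number to its symbolic name on the current platform."""
    return errno.errorcode.get(number, "UNKNOWN")


# Log line copied from the report above.
line = ("2019-01-25 14:51:52.397212 7fd0a5840700 -1 RDMAConnectedSocketImpl "
        "activate failed to transition to RTR state: (22) Invalid argument")

number, message = parse_rtr_failure(line)
print(number, errno_name(number), message)
```

On Linux, 22 maps to EINVAL. That would be consistent with an invalid parameter being passed during the queue pair state change, but the log alone does not show which parameter.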

Is there a problem with my configuration, or with the way I am using IB and RDMA?

ceph.conf View (370 Bytes) yuanli zhu, 01/28/2019 01:43 AM

hosts.log View (227 Bytes) yuanli zhu, 01/28/2019 01:43 AM

ifconfig.txt View (2.22 KB) yuanli zhu, 01/28/2019 01:43 AM

show_gids.log View (157 Bytes) yuanli zhu, 01/28/2019 01:43 AM

ceph-mon.node1.log View (70.7 KB) yuanli zhu, 01/28/2019 01:43 AM

History

#1 Updated by yuanli zhu about 5 years ago

ddd

#2 Updated by Darren Wen 7 months ago

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: In function 'void RDMAConnectedSocketImpl::handle_connection()' thread 7fe22b7fe700 time 2023-08-19 16:10:17.802834
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: 220: FAILED assert(!r)
2023-08-19 16:10:17.802784 7fe22b7fe700 -1  RDMAConnectedSocketImpl activate failed to transition to RTR state: (61) No data available
 ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fe238eb86b0]
 2: (RDMAConnectedSocketImpl::handle_connection()+0xb1b) [0x7fe23905c08b]
 3: (EventCenter::process_events(int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x359) [0x7fe239043499]
 4: (()+0x42cbce) [0x7fe239046bce]
 5: (()+0xb5330) [0x7fe236ef1330]
 6: (()+0x7ea5) [0x7fe24a6eeea5]
 7: (clone()+0x6d) [0x7fe249d0eb0d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I have the same problem. Is there a solution?
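
To check whether the two traces in this ticket fail at the same place, here is a rough sketch that compares the symbolized frames (function name plus offset) and ignores the absolute addresses, which differ between processes. The frame pattern and the helper names (frame_signature, named_frames) are my own assumptions about the backtrace layout shown here, not a Ceph tool.

```python
import re

# Hypothetical pattern for a frame such as "2: (symbol+0xb1b) [0x7fd0aef1759b]".
FRAME = re.compile(r"^\s*(\d+): \((.*)\+(0x[0-9a-f]+)\) \[0x[0-9a-f]+\]$")


def frame_signature(line):
    """Return (index, symbol, offset) for a backtrace frame line, or None."""
    match = FRAME.match(line)
    if match is None:
        return None
    index, symbol, offset = match.groups()
    return int(index), symbol, offset


def named_frames(trace):
    """Keep only frames that carry a symbol name; "()" means unresolved."""
    signatures = []
    for line in trace.splitlines():
        signature = frame_signature(line)
        if signature is not None and signature[1] != "()":
            signatures.append(signature)
    return signatures


# Frames copied from the 12.2.8 trace in the description (frames 5 and 6 omitted).
trace_12_2_8 = """
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fd0aed78a30]
2: (RDMAConnectedSocketImpl::handle_connection()+0xb1b) [0x7fd0aef1759b]
3: (EventCenter::process_events(int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x359) [0x7fd0aeefe3d9]
4: (()+0x455b0e) [0x7fd0aef01b0e]
7: (clone()+0x6d) [0x7fd0bf992bad]
"""

# Frames copied from the 12.2.13 trace in comment #2 (frames 5 and 6 omitted).
trace_12_2_13 = """
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fe238eb86b0]
 2: (RDMAConnectedSocketImpl::handle_connection()+0xb1b) [0x7fe23905c08b]
 3: (EventCenter::process_events(int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x359) [0x7fe239043499]
 4: (()+0x42cbce) [0x7fe239046bce]
 7: (clone()+0x6d) [0x7fe249d0eb0d]
"""

print(named_frames(trace_12_2_8) == named_frames(trace_12_2_13))
```

The named frames line up, including the +0xb1b offset in handle_connection(), which suggests the same assertion site in both builds even though the reported source line (221 vs 220) and the errno (22 vs 61) differ. That is only a hint, since matching offsets across two different builds are not proof of identical code.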