Bug #44702

Double destroy_qp causes segmentation fault

Added by chunsong feng 11 days ago. Updated 11 days ago.

Status: New
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

-4> 2020-03-21T16:17:32.996+0800 ffff93c649f0 -1 osd.26 9142 set_numa_affinity unable to identify public interface 'rocevlan' numa node: (2) No such file or directory
-3> 2020-03-21T16:17:33.052+0800 ffff8cc569f0 5 prioritycache tune_memory target: 4294967296 mapped: 5078843392 unmapped: 9240576 heap: 5088083968 old mem: 134217728 new mem: 134217728
-2> 2020-03-21T16:17:34.052+0800 ffff8cc569f0 5 prioritycache tune_memory target: 4294967296 mapped: 5080784896 unmapped: 7299072 heap: 5088083968 old mem: 134217728 new mem: 134217728
-1> 2020-03-21T16:17:34.056+0800 ffff9b8de9f0 -1 Infiniband modify_qp_to_init failed to switch to INIT state Queue Pair, qp number: 985228 Error: (5) Input/output error
0> 2020-03-21T16:17:34.160+0800 ffff9b8de9f0 -1 ** Caught signal (Segmentation fault) *
in thread ffff9b8de9f0 thread_name:msgr-worker-0

ceph version 15.1.0-35-gdeba62656d (deba62656d6bc55b66cb67ef83759f89a51eff9f) octopus (rc)
1: (__kernel_rt_sigreturn()+0) [0xffff9cc0a5c0]
2: (ibv_destroy_qp()+0x8) [0xffff9c6d4fc0]
3: (Infiniband::QueuePair::~QueuePair()+0x48) [0xaaaac2212058]
4: (Infiniband::create_queue_pair(CephContext*, RDMAWorker*, ibv_qp_type, rdma_cm_id*)+0x8c) [0xaaaac22125e4]
5: (RDMAConnectedSocketImpl::RDMAConnectedSocketImpl(CephContext*, std::shared_ptr<Infiniband>&, std::shared_ptr<RDMADispatcher>&, RDMAWorker*)+0x188) [0xaaaac2212f20]
6: (RDMAWorker::connect(entity_addr_t const&, SocketOptions const&, ConnectedSocket*)+0x10c) [0xaaaac201bc04]
7: (AsyncConnection::process()+0x554) [0xaaaac21ba3fc]
8: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xbb0) [0xaaaac200ad50]
9: (()+0x123b8d0) [0xaaaac20108d0]
10: (()+0xc9ed4) [0xffff9c561ed4]
11: (()+0x7088) [0xffff9c71d088]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

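The first ibv_destroy_qp is called right after modify_qp_to_init fails:

    if (modify_qp_to_init() != 0) {
      ibv_destroy_qp(qp);
      return -1;
    }

The second ibv_destroy_qp is called in ~QueuePair (frames 2 and 3 of the backtrace above), on the same qp that was already destroyed.
So we should add qp = NULL; after the first ibv_destroy_qp call, so that the destructor no longer passes the stale pointer to ibv_destroy_qp.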
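The proposed fix can be illustrated with a small standalone sketch. This is not the Ceph code: FakeQp, fake_destroy_qp, QueuePairSketch and the two helper functions are hypothetical stand-ins so the example builds without libibverbs, and the sketch assumes the destructor only destroys the queue pair when the pointer is non-null.

```cpp
#include <cassert>

// Hypothetical stand-in for ibv_qp (the real type comes from libibverbs).
struct FakeQp {
  bool destroyed = false;
};

// Counts destroy calls so a double destroy becomes observable.
static int g_destroy_calls = 0;

// Hypothetical stand-in for ibv_destroy_qp(): returns 0 on success.
// Destroying the same object twice trips the assert instead of segfaulting.
static int fake_destroy_qp(FakeQp* qp) {
  assert(qp != nullptr && !qp->destroyed);
  qp->destroyed = true;
  ++g_destroy_calls;
  return 0;
}

class QueuePairSketch {
 public:
  QueuePairSketch(FakeQp* q, bool init_succeeds)
      : qp(q), init_succeeds(init_succeeds) {}
  QueuePairSketch(const QueuePairSketch&) = delete;
  QueuePairSketch& operator=(const QueuePairSketch&) = delete;

  // Mirrors the error path quoted in the report, with the fix applied.
  int init() {
    if (modify_qp_to_init() != 0) {
      fake_destroy_qp(qp);
      qp = nullptr;  // proposed fix: forget the pointer that was just destroyed
      return -1;
    }
    return 0;
  }

  // Assumed destructor shape: destroy only if a queue pair is still held.
  ~QueuePairSketch() {
    if (qp) {
      fake_destroy_qp(qp);
    }
  }

 private:
  // Simulated state transition; a nonzero return means failure.
  int modify_qp_to_init() const { return init_succeeds ? 0 : -1; }

  FakeQp* qp;
  bool init_succeeds;
};

// Runs init() plus the destructor and reports how many destroys happened.
static int destroy_calls_for(bool init_succeeds) {
  g_destroy_calls = 0;
  FakeQp raw;
  {
    QueuePairSketch pair(&raw, init_succeeds);
    pair.init();
  }  // destructor runs here
  return g_destroy_calls;
}

static int destroy_calls_on_init_failure() { return destroy_calls_for(false); }
static int destroy_calls_on_init_success() { return destroy_calls_for(true); }
```

With the qp = nullptr line in place, both the failing and the succeeding path destroy the queue pair exactly once. Removing that line makes the failing path reach fake_destroy_qp a second time from the destructor, which is the double destroy described in this report.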

qp_double_destroy.txt (196 KB) chunsong feng, 03/21/2020 08:48 AM

History

#1 Updated by chunsong feng 11 days ago
