Bug #39238

v14.2.0: RDMA/iWARP/X722 segmentation fault: Infiniband Device failed to query RDMA device.

Added by Changcheng Liu about 2 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: AsyncMessenger
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

1. Before the problem occurs, the RDMA device can be queried successfully several times.

2. The problem cannot be reproduced at the following commit, with these patches applied on top:
git reset --hard dd216198b5b30;
git am 0001-msg-async-rdma-add-RDMA-iWARP-protocol-support.patch
git am 0002-msg-async-rdma-support-qp-that-isn-t-associated-with.patch
git am 0003-msg-async-rdma-cmake-find_package-for-librdmacm.patch
git am 0004-cmake-consolidate-WITH_-VERBS-RDMACM.patch

3. The problem can be reproduced at the following commit, with the same patches applied on top:
git reset --hard d7692a24c74b5;
git am 0001-msg-async-rdma-add-RDMA-iWARP-protocol-support.patch
git am 0002-msg-async-rdma-support-qp-that-isn-t-associated-with.patch
git am 0003-msg-async-rdma-cmake-find_package-for-librdmacm.patch
git am 0004-cmake-consolidate-WITH_-VERBS-RDMACM.patch

4. The only PR merged between dd216198b5b30 and d7692a24c74b5 is:
https://github.com/ceph/ceph/pull/20172

5. The problem can also be reproduced on tag v14.2.0 and on the master branch.

How to enable RDMA/iWARP on the X722 in Ceph:
1. Ceph configuration (the value of ms_async_rdma_device_name depends on the host):
; use AsyncMessenger + RDMA as the messenger type
ms_type = async+rdma
ms_async_rdma_device_name = i40iw1
ms_async_rdma_type = iwarp
ms_async_rdma_support_srq = false
ms_async_rdma_cm = true
;; ms_async_rdma_port_num = 1
; ms_async_rdma_send_buffers = 1024
; ms_async_rdma_receive_buffers = 16384
; ms_async_rdma_receive_queue_len = 1024
;; ms_async_rdma_buffer_size = 4096

2. vstart command
OSD=1 MON=1 MDS=0 RGW=0 MGR=0 bash -xev ../src/vstart.sh -n -d -X
(On v14.2.0, the command should be: OSD=1 MON=1 MDS=0 RGW=0 MGR=0 bash -xev ../src/vstart.sh -n -d -X --msgr1)

3. Set ms_async_rdma_device_name in ceph.conf to the hca_id of the device whose port is active:
nstcc1@nstcloudcc1:build$ ibv_devinfo | grep 'PORT_ACTIVE' -B 15 | grep 'hca_id'
hca_id: i40iw1
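The `grep -B 15` pipeline above assumes the `hca_id:` line falls within 15 lines before `PORT_ACTIVE`. A minimal C sketch of the same lookup, assuming the usual `ibv_devinfo` text layout (one `hca_id:` line per device followed by per-port `state:` lines), returns the first device that has an active port without relying on a fixed line window. `first_active_hca()` is a hypothetical helper, not part of rdma-core or Ceph.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan `ibv_devinfo` output and copy into `out` the
 * hca_id of the first device that has a port in state PORT_ACTIVE.
 * Returns 0 on success, -1 if no active port is found. */
static int first_active_hca(const char *devinfo, char *out, size_t out_len)
{
    char current[64] = "";        /* hca_id of the device block being scanned */
    const char *line = devinfo;

    assert(out != NULL && out_len > 0);
    while (line != NULL && *line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        /* strstr() may match on a later line; only accept hits in this one. */
        const char *hca = strstr(line, "hca_id:");
        const char *act = strstr(line, "PORT_ACTIVE");

        if (hca != NULL && hca < line + len) {
            /* Remember the device name that follows "hca_id:". */
            const char *p = hca + strlen("hca_id:");
            while (p < line + len && (*p == ' ' || *p == '\t'))
                p++;
            size_t n = (size_t)(line + len - p);
            if (n >= sizeof current)
                n = sizeof current - 1;
            memcpy(current, p, n);
            current[n] = '\0';
        } else if (act != NULL && act < line + len && current[0] != '\0') {
            /* First active port: report the device it belongs to. */
            snprintf(out, out_len, "%s", current);
            return 0;
        }
        line = end ? end + 1 : NULL;
    }
    return -1;
}
```

Fed the captured `ibv_devinfo` output of the host above, it would be expected to yield `i40iw1`, the value used for `ms_async_rdma_device_name`.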

How to reproduce this problem on v14.2.0, as shown in the attached log rdma_iwarp.log_clean:
1. Install an X722 NIC in an Ubuntu 18.04.1 server.
2. git checkout -b v14_2_0 tags/v14.2.0
3. Apply the following patches with git am:
0001-abort-rdma-operation-when-ibv_query_device-failed.patch
0002-release-member-hold-memory-when-destructing-obj.patch
0003-check-allocated-memory-immediately-before-using-it.patch
0004-release-device_attr-space-when-destroying-Device-obj.patch
4. Apply the Ceph configuration shown above.
5. Run vstart.sh:
OSD=1 MON=1 MDS=0 RGW=0 MGR=0 bash -xev ../src/vstart.sh -n -d -X --msgr1

0001-abort-rdma-operation-when-ibv_query_device-failed.patch (1.07 KB) Changcheng Liu, 04/11/2019 09:29 AM
0001-msg-async-rdma-add-RDMA-iWARP-protocol-support.patch (29.5 KB) Changcheng Liu, 04/11/2019 09:29 AM
0002-msg-async-rdma-support-qp-that-isn-t-associated-with.patch (14.8 KB) Changcheng Liu, 04/11/2019 09:29 AM
0002-release-member-hold-memory-when-destructing-obj.patch (937 Bytes) Changcheng Liu, 04/11/2019 09:29 AM
0003-check-allocated-memory-immediately-before-using-it.patch (1.09 KB) Changcheng Liu, 04/11/2019 09:29 AM
0003-msg-async-rdma-cmake-find_package-for-librdmacm.patch (7.17 KB) Changcheng Liu, 04/11/2019 09:29 AM
0004-cmake-consolidate-WITH_-VERBS-RDMACM.patch (1.27 KB) Changcheng Liu, 04/11/2019 09:29 AM
0004-release-device_attr-space-when-destroying-Device-obj.patch (871 Bytes) Changcheng Liu, 04/11/2019 09:29 AM
rdma_iwarp.zip (166 KB) Changcheng Liu, 04/11/2019 09:29 AM

History

#1 Updated by Changcheng Liu about 2 years ago

19-04-11 16:59:20.375672 0> 2019-04-11 16:59:20.336 7fa4c1490700 -1 *** Caught signal (Aborted) **
19-04-11 16:59:20.375693 in thread 7fa4c1490700 thread_name:msgr-worker-0
19-04-11 16:59:20.375712
19-04-11 16:59:20.375731 ceph version 14.2.0-4-g3bba7080b1 (3bba7080b177e802df72b8c07ba867ddfd5ac07b) nautilus (stable)
19-04-11 16:59:20.375763 1: (()+0x2bf2ca0) [0x559b7fb6eca0]
19-04-11 16:59:20.375785 2: (()+0x12890) [0x7fa4c4b27890]
19-04-11 16:59:20.375804 3: (gsignal()+0xc7) [0x7fa4c37d9e97]
19-04-11 16:59:20.375823 4: (abort()+0x141) [0x7fa4c37db801]
19-04-11 16:59:20.375842 5: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x377) [0x559b7fbda139]
19-04-11 16:59:20.375862 6: (Device::Device(CephContext*, ibv_device*, ibv_context*)+0x528) [0x559b8019bf80]
19-04-11 16:59:20.375881 7: (DeviceList::DeviceList(CephContext*)+0x296) [0x559b801a4cb4]
19-04-11 16:59:20.375900 8: (Infiniband::init()+0x80) [0x559b801a1e32]
19-04-11 16:59:20.375920 9: (RDMAWorker::listen(entity_addr_t&, unsigned int, SocketOptions const&, ServerSocket*)+0x4b) [0x559b7fe6ccfd]
19-04-11 16:59:20.375969 10: (()+0x2ec52fa) [0x559b7fe412fa]
19-04-11 16:59:20.375998 11: (()+0x2ecfeb4) [0x559b7fe4beb4]
19-04-11 16:59:20.376017 12: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xae9) [0x559b7fe57759]
19-04-11 16:59:20.376035 13: (()+0x2ee3376) [0x559b7fe5f376]
19-04-11 16:59:20.376054 14: (()+0x2ee4849) [0x559b7fe60849]
19-04-11 16:59:20.376073 15: (std::function<void ()>::operator()() const+0x32) [0x559b7f1f709e]
19-04-11 16:59:20.376093 16: (void std::__invoke_impl<void, std::function<void ()>>(std::__invoke_other, std::function<void ()>&&)+0x20) [0x559b7fe7253d]
19-04-11 16:59:20.376112 17: (std::__invoke_result<std::function<void ()>>::type std::__invoke<std::function<void ()>>(std::function<void ()>&&)+0x26) [0x559b7fe6ff0c]
19-04-11 16:59:20.376135 18: (decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<std::function<void ()> > >::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x28) [0x559b7fe77c4c]
19-04-11 16:59:20.376155 19: (std::thread::_Invoker<std::tuple<std::function<void ()> > >::operator()()+0x1d) [0x559b7fe77bf9]
19-04-11 16:59:20.376175 20: (std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void ()> > > >::_M_run()+0x1c) [0x559b7fe77bb8]
19-04-11 16:59:20.376193 21: (()+0xbd57f) [0x7fa4c41ff57f]
19-04-11 16:59:20.376212 22: (()+0x76db) [0x7fa4c4b1c6db]
19-04-11 16:59:20.376232 23: (clone()+0x3f) [0x7fa4c38bc88f]
19-04-11 16:59:20.376251 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#2 Updated by Changcheng Liu about 2 years ago

When the problem is hit, ib_safe_file_access(filp) returns false, so ib_uverbs_write() returns -EACCES:

static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
                 size_t count, loff_t *pos)
{
    /* ... 9 lines elided, starting with: struct ib_uverbs_file *file = filp->private_data; ... */
    if (!ib_safe_file_access(filp)) {
        pr_err_once("uverbs_write: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n",
                task_tgid_vnr(current), current->comm);
        return -EACCES;    /* the branch taken when the problem is hit */
    }
    /* ... 74 lines elided, starting with: if (count < sizeof(hdr)) ... */
}

static inline bool ib_safe_file_access(struct file *filp)
{
        return filp->f_cred == current_cred() && !uaccess_kernel(); // when the problem is hit, filp->f_cred is not equal to current_cred()
}
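The check compares credential pointers, not uid/gid values: `f_cred` is captured when the uverbs device file is opened, and a credential change committed by the task afterwards installs a new `struct cred`. The userspace model below uses stand-in types (not the kernel definitions) and hypothetical `model_*` names to show why a write through a descriptor opened before such a change fails with `-EACCES` even when the ids are identical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types for illustration only; these are NOT the kernel structs. */
struct cred { unsigned int uid; };
struct file { const struct cred *f_cred; };   /* creds captured at open() */

/* Models current_cred(): the credentials the calling task holds right now. */
static const struct cred *task_cred;

/* Mirrors ib_safe_file_access(): pointer equality plus a uaccess_kernel() flag. */
static bool model_safe_file_access(const struct file *filp, bool uaccess_kernel)
{
    assert(filp != NULL);
    return filp->f_cred == task_cred && !uaccess_kernel;
}

/* Mirrors only the early-return branch of ib_uverbs_write(). */
static long model_uverbs_write(const struct file *filp)
{
    if (!model_safe_file_access(filp, false))
        return -EACCES;
    return 0;   /* the real function would go on to parse the command */
}
```

In the model, replacing `task_cred` with a different object that carries the same uid is enough to make `model_uverbs_write()` fail.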

This f_cred/current_cred mismatch was observed by tracing the live kernel with kprobe events:
# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   |||| 
           <...>-87018 [003] .... 15409.847504: rdma_verb_fs: (ib_uverbs_write+0x3c/0x3d0 [ib_uverbs]) filp_f_cred=0xffff8906bd855b00 current_cred=0xffff8906ad773500 get_fs=0xffffffffffffffff
           <...>-87018 [003] d... 15409.847510: rdma_ib_verb: (__vfs_write+0x1b/0x40 <- ib_uverbs_write) ret=0xfffffffffffffff3 t_name="msgr-worker-0" 
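In the trace, `filp_f_cred` (0xffff8906bd855b00) and `current_cred` (0xffff8906ad773500) differ, and the return probe on `ib_uverbs_write` (called from `__vfs_write`) in thread `msgr-worker-0` reports `ret=0xfffffffffffffff3`. Reinterpreting that 64-bit value as a signed integer gives -13, which is `-EACCES` on Linux, matching the `ib_safe_file_access()` branch. The helpers below (`decode_ret()`, `ret_is_errno()`) are hypothetical, for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a raw 64-bit value printed by a kprobe
 * (e.g. ret=0xfffffffffffffff3) as a signed return code.
 * memcpy avoids implementation-defined unsigned-to-signed conversion;
 * the result assumes two's complement, as on x86_64. */
static int64_t decode_ret(uint64_t raw)
{
    int64_t v;
    assert(sizeof v == sizeof raw);
    memcpy(&v, &raw, sizeof v);
    return v;
}

/* Nonzero if the raw value is the negated errno `err` (kernel convention). */
static int ret_is_errno(uint64_t raw, int err)
{
    return decode_ret(raw) == -(int64_t)err;
}
```

For example, `ret_is_errno(0xfffffffffffffff3ULL, EACCES)` is true for the value in the trace above.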

#4 Updated by Changcheng Liu almost 2 years ago

1. The merged PR https://github.com/ceph/ceph/pull/28012 fixes one case that could trigger this problem.

2. There is another case that can trigger this problem, as shown in the previous analysis. To solve it, either:
a. modify the RDMA/iWARP stack (the rdma-core library and/or the RDMA kernel code)
OR
b. rework commit https://github.com/ceph/ceph/commit/fdde016301ae329f76c621337c384ac60aa0d210
This commit includes the change below, which triggers the segmentation fault under the RDMA/iWARP protocol:

  if (!iparams.no_mon_config) {
    MonClient mc_bootstrap(g_ceph_context);
    if (mc_bootstrap.get_monmap_and_config() < 0) {
      derr << "failed to fetch mon config (--no-mon-config to skip)" << dendl;
      cct->_log->flush();
      _exit(1);
    }
  }
