Bug #36624

Updated by Kefu Chai over 5 years ago

When starting a Ceph cluster with SPDK enabled on a kernel with a 64 KB page size, the following assert was observed in bluestore/NVMEDevice.cc: 

 <pre> 
 Starting SPDK v18.04.1 / DPDK 18.05.0 initialization... 
 [ DPDK EAL parameters: nvme-device-manager -c 0x1 -m 2048 --file-prefix=spdk_pid20837 ] 
 EAL: Detected 46 lcore(s) 
 EAL: Detected 1 NUMA nodes 
 EAL: Multi-process socket /var/run/dpdk/spdk_pid20837/mp_socket 
 EAL: Probing VFIO support... 
 EAL: VFIO support initialized 
 Unable to unlink shared memory file: /var/run/.spdk_pid20837_hugepage_info. Error code: 2 
 EAL: PCI device 0000:01:00.0 on NUMA socket 0 
 EAL:     probe driver: 8086:953 spdk_nvme 
 EAL:     using IOMMU type 1 (Type 1) 
 /home/ubuntu/ceph/src/os/bluestore/NVMEDevice.cc: In function 'virtual int NVMEDevice::write(uint64_t, ceph::bufferlist&, bool)' thread ffff81c7adf0 time 2018-10-30 09:22:26.229229 
 /home/ubuntu/ceph/src/os/bluestore/NVMEDevice.cc: 844: FAILED ceph_assert(off % block_size == 0) 
  ceph version 14.0.0-4420-g98fc7ebc99 (98fc7ebc99a3639240eef4f745c9bd633446d2b3) nautilus (dev) 
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0xaaaab5c4a8ac] 
  2: (()+0x2a0aab0) [0xaaaab5c4aab0] 
  3: (NVMEDevice::write(unsigned long, ceph::buffer::list&, bool)+0x1b8) [0xaaaab5bbc6b4] 
  4: (BlueFS::_write_super()+0x39c) [0xaaaab5b61768] 
  5: (BlueFS::mkfs(uuid_d)+0x590) [0xaaaab5b5fcd8] 
  6: (BlueStore::_open_db(bool, bool)+0x1c24) [0xaaaab59db724] 
  7: (BlueStore::mkfs()+0x116c) [0xaaaab59e2c2c] 
  8: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0x94) [0xaaaab520d514] 
  9: (main()+0x1650) [0xaaaab51dad34] 
  10: (__libc_start_main()+0xe0) [0xffff812f06e0] 
  11: (()+0x1f984e4) [0xaaaab51d84e4] 
 *** Caught signal (Aborted) ** 
  in thread ffff81c7adf0 thread_name:ceph-osd 
 2018-10-30 09:22:26.242 ffff81c7adf0 -1 /home/ubuntu/ceph/src/os/bluestore/NVMEDevice.cc: In function 'virtual int NVMEDevice::write(uint64_t, ceph::bufferlist&, bool)' thread ffff81c7adf0 time 2018-10-30 09:22:26.229229 
 /home/ubuntu/ceph/src/os/bluestore/NVMEDevice.cc: 844: FAILED ceph_assert(off % block_size == 0) 

  ceph version 14.0.0-4420-g98fc7ebc99 (98fc7ebc99a3639240eef4f745c9bd633446d2b3) nautilus (dev) 
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0xaaaab5c4a8ac] 
  2: (()+0x2a0aab0) [0xaaaab5c4aab0] 
  3: (NVMEDevice::write(unsigned long, ceph::buffer::list&, bool)+0x1b8) [0xaaaab5bbc6b4] 
  4: (BlueFS::_write_super()+0x39c) [0xaaaab5b61768] 
  5: (BlueFS::mkfs(uuid_d)+0x590) [0xaaaab5b5fcd8] 
  6: (BlueStore::_open_db(bool, bool)+0x1c24) [0xaaaab59db724] 
  7: (BlueStore::mkfs()+0x116c) [0xaaaab59e2c2c] 
  8: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0x94) [0xaaaab520d514] 
  9: (main()+0x1650) [0xaaaab51dad34] 
  10: (__libc_start_main()+0xe0) [0xffff812f06e0] 
  11: (()+0x1f984e4) [0xaaaab51d84e4] 

  ceph version 14.0.0-4420-g98fc7ebc99 (98fc7ebc99a3639240eef4f745c9bd633446d2b3) nautilus (dev) 
  1: (()+0x2990938) [0xaaaab5bd0938] 
  2: (__kernel_rt_sigreturn()+0) [0xffff81df066c] 
  3: (raise()+0xb0) [0xffff813024d8] 
 2018-10-30 09:22:26.242 ffff81c7adf0 -1 *** Caught signal (Aborted) ** 
  in thread ffff81c7adf0 thread_name:ceph-osd 

  ceph version 14.0.0-4420-g98fc7ebc99 (98fc7ebc99a3639240eef4f745c9bd633446d2b3) nautilus (dev) 
  1: (()+0x2990938) [0xaaaab5bd0938] 
  2: (__kernel_rt_sigreturn()+0) [0xffff81df066c] 
  3: (raise()+0xb0) [0xffff813024d8] 
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 
 </pre> 

 The same assert was observed after switching to the latest master. 
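
 For readers unfamiliar with the failed condition, here is a minimal sketch of what `ceph_assert(off % block_size == 0)` checks: the write offset must be a whole multiple of the device block size. The helper name `is_block_aligned` and the values used (a 4096-byte offset, 4 KB and 64 KB block sizes) are hypothetical and chosen only for illustration. The log above does not show the actual `off` or `block_size` at the time of the crash, so this is not a statement of the root cause.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the asserted condition: an offset passes
// only if it is a whole multiple of the block size.
constexpr bool is_block_aligned(std::uint64_t off, std::uint64_t block_size) {
  return block_size != 0 && off % block_size == 0;
}

// Illustrative values only (assumptions, not taken from the log):
// a 4096-byte offset is aligned to a 4 KB block size ...
static_assert(is_block_aligned(4096, 4096), "4096 is a multiple of 4096");
// ... but not to a 64 KB block size, so the same check would fail.
static_assert(!is_block_aligned(4096, 64 * 1024), "4096 is not a multiple of 65536");
// An offset of zero is aligned to any non-zero block size.
static_assert(is_block_aligned(0, 64 * 1024), "0 is a multiple of any block size");
```

 If the block size seen by `NVMEDevice::write` did scale with the kernel page size, any caller that uses fixed 4 KB offsets would trip the check in this way. Confirming that would require printing `off` and `block_size` at the assert site.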