Bug #36624
Updated by Kefu Chai over 5 years ago
When starting a Ceph cluster with SPDK enabled on a kernel with a 64KB page size, the following assert was observed in bluestore/NVMEDevice.cc:

<pre>
Starting SPDK v18.04.1 / DPDK 18.05.0 initialization...
[ DPDK EAL parameters: nvme-device-manager -c 0x1 -m 2048 --file-prefix=spdk_pid20837 ]
EAL: Detected 46 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/spdk_pid20837/mp_socket
EAL: Probing VFIO support...
EAL: VFIO support initialized
Unable to unlink shared memory file: /var/run/.spdk_pid20837_hugepage_info. Error code: 2
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
EAL: using IOMMU type 1 (Type 1)
/home/ubuntu/ceph/src/os/bluestore/NVMEDevice.cc: In function 'virtual int NVMEDevice::write(uint64_t, ceph::bufferlist&, bool)' thread ffff81c7adf0 time 2018-10-20 09:22:26.229229
/home/ubuntu/ceph/src/os/bluestore/NVMEDevice.cc: 844: FAILED ceph_assert(off % block_size == 0)
ceph version 14.0.0-4420-g98fc7ebc99 (98fc7ebc99a3639240eef4f745c9bd633446d2b3) nautilus (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0xaaaab5c4a8ac]
2: (()+0x2a0aab0) [0xaaaab5c4aab0]
3: (NVMEDevice::write(unsigned long, ceph::buffer::list&, bool)+0x1b8) [0xaaaab5bbc6b4]
4: (BlueFS::_write_super()+0x39c) [0xaaaab5b61768]
5: (BlueFS::mkfs(uuid_d)+0x590) [0xaaaab5b5fcd8]
6: (BlueStore::_open_db(bool, bool)+0x1c24) [0xaaaab59db724]
7: (BlueStore::mkfs()+0x116c) [0xaaaab59e2c2c]
8: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0x94) [0xaaaab520d514]
9: (main()+0x1650) [0xaaaab51dad34]
10: (__libc_start_main()+0xe0) [0xffff812f06e0]
11: (()+0x1f984e4) [0xaaaab51d84e4]
*** Caught signal (Aborted) **
in thread ffff81c7adf0 thread_name:ceph-osd
2018-10-30 09:22:26.242 ffff81c7adf0 -1 /home/ubuntu/ceph/src/os/bluestore/NVMEDevice.cc: In function 'virtual int NVMEDevice::write(uint64_t, ceph::bufferlist&, bool)' thread ffff81c7adf0 time 2018-10-30 09:22:26.229229
/home/ubuntu/ceph/src/os/bluestore/NVMEDevice.cc: 844: FAILED ceph_assert(off % block_size == 0)
ceph version 14.0.0-4420-g98fc7ebc99 (98fc7ebc99a3639240eef4f745c9bd633446d2b3) nautilus (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0xaaaab5c4a8ac]
2: (()+0x2a0aab0) [0xaaaab5c4aab0]
3: (NVMEDevice::write(unsigned long, ceph::buffer::list&, bool)+0x1b8) [0xaaaab5bbc6b4]
4: (BlueFS::_write_super()+0x39c) [0xaaaab5b61768]
5: (BlueFS::mkfs(uuid_d)+0x590) [0xaaaab5b5fcd8]
6: (BlueStore::_open_db(bool, bool)+0x1c24) [0xaaaab59db724]
7: (BlueStore::mkfs()+0x116c) [0xaaaab59e2c2c]
8: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0x94) [0xaaaab520d514]
9: (main()+0x1650) [0xaaaab51dad34]
10: (__libc_start_main()+0xe0) [0xffff812f06e0]
11: (()+0x1f984e4) [0xaaaab51d84e4]
ceph version 14.0.0-4420-g98fc7ebc99 (98fc7ebc99a3639240eef4f745c9bd633446d2b3) nautilus (dev)
1: (()+0x2990938) [0xaaaab5bd0938]
2: (__kernel_rt_sigreturn()+0) [0xffff81df066c]
3: (raise()+0xb0) [0xffff813024d8]
2018-10-30 09:22:26.242 ffff81c7adf0 -1 *** Caught signal (Aborted) **
in thread ffff81c7adf0 thread_name:ceph-osd
ceph version 14.0.0-4420-g98fc7ebc99 (98fc7ebc99a3639240eef4f745c9bd633446d2b3) nautilus (dev)
1: (()+0x2990938) [0xaaaab5bd0938]
2: (__kernel_rt_sigreturn()+0) [0xffff81df066c]
3: (raise()+0xb0) [0xffff813024d8]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
</pre>

The same assert was observed after switching to master/LATEST.
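
The failing condition in the trace is `ceph_assert(off % block_size == 0)`, reached from `BlueFS::_write_super()` via `NVMEDevice::write()`. For readers unfamiliar with this kind of check, here is a minimal standalone sketch of the same alignment test. It is not Ceph code: the helper names `is_block_aligned` and `align_down` are hypothetical, and the log does not show the actual `off` or `block_size` values the OSD used.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Ceph): true when `off` is a whole
// multiple of `block_size`, i.e. the condition the assert expects.
constexpr bool is_block_aligned(std::uint64_t off, std::uint64_t block_size) {
  return block_size != 0 && off % block_size == 0;
}

// Hypothetical helper: round `off` down to the previous multiple of
// `block_size`. Assumes block_size is non-zero.
constexpr std::uint64_t align_down(std::uint64_t off, std::uint64_t block_size) {
  return off - (off % block_size);
}
```

With illustrative values, an offset of 4096 passes the check against a 4096-byte block size but fails it against 65536. Whether a mismatch of that kind is what happens on the 64KB page size system is not established by the log above.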