Bug #36624

Updated by Kefu Chai over 1 year ago

When starting a Ceph cluster with SPDK enabled on a kernel with a 64 KB page size, an assert fires in bluestore/NVMEDevice.cc, as shown below:

<pre>
Starting SPDK v18.04.1 / DPDK 18.05.0 initialization...
[ DPDK EAL parameters: nvme-device-manager -c 0x1 -m 2048 --file-prefix=spdk_pid20837 ]
EAL: Detected 46 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/spdk_pid20837/mp_socket
EAL: Probing VFIO support...
EAL: VFIO support initialized
Unable to unlink shared memory file: /var/run/.spdk_pid20837_hugepage_info. Error code: 2
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
EAL: using IOMMU type 1 (Type 1)
/home/ubuntu/ceph/src/os/bluestore/NVMEDevice.cc: In function 'virtual int NVMEDevice::write(uint64_t, ceph::bufferlist&, bool)' thread ffff81c7adf0 time 2018-10-20 09:22:26.229229
/home/ubuntu/ceph/src/os/bluestore/NVMEDevice.cc: 844: FAILED ceph_assert(off % block_size == 0)
ceph version 14.0.0-4420-g98fc7ebc99 (98fc7ebc99a3639240eef4f745c9bd633446d2b3) nautilus (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0xaaaab5c4a8ac]
2: (()+0x2a0aab0) [0xaaaab5c4aab0]
3: (NVMEDevice::write(unsigned long, ceph::buffer::list&, bool)+0x1b8) [0xaaaab5bbc6b4]
4: (BlueFS::_write_super()+0x39c) [0xaaaab5b61768]
5: (BlueFS::mkfs(uuid_d)+0x590) [0xaaaab5b5fcd8]
6: (BlueStore::_open_db(bool, bool)+0x1c24) [0xaaaab59db724]
7: (BlueStore::mkfs()+0x116c) [0xaaaab59e2c2c]
8: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0x94) [0xaaaab520d514]
9: (main()+0x1650) [0xaaaab51dad34]
10: (__libc_start_main()+0xe0) [0xffff812f06e0]
11: (()+0x1f984e4) [0xaaaab51d84e4]
*** Caught signal (Aborted) **
in thread ffff81c7adf0 thread_name:ceph-osd
2018-10-30 09:22:26.242 ffff81c7adf0 -1 /home/ubuntu/ceph/src/os/bluestore/NVMEDevice.cc: In function 'virtual int NVMEDevice::write(uint64_t, ceph::bufferlist&, bool)' thread ffff81c7adf0 time 2018-10-30 09:22:26.229229
/home/ubuntu/ceph/src/os/bluestore/NVMEDevice.cc: 844: FAILED ceph_assert(off % block_size == 0)

ceph version 14.0.0-4420-g98fc7ebc99 (98fc7ebc99a3639240eef4f745c9bd633446d2b3) nautilus (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0xaaaab5c4a8ac]
2: (()+0x2a0aab0) [0xaaaab5c4aab0]
3: (NVMEDevice::write(unsigned long, ceph::buffer::list&, bool)+0x1b8) [0xaaaab5bbc6b4]
4: (BlueFS::_write_super()+0x39c) [0xaaaab5b61768]
5: (BlueFS::mkfs(uuid_d)+0x590) [0xaaaab5b5fcd8]
6: (BlueStore::_open_db(bool, bool)+0x1c24) [0xaaaab59db724]
7: (BlueStore::mkfs()+0x116c) [0xaaaab59e2c2c]
8: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0x94) [0xaaaab520d514]
9: (main()+0x1650) [0xaaaab51dad34]
10: (__libc_start_main()+0xe0) [0xffff812f06e0]
11: (()+0x1f984e4) [0xaaaab51d84e4]

ceph version 14.0.0-4420-g98fc7ebc99 (98fc7ebc99a3639240eef4f745c9bd633446d2b3) nautilus (dev)
1: (()+0x2990938) [0xaaaab5bd0938]
2: (__kernel_rt_sigreturn()+0) [0xffff81df066c]
3: (raise()+0xb0) [0xffff813024d8]
2018-10-30 09:22:26.242 ffff81c7adf0 -1 *** Caught signal (Aborted) **
in thread ffff81c7adf0 thread_name:ceph-osd

ceph version 14.0.0-4420-g98fc7ebc99 (98fc7ebc99a3639240eef4f745c9bd633446d2b3) nautilus (dev)
1: (()+0x2990938) [0xaaaab5bd0938]
2: (__kernel_rt_sigreturn()+0) [0xffff81df066c]
3: (raise()+0xb0) [0xffff813024d8]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
</pre>


The same assert is still observed after switching to master/LATEST.
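
For readers unfamiliar with the failing check, here is a minimal sketch of the condition behind `ceph_assert(off % block_size == 0)`. The helper names and the numeric values (a 4096-byte offset, 4 KB and 64 KB block sizes) are assumptions chosen for illustration only; the log above does not show the actual `off` or `block_size` at the time of the failure, and this is not the code from NVMEDevice.cc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when `off` is a multiple of `block_size`.
// This is the same condition that the failing assert tests.
constexpr bool is_block_aligned(std::uint64_t off, std::uint64_t block_size) {
  return block_size != 0 && off % block_size == 0;
}

// Hypothetical runtime guard in the style of the assert in the trace.
inline void check_write_offset(std::uint64_t off, std::uint64_t block_size) {
  assert(is_block_aligned(off, block_size));
}

// Illustrative values only (assumed, not taken from the report):
// a 4096-byte offset is aligned to a 4 KB block but not to a 64 KB block.
static_assert(is_block_aligned(4096, 4096), "4096 is a multiple of 4096");
static_assert(!is_block_aligned(4096, 65536), "4096 is not a multiple of 65536");
```

If the device block size ends up derived from the 64 KB page size while a caller still writes at a smaller offset, a check of this form would fail in the way the trace shows. That is one possible explanation, not a confirmed root cause.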
