Bug #43183

Segmentation fault in tcmalloc when creating an OSD

Added by chunsong feng 3 months ago. Updated 23 days ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy osd create node2 --data /dev/nvme1n1p12
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] bluestore : None
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0xffffaac9f518>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] fs_type : xfs
[ceph_deploy.cli][INFO ] block_wal : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] journal : None
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] host : node2
[ceph_deploy.cli][INFO ] filestore : None
[ceph_deploy.cli][INFO ] func : <function osd at 0xffffaad0ab90>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] zap_disk : False
[ceph_deploy.cli][INFO ] data : /dev/nvme1n1p12
[ceph_deploy.cli][INFO ] block_db : None
[ceph_deploy.cli][INFO ] dmcrypt : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] debug : False
[ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/nvme1n1p12
[node2][DEBUG ] connected to host: node2
[node2][DEBUG ] detect platform information from remote host
[node2][DEBUG ] detect machine type
[node2][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 18.04 bionic
[ceph_deploy.osd][DEBUG ] Deploying osd to node2
[node2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[node2][DEBUG ] find the location of an executable
[node2][INFO ] Running command: /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/nvme1n1p12
[node2][WARNIN] Running command: /usr/bin/ceph-authtool --gen-print-key
[node2][WARNIN] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new e94bd56b-4916-462a-9b20-d4eda296fc94
[node2][WARNIN] Running command: /sbin/vgcreate -s 1G --force --yes ceph-1a948ad1-12f6-44a8-af82-98bd0be0a1e4 /dev/nvme1n1p12
[node2][WARNIN] stdout: Volume group "ceph-1a948ad1-12f6-44a8-af82-98bd0be0a1e4" successfully created
[node2][WARNIN] Running command: /sbin/lvcreate --yes -l 100%FREE -n osd-block-e94bd56b-4916-462a-9b20-d4eda296fc94 ceph-1a948ad1-12f6-44a8-af82-98bd0be0a1e4
[node2][WARNIN] stdout: Logical volume "osd-block-e94bd56b-4916-462a-9b20-d4eda296fc94" created.
[node2][WARNIN] Running command: /usr/bin/ceph-authtool --gen-print-key
[node2][WARNIN] Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-45
[node2][WARNIN] Running command: /bin/chown -h ceph:ceph /dev/ceph-1a948ad1-12f6-44a8-af82-98bd0be0a1e4/osd-block-e94bd56b-4916-462a-9b20-d4eda296fc94
[node2][WARNIN] Running command: /bin/chown -R ceph:ceph /dev/dm-24
[node2][WARNIN] Running command: /bin/ln -s /dev/ceph-1a948ad1-12f6-44a8-af82-98bd0be0a1e4/osd-block-e94bd56b-4916-462a-9b20-d4eda296fc94 /var/lib/ceph/osd/ceph-45/block
[node2][WARNIN] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-45/activate.monmap
[node2][WARNIN] stderr: 2019-12-07T16:11:02.468+0800 ffff91cb01f0 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
[node2][WARNIN] 2019-12-07T16:11:02.468+0800 ffff91cb01f0 -1 AuthRegistry(0xffff8c057918) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
[node2][WARNIN] stderr: got monmap epoch 1
[node2][WARNIN] Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-45/keyring --create-keyring --name osd.45 --add-key AQCSXutdJJFOEBAAMCig4GzlU2RV0jod1Q3dag==
[node2][WARNIN] stdout: creating /var/lib/ceph/osd/ceph-45/keyring
[node2][WARNIN] stdout: added entity osd.45 auth(key=AQCSXutdJJFOEBAAMCig4GzlU2RV0jod1Q3dag==)
[node2][WARNIN] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-45/keyring
[node2][WARNIN] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-45/
[node2][WARNIN] Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 45 --monmap /var/lib/ceph/osd/ceph-45/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-45/ --osd-uuid e94bd56b-4916-462a-9b20-d4eda296fc94 --setuser ceph --setgroup ceph
[node2][WARNIN] stdout: EAL: Probing VFIO support...
[node2][WARNIN] stdout: EAL: VFIO support initialized
[node2][WARNIN] stdout: EAL: PCI device 0000:7d:07.6 on NUMA socket 0
[node2][WARNIN] stdout: EAL: probe driver: 19e5:a22e net_hns3_vf
[node2][WARNIN] stdout: EAL: using IOMMU type 8 (No-IOMMU)
[node2][WARNIN] stderr: EAL: Detected 96 lcore(s)
[node2][WARNIN] stderr: EAL: Detected 4 NUMA nodes
[node2][WARNIN] stderr: EAL: No available hugepages reported in hugepages-32768kB
[node2][WARNIN] stderr: EAL: No available hugepages reported in hugepages-64kB
[node2][WARNIN] stderr: EAL: No available hugepages reported in hugepages-1048576kB
[node2][WARNIN] stderr: 2019-12-07T16:11:03.592+0800 ffffa37d4010 -1 dpdkstack ~DPDKStack destructing DPDKStack...
[node2][WARNIN] stderr: 2019-12-07T16:11:03.600+0800 ffffa37d4010 -1 bluestore(/var/lib/ceph/osd/ceph-45/) _read_fsid unparsable uuid
[node2][WARNIN] stderr: *** Caught signal (Segmentation fault) **
[node2][WARNIN] stderr: in thread ffffa37d4010 thread_name:ceph-osd
[node2][WARNIN] stderr: ceph version 15.0.0-8068-gef182dfd94 (ef182dfd9447058905d8d0b0833d037165fc19ff) octopus (dev)
[node2][WARNIN] stderr: 1: (__kernel_rt_sigreturn()+0) [0xffffa422b5b8]
[node2][WARNIN] stderr: 2: (tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)+0x58) [0xffffa4005950]
[node2][WARNIN] stderr: 3: (tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)+0x24) [0xffffa4005c54]
[node2][WARNIN] stderr: 4: (tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)+0x90) [0xffffa4005d20]
[node2][WARNIN] stderr: 5: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)+0x80) [0xffffa4008c10]
[node2][WARNIN] stderr: 6: (posix_memalign()+0x7c) [0xffffa40197d4]
[node2][WARNIN] stderr: 7: (ceph::buffer::v14_2_0::create_aligned_in_mempool(unsigned int, unsigned int, int)+0xe8) [0xaaaad0b97e98]
[node2][WARNIN] stderr: 8: (ceph::buffer::v14_2_0::create_aligned(unsigned int, unsigned int)+0x2c) [0xaaaad0b980c4]
[node2][WARNIN] stderr: 9: (ceph::buffer::v14_2_0::list::rebuild_aligned_size_and_memory(unsigned int, unsigned int, unsigned int)+0x1c0) [0xaaaad0b98da0]
[node2][WARNIN] stderr: 10: (KernelDevice::write(unsigned long, ceph::buffer::v14_2_0::list&, bool, int)+0x438) [0xaaaad09ba3c0]
[node2][WARNIN] stderr: 11: (BlueFS::_write_super(int)+0xe4) [0xaaaad09624c4]
[node2][WARNIN] stderr: 12: (BlueFS::mkfs(uuid_d, bluefs_layout_t const&)+0x8a0) [0xaaaad0980248]
[node2][WARNIN] stderr: 13: (BlueStore::_open_bluefs(bool)+0x274) [0xaaaad089bdcc]
[node2][WARNIN] stderr: 14: (BlueStore::_open_db(bool, bool, bool)+0x364) [0xaaaad089c214]
[node2][WARNIN] stderr: 15: (BlueStore::mkfs()+0xc88) [0xaaaad08d0700]
[node2][WARNIN] stderr: 16: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0x130) [0xaaaad039d740]
[node2][WARNIN] stderr: 17: (main()+0x1370) [0xaaaad035c9a0]
[node2][WARNIN] stderr: 18: (__libc_start_main()+0xe0) [0xffffa38b06e0]
[node2][WARNIN] stderr: 19: (()+0xa23988) [0xaaaad0374988]
[node2][WARNIN] stderr: 2019-12-07T16:11:03.612+0800 ffffa37d4010 -1 *** Caught signal (Segmentation fault) **
[node2][WARNIN] stderr: in thread ffffa37d4010 thread_name:ceph-osd
[node2][WARNIN] stderr: ceph version 15.0.0-8068-gef182dfd94 (ef182dfd9447058905d8d0b0833d037165fc19ff) octopus (dev)
[node2][WARNIN] stderr: 1: (__kernel_rt_sigreturn()+0) [0xffffa422b5b8]
[node2][WARNIN] stderr: 2: (tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)+0x58) [0xffffa4005950]
[node2][WARNIN] stderr: 3: (tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)+0x24) [0xffffa4005c54]
[node2][WARNIN] stderr: 4: (tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)+0x90) [0xffffa4005d20]
[node2][WARNIN] stderr: 5: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)+0x80) [0xffffa4008c10]
[node2][WARNIN] stderr: 6: (posix_memalign()+0x7c) [0xffffa40197d4]
[node2][WARNIN] stderr: 7: (ceph::buffer::v14_2_0::create_aligned_in_mempool(unsigned int, unsigned int, int)+0xe8) [0xaaaad0b97e98]
[node2][WARNIN] stderr: 8: (ceph::buffer::v14_2_0::create_aligned(unsigned int, unsigned int)+0x2c) [0xaaaad0b980c4]
[node2][WARNIN] stderr: 9: (ceph::buffer::v14_2_0::list::rebuild_aligned_size_and_memory(unsigned int, unsigned int, unsigned int)+0x1c0) [0xaaaad0b98da0]
[node2][WARNIN] stderr: 10: (KernelDevice::write(unsigned long, ceph::buffer::v14_2_0::list&, bool, int)+0x438) [0xaaaad09ba3c0]
[node2][WARNIN] stderr: 11: (BlueFS::_write_super(int)+0xe4) [0xaaaad09624c4]
[node2][WARNIN] stderr: 12: (BlueFS::mkfs(uuid_d, bluefs_layout_t const&)+0x8a0) [0xaaaad0980248]
[node2][WARNIN] stderr: 13: (BlueStore::_open_bluefs(bool)+0x274) [0xaaaad089bdcc]
[node2][WARNIN] stderr: 14: (BlueStore::_open_db(bool, bool, bool)+0x364) [0xaaaad089c214]
[node2][WARNIN] stderr: 15: (BlueStore::mkfs()+0xc88) [0xaaaad08d0700]
[node2][WARNIN] stderr: 16: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0x130) [0xaaaad039d740]
[node2][WARNIN] stderr: 17: (main()+0x1370) [0xaaaad035c9a0]
[node2][WARNIN] stderr: 18: (__libc_start_main()+0xe0) [0xffffa38b06e0]
[node2][WARNIN] stderr: 19: (()+0xa23988) [0xaaaad0374988]
[node2][WARNIN] stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
[node2][WARNIN] stderr: -33> 2019-12-07T16:11:03.592+0800 ffffa37d4010 -1 dpdkstack ~DPDKStack destructing DPDKStack...
[node2][WARNIN] stderr: -18> 2019-12-07T16:11:03.600+0800 ffffa37d4010 -1 bluestore(/var/lib/ceph/osd/ceph-45/) _read_fsid unparsable uuid
[node2][WARNIN] stderr: 0> 2019-12-07T16:11:03.612+0800 ffffa37d4010 -1 *** Caught signal (Segmentation fault) **
[node2][WARNIN] stderr: in thread ffffa37d4010 thread_name:ceph-osd
[node2][WARNIN] stderr: ceph version 15.0.0-8068-gef182dfd94 (ef182dfd9447058905d8d0b0833d037165fc19ff) octopus (dev)
[node2][WARNIN] stderr: 1: (__kernel_rt_sigreturn()+0) [0xffffa422b5b8]
[node2][WARNIN] stderr: 2: (tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)+0x58) [0xffffa4005950]
[node2][WARNIN] stderr: 3: (tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)+0x24) [0xffffa4005c54]
[node2][WARNIN] stderr: 4: (tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)+0x90) [0xffffa4005d20]
[node2][WARNIN] stderr: 5: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)+0x80) [0xffffa4008c10]
[node2][WARNIN] stderr: 6: (posix_memalign()+0x7c) [0xffffa40197d4]
[node2][WARNIN] stderr: 7: (ceph::buffer::v14_2_0::create_aligned_in_mempool(unsigned int, unsigned int, int)+0xe8) [0xaaaad0b97e98]
[node2][WARNIN] stderr: 8: (ceph::buffer::v14_2_0::create_aligned(unsigned int, unsigned int)+0x2c) [0xaaaad0b980c4]
[node2][WARNIN] stderr: 9: (ceph::buffer::v14_2_0::list::rebuild_aligned_size_and_memory(unsigned int, unsigned int, unsigned int)+0x1c0) [0xaaaad0b98da0]
[node2][WARNIN] stderr: 10: (KernelDevice::write(unsigned long, ceph::buffer::v14_2_0::list&, bool, int)+0x438) [0xaaaad09ba3c0]
[node2][WARNIN] stderr: 11: (BlueFS::_write_super(int)+0xe4) [0xaaaad09624c4]
[node2][WARNIN] stderr: 12: (BlueFS::mkfs(uuid_d, bluefs_layout_t const&)+0x8a0) [0xaaaad0980248]
[node2][WARNIN] stderr: 13: (BlueStore::_open_bluefs(bool)+0x274) [0xaaaad089bdcc]
[node2][WARNIN] stderr: 14: (BlueStore::_open_db(bool, bool, bool)+0x364) [0xaaaad089c214]
[node2][WARNIN] stderr: 15: (BlueStore::mkfs()+0xc88) [0xaaaad08d0700]
[node2][WARNIN] stderr: 16: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0x130) [0xaaaad039d740]
[node2][WARNIN] stderr: 17: (main()+0x1370) [0xaaaad035c9a0]
[node2][WARNIN] stderr: 18: (__libc_start_main()+0xe0) [0xffffa38b06e0]
[node2][WARNIN] stderr: 19: (()+0xa23988) [0xaaaad0374988]
[node2][WARNIN] stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
[node2][WARNIN] stderr: -33> 2019-12-07T16:11:03.592+0800 ffffa37d4010 -1 dpdkstack ~DPDKStack destructing DPDKStack...
[node2][WARNIN] stderr: -18> 2019-12-07T16:11:03.600+0800 ffffa37d4010 -1 bluestore(/var/lib/ceph/osd/ceph-45/) _read_fsid unparsable uuid
[node2][WARNIN] stderr: 0> 2019-12-07T16:11:03.612+0800 ffffa37d4010 -1 *** Caught signal (Segmentation fault) **
[node2][WARNIN] stderr: in thread ffffa37d4010 thread_name:ceph-osd
[node2][WARNIN] stderr: ceph version 15.0.0-8068-gef182dfd94 (ef182dfd9447058905d8d0b0833d037165fc19ff) octopus (dev)
[node2][WARNIN] stderr: 1: (__kernel_rt_sigreturn()+0) [0xffffa422b5b8]
[node2][WARNIN] stderr: 2: (tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)+0x58) [0xffffa4005950]
[node2][WARNIN] stderr: 3: (tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)+0x24) [0xffffa4005c54]
[node2][WARNIN] stderr: 4: (tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)+0x90) [0xffffa4005d20]
[node2][WARNIN] stderr: 5: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)+0x80) [0xffffa4008c10]
[node2][WARNIN] stderr: 6: (posix_memalign()+0x7c) [0xffffa40197d4]
[node2][WARNIN] stderr: 7: (ceph::buffer::v14_2_0::create_aligned_in_mempool(unsigned int, unsigned int, int)+0xe8) [0xaaaad0b97e98]
[node2][WARNIN] stderr: 8: (ceph::buffer::v14_2_0::create_aligned(unsigned int, unsigned int)+0x2c) [0xaaaad0b980c4]
[node2][WARNIN] stderr: 9: (ceph::buffer::v14_2_0::list::rebuild_aligned_size_and_memory(unsigned int, unsigned int, unsigned int)+0x1c0) [0xaaaad0b98da0]
[node2][WARNIN] stderr: 10: (KernelDevice::write(unsigned long, ceph::buffer::v14_2_0::list&, bool, int)+0x438) [0xaaaad09ba3c0]
[node2][WARNIN] stderr: 11: (BlueFS::_write_super(int)+0xe4) [0xaaaad09624c4]
[node2][WARNIN] stderr: 12: (BlueFS::mkfs(uuid_d, bluefs_layout_t const&)+0x8a0) [0xaaaad0980248]
[node2][WARNIN] stderr: 13: (BlueStore::_open_bluefs(bool)+0x274) [0xaaaad089bdcc]
[node2][WARNIN] stderr: 14: (BlueStore::_open_db(bool, bool, bool)+0x364) [0xaaaad089c214]
[node2][WARNIN] stderr: 15: (BlueStore::mkfs()+0xc88) [0xaaaad08d0700]
[node2][WARNIN] stderr: 16: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0x130) [0xaaaad039d740]
[node2][WARNIN] stderr: 17: (main()+0x1370) [0xaaaad035c9a0]
[node2][WARNIN] stderr: 18: (__libc_start_main()+0xe0) [0xffffa38b06e0]
[node2][WARNIN] stderr: 19: (()+0xa23988) [0xaaaad0374988]
[node2][WARNIN] stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
[node2][WARNIN] --> Was unable to complete a new OSD, will rollback changes
[node2][WARNIN] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.45 --yes-i-really-mean-it
[node2][WARNIN] stderr: 2019-12-07T16:11:04.124+0800 ffffa9cb71f0 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
[node2][WARNIN] 2019-12-07T16:11:04.124+0800 ffffa9cb71f0 -1 AuthRegistry(0xffffa4057918) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
[node2][WARNIN] stderr: purged osd.45
[node2][WARNIN] --> RuntimeError: Command failed with exit code 250: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 45 --monmap /var/lib/ceph/osd/ceph-45/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-45/ --osd-uuid e94bd56b-4916-462a-9b20-d4eda296fc94 --setuser ceph --setgroup ceph
[node2][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/nvme1n1p12
[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
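
The same backtrace is printed several times in the log above. A minimal Python sketch for collecting each distinct frame once, assuming every frame sits on a single line of the form `stderr: N: (symbol+offset) [address]`. The names `FRAME_RE` and `unique_backtrace` are hypothetical helpers and not part of any Ceph tool:

```python
import re

# Matches frame lines such as
#   "[node2][WARNIN] stderr: 6: (posix_memalign()+0x7c) [0xffffa40197d4]"
# Groups: frame index, symbol with offset, return address.
FRAME_RE = re.compile(r"stderr:\s+(\d+):\s+\((.*)\)\s+\[(0x[0-9a-f]+)\]\s*$")


def unique_backtrace(lines):
    """Return (index, symbol, address) tuples in first-seen order, skipping repeats."""
    frames = []
    seen = set()
    for line in lines:
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a backtrace frame (timestamps, EAL output, etc.)
        frame = (int(match.group(1)), match.group(2), match.group(3))
        if frame in seen:
            continue  # same frame already recorded from an earlier copy
        seen.add(frame)
        frames.append(frame)
    return frames
```

Feeding the saved log to `unique_backtrace(open(path).read().splitlines())` would yield one copy of the frames, which is easier to compare against other crash reports.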

tcmalloc fail.txt (22.2 KB) chunsong feng, 12/07/2019 08:18 AM

History

#1 Updated by Greg Farnum 2 months ago

  • Project changed from RADOS to bluestore
  • Category deleted (Tests)

#2 Updated by Igor Fedotov 2 months ago

It looks like SPDK/DPDK mode is enabled. I suggest turning it off as a workaround.
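
The report does not show how DPDK was enabled. A rough Python sketch for spotting it in a ceph.conf, assuming it was selected through the messenger type options (`ms_type`, `ms_cluster_type`, `ms_public_type`) with a value such as `async+dpdk`. The function name `dpdk_settings` is hypothetical:

```python
import configparser

# Assumption: these messenger-type options are where the DPDK stack gets selected.
MESSENGER_KEYS = ("ms_type", "ms_cluster_type", "ms_public_type")


def dpdk_settings(conf_text):
    """Return (section, option, value) for each messenger option whose value mentions dpdk."""
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, allow_no_value=True
    )
    parser.read_string(conf_text)
    hits = []
    for section in parser.sections():
        for key, value in parser.items(section):
            # ceph.conf accepts "ms type" and "ms_type" for the same option
            option = key.strip().replace(" ", "_").replace("-", "_")
            if option in MESSENGER_KEYS and "dpdk" in (value or "").lower():
                hits.append((section, option, value))
    return hits
```

If this returns any entries, switching those options back to `async+posix` (the usual default, stated here as an assumption about this cluster) would be one way to apply the workaround.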

#3 Updated by chunsong feng 2 months ago

Yes, DPDK is enabled. The error occurs about 10% of the time, and a retry succeeds.
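
Since a retry reportedly succeeds, a small wrapper can paper over the failure until the root cause is found. This is only a sketch: `run_with_retry` and `chance_all_fail` are hypothetical names, and the probability estimate assumes each attempt fails independently at the reported 10% rate, which the ticket does not establish.

```python
import subprocess


def run_with_retry(command, attempts=3, runner=subprocess.run):
    """Run command until it exits with status 0; return the attempt number that succeeded."""
    for attempt in range(1, attempts + 1):
        result = runner(command)
        if result.returncode == 0:
            return attempt
    raise RuntimeError(f"{command!r} failed {attempts} times in a row")


def chance_all_fail(failure_rate, attempts):
    """Probability that every attempt fails, assuming independent attempts."""
    return failure_rate ** attempts
```

For example, `run_with_retry(["ceph-deploy", "osd", "create", "node2", "--data", "/dev/nvme1n1p12"])` would rerun the command from the description up to three times. Under the independence assumption, `chance_all_fail(0.1, 3)` puts the chance of three consecutive failures at about 0.1%.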

#4 Updated by Sage Weil 23 days ago

  • Status changed from New to Can't reproduce