Bug #35969

"symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4

Added by Kefu Chai 3 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 09/13/2018
Due date:
% Done: 0%
Source:
Tags:
Backport: luminous,mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:

Description

see /a/kchai-2018-09-13_01:57:49-ceph-disk-wip-fix-35906-distro-basic-ovh/3012294

2018-09-13T02:28:57.997 INFO:teuthology.orchestra.run.ovh030:Running: 'sudo MALLOC_CHECK_=3 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-osd --no-mon-config --cluster ceph --mkfs --mkkey -i 0 --monmap /home/ubuntu/cephtest/ceph.monmap'
...
2018-09-13T02:28:59.551 INFO:teuthology.orchestra.run.ovh030.stderr:ceph-osd: symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm

and per https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos7,DIST=centos7,MACHINE_SIZE=huge/14456//consoleFull

 --> Already installed : gperftools-devel-2.6.1-1.el7.x86_64

per /a/kchai-2018-09-13_01:57:49-ceph-disk-wip-fix-35906-distro-basic-ovh/3012294/teuthology.log

  description: ceph-disk/basic/{distros/centos_latest.yaml tasks/ceph-disk.yaml}
...
  os_type: centos
  os_version: '7.4'

so, when mimic was released, the "latest" centos was 7.4. at that time, the shipped gperftools-libs was gperftools-libs-2.4-7.

the tested Ceph is always compiled on the latest centos (7.5 at the time of writing), where the gperftools-libs version is 2.6.1, while mimic's rados test suite is still pointing to centos 7.4.

$ c++filt _ZdlPvm
operator delete(void*, unsigned long)

this operator was introduced in gperftools 2.6.1; see https://github.com/gperftools/gperftools/commit/7efb3ecf37d88edf9cf9a43efb89b425eaf81d5e and search for "ENABLE_SIZED_DELETE".

that's why we have the missing symbol on centos 7.4.
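The title of this issue names `_ZdaPvm` while the `c++filt` call above shows `_ZdlPvm`; they are the array and non-array forms of the same C++14 sized deallocation function. A minimal sketch of demangling both programmatically, assuming a GCC or Clang toolchain that provides `<cxxabi.h>` (the helper name `demangle` is made up for illustration and is not part of Ceph):

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: demangle an Itanium C++ ABI symbol name.
// Returns an empty string if the name cannot be demangled.
std::string demangle(const char* mangled) {
  assert(mangled != nullptr);
  int status = 0;
  // __cxa_demangle returns a malloc()ed buffer that the caller must free.
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) {
    return std::string();
  }
  std::string result(out);
  std::free(out);
  return result;
}
```

Calling `demangle("_ZdaPvm")` should give the `operator delete[]` overload that takes a size, which is the symbol the dynamic linker failed to find on centos 7.4.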


Related issues

Related to bluestore - Bug #23653: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCache::~LRUCache during mkfs->_open_db Resolved 04/11/2018
Related to RADOS - Bug #36508: gperftools-libs-2.6.1-1 or newer required for binaries linked against corresponding version at build time Resolved 10/18/2018
Duplicated by Ceph - Bug #36112: "ceph-osd: undefined symbol: _ZdlPvm" in smoke Duplicate 09/22/2018
Copied to RADOS - Backport #36131: luminous: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4 Resolved
Copied to RADOS - Backport #36132: mimic: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4 Resolved

History

#1 Updated by Kefu Chai 3 months ago

  • Description updated (diff)

#2 Updated by Kefu Chai 3 months ago

  • Related to Bug #23653: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCache::~LRUCache during mkfs->_open_db added

#3 Updated by Kefu Chai 3 months ago

this issue resembles #23653. both are related to new memory management APIs: #23653 was related to aligned_alloc(), introduced by C++17, while this issue is related to void operator delete(void* ptr, std::size_t sz), introduced by C++14, and probably to more of the C++17 new/delete operators.

apparently, the gperftools included in centos 7.4 is way too outdated. or, put another way, the gperftools in centos 7.5 is moving very fast to catch up with these standards. =)

last time, we fixed the issue by switching from aligned_alloc() back to posix_memalign(): the former is implemented by gperftools 2.6.x, while the latter is always available in glibc.

but this time, the delete operator could be used everywhere. we can either define tcmalloc_sized_delete_enabled() in ceph, which returns false at run-time if the gperftools version is lower than 2.6.1, or export the TCMALLOC_ENABLE_SIZED_DELETE environment variable in ceph's init script and tests to disable these new delete operators on centos 7.4. see https://github.com/gperftools/gperftools/blob/49dbe4362b431629111b85929d91fe9a46c42295/NEWS#L317

i think the first option is the way to go, so we need to check for the existence of the new delete operators by comparing the version returned by tc_version() with "2.6.1". please note, the delete operator is resolved using the ifunc attribute in GCC, so the symbol resolution is performed when the tcmalloc library loads. hence we don't need to cache the check result, and can just implement it in a straightforward way.
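The version comparison described above could look roughly like the sketch below. This is not the code from the eventual fix. The helper name `version_has_sized_delete` is hypothetical, and the patch string is assumed to have a form such as "", ".1" or "1". In a real build the three values would presumably come from `tc_version(&major, &minor, &patch)` in `<gperftools/tcmalloc.h>`; the sketch takes them as parameters so that it compiles without gperftools installed.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: true if the given tcmalloc version is >= 2.6.1,
// the release this report identifies as the first to provide sized delete.
// Assumption: `patch` looks like "", ".1" or "1".
bool version_has_sized_delete(int major, int minor, const char* patch) {
  assert(patch != nullptr);
  if (*patch == '.') {
    ++patch;  // skip the leading dot, if any
  }
  // strtol yields 0 for an empty string, i.e. "no patch level".
  const long patch_num = std::strtol(patch, nullptr, 10);
  if (major != 2) {
    return major > 2;
  }
  if (minor != 6) {
    return minor > 6;
  }
  return patch_num >= 1;
}
```

A `tcmalloc_sized_delete_enabled()` defined in ceph could then return the result of this check, so that 2.4 (centos 7.4) yields false and 2.6.1 (centos 7.5) yields true.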

#4 Updated by Kefu Chai 3 months ago

  • Assignee set to Kefu Chai

asked on ceph-{maintainers,users,developers} whether we can drop support for centos 7.4; turns out it's a no-go. will define tcmalloc_sized_delete_enabled() in libceph-common when tcmalloc is enabled, to disable sized delete if tcmalloc's version is lower than 2.6.1.

we could use `dlsym()` to check whether the sized delete exists, but that's kind of overkill IMO.
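For reference, the `dlsym()` alternative mentioned above might be sketched as follows. This only illustrates the lookup and is not something the fix adopted. The helper name `symbol_is_loaded` is hypothetical, and the sketch assumes a POSIX system with `<dlfcn.h>` (older glibc versions need `-ldl` at link time).

```cpp
#include <cassert>
#include <dlfcn.h>

// Hypothetical helper: true if `name` resolves in the running process,
// i.e. in the main program or the shared objects loaded with it.
bool symbol_is_loaded(const char* name) {
  assert(name != nullptr);
  // A null filename yields a handle for the main program; lookups on it
  // also search the libraries that were loaded at program startup.
  void* self = dlopen(nullptr, RTLD_LAZY);
  if (self == nullptr) {
    return false;
  }
  void* addr = dlsym(self, name);
  dlclose(self);
  return addr != nullptr;
}
```

A check such as `symbol_is_loaded("_ZdaPvm")` would then report whether some loaded library exports the sized array delete. Which library provides it is not distinguished here.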

#5 Updated by Kefu Chai 3 months ago

  • Status changed from New to In Progress

https://github.com/ceph/ceph/pull/24124

as suggested by Brad, we can just bump the BuildRequires of gperftools.

#6 Updated by Kefu Chai 3 months ago

  • Status changed from In Progress to Need Review

#7 Updated by Kefu Chai 3 months ago

  • Backport set to luminous,mimic

#8 Updated by Kefu Chai 3 months ago

  • Status changed from Need Review to Pending Backport

#9 Updated by Brad Hubbard 3 months ago

  • Duplicated by Bug #36112: "ceph-osd: undefined symbol: _ZdlPvm" in smoke added

#10 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #36131: luminous: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4 added

#11 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #36132: mimic: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4 added

#12 Updated by Brad Hubbard 2 months ago

  • Status changed from Pending Backport to Verified

Not resolved as per https://github.com/ceph/ceph/pull/24260#issuecomment-427144712. Looking into this further.

#13 Updated by Nathan Cutler 2 months ago

@Brad: The backporting process for the original fix is already well along. If a follow-up fix is required, could you open a new tracker for it? (Managing multiple master fixes in a single tracker tends to create backporting hell.)

#14 Updated by Nathan Cutler 2 months ago

  • Status changed from Verified to Pending Backport

#15 Updated by Brad Hubbard 2 months ago

@Nathan, Understood, will open a new issue.

#16 Updated by Brad Hubbard about 2 months ago

  • Related to Bug #36508: gperftools-libs-2.6.1-1 or newer required for binaries linked against corresponding version at build time added

#18 Updated by Brad Hubbard about 2 months ago

  • Related to Bug #36508: gperftools-libs-2.6.1-1 or newer required for binaries linked against corresponding version at build time added

#19 Updated by Brad Hubbard about 2 months ago

  • Related to deleted (Bug #36508: gperftools-libs-2.6.1-1 or newer required for binaries linked against corresponding version at build time)

#20 Updated by Nathan Cutler about 1 month ago

  • Status changed from Pending Backport to Resolved
