Bug #35969

closed

"symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4

Added by Kefu Chai over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%
Backport: luminous,mimic
Regression: No
Severity: 3 - minor

Description

see /a/kchai-2018-09-13_01:57:49-ceph-disk-wip-fix-35906-distro-basic-ovh/3012294

2018-09-13T02:28:57.997 INFO:teuthology.orchestra.run.ovh030:Running: 'sudo MALLOC_CHECK_=3 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-osd --no-mon-config --cluster ceph --mkfs --mkkey -i 0 --monmap /home/ubuntu/cephtest/ceph.monmap'
...
2018-09-13T02:28:59.551 INFO:teuthology.orchestra.run.ovh030.stderr:ceph-osd: symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm

and per https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos7,DIST=centos7,MACHINE_SIZE=huge/14456//consoleFull

 --> Already installed : gperftools-devel-2.6.1-1.el7.x86_64

per /a/kchai-2018-09-13_01:57:49-ceph-disk-wip-fix-35906-distro-basic-ovh/3012294/teuthology.log

  description: ceph-disk/basic/{distros/centos_latest.yaml tasks/ceph-disk.yaml}
...
  os_type: centos
  os_version: '7.4'

so, when mimic was released, the "latest" centos was 7.4. at that time, the shipped gperftools-libs was gperftools-libs-2.4-7.

the tested Ceph is always compiled on the latest centos (7.5 at the time of writing), where the gperftools-libs version is 2.6.1, while mimic's rados test suite still points to centos 7.4.

$ c++filt _ZdaPvm
operator delete[](void*, unsigned long)

this operator was introduced in gperftools 2.6.1, see https://github.com/gperftools/gperftools/commit/7efb3ecf37d88edf9cf9a43efb89b425eaf81d5e , search for "ENABLE_SIZED_DELETE".

that's why we have the missing symbol on centos 7.4.
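The mangled names seen in this ticket and in its duplicate (#36112) can also be demangled programmatically. The following is a minimal sketch, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle`; the helper name `demangle` is hypothetical and is not part of Ceph or gperftools.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI, GCC/Clang)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of `mangled`,
// or the input unchanged if it is not a valid mangled name.
std::string demangle(const char* mangled) {
  int status = 0;
  // With null output buffer, the result is malloc()ed and must be freed.
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) {
    return mangled;
  }
  std::string result(out);
  std::free(out);
  return result;
}
```

On x86_64, `demangle("_ZdaPvm")` yields the sized array form `operator delete[](void*, unsigned long)`, and `demangle("_ZdlPvm")` yields `operator delete(void*, unsigned long)`. Both are C++14 sized deallocation functions.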


Related issues 5 (0 open, 5 closed)

Related to bluestore - Bug #23653: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCache::~LRUCache during mkfs->_open_db (Resolved, Kefu Chai, 04/11/2018)
Related to RADOS - Bug #36508: gperftools-libs-2.6.1-1 or newer required for binaries linked against corresponding version at build time (Resolved, Brad Hubbard, 10/18/2018)
Has duplicate Ceph - Bug #36112: "ceph-osd: undefined symbol: _ZdlPvm" in smoke (Duplicate, 09/22/2018)
Copied to RADOS - Backport #36131: luminous: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4 (Resolved, Kefu Chai)
Copied to RADOS - Backport #36132: mimic: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4 (Resolved, Kefu Chai)

Actions #1

Updated by Kefu Chai over 5 years ago

  • Description updated (diff)
Actions #2

Updated by Kefu Chai over 5 years ago

  • Related to Bug #23653: tcmalloc Attempt to free invalid pointer 0x55de11f2a540 in rocksdb::LRUCache::~LRUCache during mkfs->_open_db added
Actions #3

Updated by Kefu Chai over 5 years ago

this issue resembles #23653. both are related to new memory management APIs. #23653 was related to aligned_alloc(), introduced by C++17, while this issue is related to void operator delete ( void* ptr, std::size_t sz ); (sized deallocation), introduced by C++14, and probably to more of the C++17 new/delete operators.

apparently, the gperftools included in centos 7.4 is way too outdated. or, put another way, the gperftools in centos 7.5 is moving very fast to catch up with these standards. =)

last time, we fixed this issue by switching from aligned_alloc() back to posix_memalign(). the former is implemented by gperftools 2.6.x, while the latter is always available in glibc.
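To illustrate the kind of substitution described above, here is a minimal sketch of an aligned allocation built on posix_memalign() instead of aligned_alloc(). The wrapper name `aligned_alloc_compat` is hypothetical; it is not the code from the #23653 fix.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical wrapper: allocate `size` bytes aligned to `alignment`.
// posix_memalign() requires `alignment` to be a power of two and a
// multiple of sizeof(void*); it returns non-zero on failure.
void* aligned_alloc_compat(std::size_t alignment, std::size_t size) {
  void* ptr = nullptr;
  if (posix_memalign(&ptr, alignment, size) != 0) {
    return nullptr;
  }
  return ptr;  // release with free()
}
```

Memory obtained this way is released with plain free(), so no allocator-specific symbol newer than what the installed library exports is needed.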

but this time, the delete operator could be used anywhere. we can either define tcmalloc_sized_delete_enabled() in ceph, returning false at run-time if the gperftools version is lower than 2.6.1, or export the TCMALLOC_ENABLE_SIZED_DELETE environment variable in ceph's init script and tests to disable these new delete operators on centos 7.4. see https://github.com/gperftools/gperftools/blob/49dbe4362b431629111b85929d91fe9a46c42295/NEWS#L317

i think the first option is the way to go. so we need to check for the existence of the new delete operators by comparing the version returned by tc_version() with "2.6.1". please note, the delete operator is resolved using GCC's ifunc attribute, so the symbol resolution is performed when the tcmalloc library loads. hence we don't need to cache the check result, and can just implement it in a straightforward way.
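The version comparison described above could look roughly like the sketch below. It is self-contained and does not link against gperftools. The helper names `parse_patch` and `supports_sized_delete` are hypothetical. The sketch also assumes that the patch component is reported as a string such as ".1", or as an empty string when there is no patch level.

```cpp
#include <cstdlib>  // std::atoi
#include <tuple>

// Hypothetical helper: convert a patch string such as ".1" to an integer.
// Assumption: an empty or null patch string means patch level 0.
int parse_patch(const char* patch) {
  if (patch == nullptr || patch[0] != '.') {
    return 0;
  }
  return std::atoi(patch + 1);
}

// Hypothetical helper: true if (major, minor, patch) is at least 2.6.1,
// the version this ticket treats as the first one with sized delete.
bool supports_sized_delete(int major, int minor, const char* patch) {
  return std::make_tuple(major, minor, parse_patch(patch)) >=
         std::make_tuple(2, 6, 1);
}
```

In a real build the three values would presumably come from tc_version() in the gperftools headers, and the result would be returned from tcmalloc_sized_delete_enabled(); that wiring is left out here because it depends on the installed gperftools.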

Actions #4

Updated by Kefu Chai over 5 years ago

  • Assignee set to Kefu Chai

asked on ceph-{maintainers,users,developers} whether we can drop support for centos 7.4; it turns out that's a no-go. will define tcmalloc_sized_delete_enabled() in libceph-common when tcmalloc is enabled, to disable sized delete if tcmalloc's version is lower than 2.6.1.

we could use `dlsym()` to check whether the sized delete exists, but that's kind of overkill IMO.
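For reference, the `dlsym()` alternative mentioned above might be sketched as follows. This is an illustration only, assuming a glibc-style `RTLD_DEFAULT` (available when `_GNU_SOURCE` is defined, which g++ does by default on Linux); the helper name `symbol_available` is hypothetical. Older glibc versions need `-ldl` at link time.

```cpp
#include <dlfcn.h>  // dlsym, dlerror, RTLD_DEFAULT

// Hypothetical helper: true if `name` resolves to a non-null address in
// the objects already loaded into the process.
bool symbol_available(const char* name) {
  dlerror();  // clear any stale error state
  void* sym = dlsym(RTLD_DEFAULT, name);
  return sym != nullptr;
}
```

A call such as `symbol_available("_ZdaPvm")` would then report whether any loaded library exports the sized array delete. The answer depends on which libstdc++ and tcmalloc versions are present at run-time.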

Actions #5

Updated by Kefu Chai over 5 years ago

  • Status changed from New to In Progress

https://github.com/ceph/ceph/pull/24124

as suggested by Brad, we can just bump the BuildRequires of gperftools.

Actions #6

Updated by Kefu Chai over 5 years ago

  • Status changed from In Progress to Fix Under Review
Actions #7

Updated by Kefu Chai over 5 years ago

  • Backport set to luminous,mimic
Actions #8

Updated by Kefu Chai over 5 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #9

Updated by Brad Hubbard over 5 years ago

  • Has duplicate Bug #36112: "ceph-osd: undefined symbol: _ZdlPvm" in smoke added
Actions #10

Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #36131: luminous: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4 added
Actions #11

Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #36132: mimic: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4 added
Actions #12

Updated by Brad Hubbard over 5 years ago

  • Status changed from Pending Backport to 12

Not resolved as per https://github.com/ceph/ceph/pull/24260#issuecomment-427144712. Looking into this further.

Actions #13

Updated by Nathan Cutler over 5 years ago

@Brad: The backporting process for the original fix is already well-along. If a follow-up fix is required, could you open a new tracker for it? (Managing multiple master fixes in a single tracker tends to create backporting hell.)

Actions #14

Updated by Nathan Cutler over 5 years ago

  • Status changed from 12 to Pending Backport
Actions #15

Updated by Brad Hubbard over 5 years ago

@Nathan Cutler, understood, will open a new issue.

Actions #16

Updated by Brad Hubbard over 5 years ago

  • Related to Bug #36508: gperftools-libs-2.6.1-1 or newer required for binaries linked against corresponding version at build time added
Actions #18

Updated by Brad Hubbard over 5 years ago

  • Related to Bug #36508: gperftools-libs-2.6.1-1 or newer required for binaries linked against corresponding version at build time added
Actions #19

Updated by Brad Hubbard over 5 years ago

  • Related to deleted (Bug #36508: gperftools-libs-2.6.1-1 or newer required for binaries linked against corresponding version at build time)
Actions #20

Updated by Nathan Cutler over 5 years ago

  • Status changed from Pending Backport to Resolved