Bug #18744

closed

notcmalloc builds link tcmalloc (jewel 10.2.6 integration testing, hammer 0.94.10 QE testing)

Added by Nathan Cutler about 7 years ago. Updated about 7 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Jewel 10.2.6 integration testing is suffering from high rates of "saw valgrind issues" failures in rgw runs.

Here is a typical run, with scads of "saw valgrind issues" failures: http://pulpito.ceph.com/smithfarm-2017-01-30_12:10:53-rgw-wip-jewel-backports-distro-basic-smithi/

Upon closer examination, Shaman is correctly querying for the notcmalloc build flavor, like so: https://shaman.ceph.com/api/search/?status=ready&project=ceph&flavor=notcmalloc&distros=centos%2F7%2Fx86_64&sha1=b671230f7f70b620905eb02c6dbd93d051b53fb7

This Shaman query returns JSON like this: [{"status": "ready", "sha1": "b671230f7f70b620905eb02c6dbd93d051b53fb7", "extra": {"build_url": "https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos7,DIST=centos7,MACHINE_SIZE=huge/847/", "root_build_cause": "SCMTRIGGER", "version": "10.2.5-6034-gb671230", "node_name": "172.21.1.42+slave-centos05", "job_name": "ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos7,DIST=centos7,MACHINE_SIZE=huge", "package_manager_version": "10.2.5-6034.gb671230"}, "url": "https://4.chacra.ceph.com/r/ceph/wip-jewel-backports/b671230f7f70b620905eb02c6dbd93d051b53fb7/centos/7/flavors/notcmalloc/", "distro_codename": null, "modified": "2017-01-30 00:03:31.999555", "distro_version": "7", "project": "ceph", "flavor": "notcmalloc", "ref": "wip-jewel-backports", "chacra_url": "https://4.chacra.ceph.com/repos/ceph/wip-jewel-backports/b671230f7f70b620905eb02c6dbd93d051b53fb7/centos/7/flavors/notcmalloc/", "archs": ["x86_64", "source"], "distro": "centos"}]
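
For anyone reproducing this check, here is a minimal sketch of how the query URL above can be rebuilt and the response reduced to the fields that matter here. The function names are hypothetical. Only the endpoint, the query parameters and the JSON keys are taken from the URL and response quoted above. No request is sent; the response text is passed in as a string.

```python
import json
from urllib.parse import urlencode

# Endpoint copied from the query quoted in the description.
SHAMAN_SEARCH = "https://shaman.ceph.com/api/search/"


def shaman_query_url(sha1, flavor="notcmalloc", distro="centos/7/x86_64",
                     project="ceph"):
    """Rebuild the search URL using the parameters seen in the description."""
    params = [
        ("status", "ready"),
        ("project", project),
        ("flavor", flavor),
        ("distros", distro),   # urlencode turns "/" into "%2F"
        ("sha1", sha1),
    ]
    return SHAMAN_SEARCH + "?" + urlencode(params)


def summarize_builds(response_text):
    """Keep only the flavor, Jenkins build URL and chacra URL of each build."""
    return [
        {
            "flavor": build["flavor"],
            "build_url": build["extra"]["build_url"],
            "chacra_url": build["chacra_url"],
        }
        for build in json.loads(response_text)
    ]
```

The `build_url` field is what leads to the Jenkins console log examined next.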

Examining the build log from https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos7,DIST=centos7,MACHINE_SIZE=huge/847/ it is obvious that the Ceph daemons are being linked with libtcmalloc:

Processing files: ceph-mon-10.2.5-6034.gb671230.el7.x86_64
Provides: ceph-mon = 1:10.2.5-6034.gb671230.el7 ceph-mon(x86-64) = 1:10.2.5-6034.gb671230.el7
Requires(interp): /bin/sh /bin/sh /bin/sh
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PartialHardlinkSets) <= 4.0.4-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires(post): /bin/sh
Requires(preun): /bin/sh
Requires(postun): /bin/sh
Requires: /usr/bin/env ld-linux-x86-64.so.2()(64bit) ld-linux-x86-64.so.2(GLIBC_2.3)(64bit) libboost_iostreams-mt.so.1.53.0()(64bit) libboost_random-mt.so.1.53.0()(64bit) libboost_system-mt.so.1.53.0()(64bit) libboost_thread-mt.so.1.53.0()(64bit) libc.so.6()(64bit) libc.so.6(GLIBC_2.10)(64bit) libc.so.6(GLIBC_2.14)(64bit) libc.so.6(GLIBC_2.16)(64bit) libc.so.6(GLIBC_2.2.5)(64bit) libc.so.6(GLIBC_2.3)(64bit) libc.so.6(GLIBC_2.3.2)(64bit) libc.so.6(GLIBC_2.3.3)(64bit) libc.so.6(GLIBC_2.3.4)(64bit) libc.so.6(GLIBC_2.4)(64bit) libc.so.6(GLIBC_2.5)(64bit) libc.so.6(GLIBC_2.6)(64bit) libc.so.6(GLIBC_2.7)(64bit) libc.so.6(GLIBC_2.8)(64bit) libc.so.6(GLIBC_2.9)(64bit) libdl.so.2()(64bit) libdl.so.2(GLIBC_2.2.5)(64bit) libgcc_s.so.1()(64bit) libgcc_s.so.1(GCC_3.0)(64bit) libleveldb.so.1()(64bit) libm.so.6()(64bit) libm.so.6(GLIBC_2.2.5)(64bit) libnspr4.so()(64bit) libnss3.so()(64bit) libnss3.so(NSS_3.12.5)(64bit) libnss3.so(NSS_3.12.9)(64bit) libnss3.so(NSS_3.2)(64bit) libnss3.so(NSS_3.3)(64bit) libpthread.so.0()(64bit) libpthread.so.0(GLIBC_2.12)(64bit) libpthread.so.0(GLIBC_2.2.5)(64bit) libpthread.so.0(GLIBC_2.3.2)(64bit) librt.so.1()(64bit) librt.so.1(GLIBC_2.2.5)(64bit) libsnappy.so.1()(64bit) libstdc++.so.6()(64bit) libstdc++.so.6(CXXABI_1.3)(64bit) libstdc++.so.6(CXXABI_1.3.1)(64bit) libstdc++.so.6(CXXABI_1.3.5)(64bit) libstdc++.so.6(CXXABI_1.3.7)(64bit) libstdc++.so.6(GLIBCXX_3.4)(64bit) libstdc++.so.6(GLIBCXX_3.4.11)(64bit) libstdc++.so.6(GLIBCXX_3.4.14)(64bit) libstdc++.so.6(GLIBCXX_3.4.15)(64bit) libstdc++.so.6(GLIBCXX_3.4.18)(64bit) libstdc++.so.6(GLIBCXX_3.4.19)(64bit) libstdc++.so.6(GLIBCXX_3.4.9)(64bit) libtcmalloc.so.4()(64bit) libz.so.1()(64bit) python(abi) = 2.7 rtld(GNU_HASH)

(same for the other daemons)
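
The manual inspection above can be sketched as a small log scan. This is only an illustration and assumes the log keeps the layout shown in the excerpt: a "Processing files:" line naming the package, followed by a single "Requires:" line listing its dependencies. The function name is hypothetical.

```python
def packages_requiring(log_text, needle="libtcmalloc"):
    """Return the packages whose 'Requires:' line mentions `needle`.

    Assumes each package section starts with 'Processing files: <name>'
    and lists its library dependencies on one line starting 'Requires:'.
    Lines such as 'Requires(post):' are ignored because they do not
    start with the exact prefix 'Requires:'.
    """
    offenders = []
    current = None
    for line in log_text.splitlines():
        if line.startswith("Processing files:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("Requires:") and current and needle in line:
            offenders.append(current)
    return offenders
```

Run against the console log of a notcmalloc build, an empty result is what one would hope to see; any package name returned points at a daemon still linked with libtcmalloc.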

See #18084 for the last time this happened.

Actions #1

Updated by Nathan Cutler about 7 years ago

  • Description updated (diff)
Actions #2

Updated by Nathan Cutler about 7 years ago

  • Subject changed from notcmalloc builds link tcmalloc to notcmalloc builds link tcmalloc (jewel 10.2.6 integration testing)
Actions #3

Updated by Nathan Cutler about 7 years ago

  • Description updated (diff)
Actions #4

Updated by Nathan Cutler about 7 years ago

  • Subject changed from notcmalloc builds link tcmalloc (jewel 10.2.6 integration testing) to notcmalloc builds link tcmalloc (jewel 10.2.6 integration testing, hammer 0.94.10 QE testing)

This is also affecting the hammer 0.94.10 QE testing.

The tests in this run (http://pulpito.front.sepia.ceph.com/smithfarm-2017-01-29_17:36:17-rgw:verify-hammer-distro-basic-smithi/) query Shaman for the notcmalloc flavor of SHA1 83af8cdaaa6d94404e6146b68e532a784e3cc99c like so: https://shaman.ceph.com/api/search/?status=ready&project=ceph&flavor=notcmalloc&distros=centos%2F7%2Fx86_64&ref=hammer

The 83af8cdaaa6d94404e6146b68e532a784e3cc99c notcmalloc build log is https://jenkins.ceph.com/job/ceph-dev-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos7,DIST=centos7,MACHINE_SIZE=huge/4462/consoleFull and there we see the ceph binary is linked with libtcmalloc:

Processing files: ceph-0.94.9-4530.g83af8cd.el7.x86_64
Provides: ceph = 1:0.94.9-4530.g83af8cd.el7 ceph(x86-64) = 1:0.94.9-4530.g83af8cd.el7 config(ceph) = 1:0.94.9-4530.g83af8cd.el7 libcls_hello.so()(64bit) libcls_kvs.so()(64bit) libcls_lock.so()(64bit) libcls_log.so()(64bit) libcls_rbd.so()(64bit) libcls_refcount.so()(64bit) libcls_replica_log.so()(64bit) libcls_rgw.so()(64bit) libcls_statelog.so()(64bit) libcls_user.so.1()(64bit) libcls_version.so()(64bit) libec_example.so.0()(64bit) libec_fail_to_initialize.so.0()(64bit) libec_fail_to_register.so.0()(64bit) libec_hangs.so.0()(64bit) libec_isa.so.2()(64bit) libec_jerasure.so.2()(64bit) libec_jerasure_generic.so.2()(64bit) libec_jerasure_sse3.so.2()(64bit) libec_jerasure_sse4.so.2()(64bit) libec_lrc.so.1()(64bit) libec_missing_entry_point.so.0()(64bit) libec_missing_version.so.0()(64bit) libec_shec.so.1()(64bit) libec_test_jerasure_generic.so.0()(64bit) libec_test_jerasure_neon.so.0()(64bit) libec_test_jerasure_sse3.so.0()(64bit) libec_test_jerasure_sse4.so.0()(64bit) libos_tp.so.1()(64bit) libosd_tp.so.1()(64bit)
Requires(interp): /bin/sh /bin/sh /bin/sh
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PartialHardlinkSets) <= 4.0.4-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires(post): /bin/sh binutils chkconfig
Requires(preun): /bin/sh chkconfig initscripts
Requires(postun): /bin/sh
Requires: /bin/sh /usr/bin/env ld-linux-x86-64.so.2()(64bit) ld-linux-x86-64.so.2(GLIBC_2.3)(64bit) libaio.so.1()(64bit) libaio.so.1(LIBAIO_0.1)(64bit) libaio.so.1(LIBAIO_0.4)(64bit) libboost_program_options-mt.so.1.53.0()(64bit) libboost_system-mt.so.1.53.0()(64bit) libboost_thread-mt.so.1.53.0()(64bit) libbz2.so.1()(64bit) libc.so.6()(64bit) libc.so.6(GLIBC_2.10)(64bit) libc.so.6(GLIBC_2.14)(64bit) libc.so.6(GLIBC_2.16)(64bit) libc.so.6(GLIBC_2.2.5)(64bit) libc.so.6(GLIBC_2.3)(64bit) libc.so.6(GLIBC_2.3.2)(64bit) libc.so.6(GLIBC_2.3.3)(64bit) libc.so.6(GLIBC_2.3.4)(64bit) libc.so.6(GLIBC_2.4)(64bit) libc.so.6(GLIBC_2.5)(64bit) libc.so.6(GLIBC_2.6)(64bit) libc.so.6(GLIBC_2.9)(64bit) libcephfs.so.1()(64bit) libcls_user.so.1()(64bit) libdl.so.2()(64bit) libdl.so.2(GLIBC_2.2.5)(64bit) libec_example.so.0()(64bit) libec_fail_to_initialize.so.0()(64bit) libec_fail_to_register.so.0()(64bit) libec_hangs.so.0()(64bit) libec_isa.so.2()(64bit) libec_jerasure.so.2()(64bit) libec_jerasure_generic.so.2()(64bit) libec_jerasure_sse3.so.2()(64bit) libec_jerasure_sse4.so.2()(64bit) libec_lrc.so.1()(64bit) libec_missing_entry_point.so.0()(64bit) libec_missing_version.so.0()(64bit) libec_shec.so.1()(64bit) libec_test_jerasure_generic.so.0()(64bit) libec_test_jerasure_neon.so.0()(64bit) libec_test_jerasure_sse3.so.0()(64bit) libec_test_jerasure_sse4.so.0()(64bit) libgcc_s.so.1()(64bit) libgcc_s.so.1(GCC_3.0)(64bit) libgcc_s.so.1(GCC_4.0.0)(64bit) libkeyutils.so.1()(64bit) libkeyutils.so.1(KEYUTILS_0.3)(64bit) libleveldb.so.1()(64bit) liblttng-ust.so.0()(64bit) libm.so.6()(64bit) libm.so.6(GLIBC_2.2.5)(64bit) libnspr4.so()(64bit) libnss3.so()(64bit) libnss3.so(NSS_3.12.5)(64bit) libnss3.so(NSS_3.12.9)(64bit) libnss3.so(NSS_3.2)(64bit) libnss3.so(NSS_3.3)(64bit) libos_tp.so.1()(64bit) libosd_tp.so.1()(64bit) libpthread.so.0()(64bit) libpthread.so.0(GLIBC_2.12)(64bit) libpthread.so.0(GLIBC_2.2.5)(64bit) libpthread.so.0(GLIBC_2.3.2)(64bit) librados.so.2()(64bit) librt.so.1()(64bit) 
librt.so.1(GLIBC_2.2.5)(64bit) libsnappy.so.1()(64bit) libstdc++.so.6()(64bit) libstdc++.so.6(CXXABI_1.3)(64bit) libstdc++.so.6(CXXABI_1.3.1)(64bit) libstdc++.so.6(CXXABI_1.3.5)(64bit) libstdc++.so.6(GLIBCXX_3.4)(64bit) libstdc++.so.6(GLIBCXX_3.4.10)(64bit) libstdc++.so.6(GLIBCXX_3.4.11)(64bit) libstdc++.so.6(GLIBCXX_3.4.14)(64bit) libstdc++.so.6(GLIBCXX_3.4.15)(64bit) libstdc++.so.6(GLIBCXX_3.4.18)(64bit) libstdc++.so.6(GLIBCXX_3.4.5)(64bit) libstdc++.so.6(GLIBCXX_3.4.9)(64bit) libtcmalloc.so.4()(64bit) libuuid.so.1()(64bit) libuuid.so.1(UUID_1.0)(64bit) libz.so.1()(64bit) python(abi) = 2.7 rtld(GNU_HASH)
Actions #5

Updated by David Galloway about 7 years ago

Doesn't appear to be limited to CentOS either :(

https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=trusty,DIST=trusty,MACHINE_SIZE=huge/847/consoleFull

checking for malloc in -ltcmalloc... yes

Although it's trying...

+ sudo CEPH_EXTRA_CMAKE_ARGS=-DALLOCATOR=libc pbuilder build --distribution trusty --basetgz /srv/debian-base/trusty.tgz --buildresult ./release/10.2.5-6034-gb671230 --debbuildopts -j8 ./release/10.2.5-6034-gb671230/ceph_10.2.5-6034-gb671230-1trusty.dsc

But from that build:

dgalloway@w541 asdf ()$ wget https://1.chacra.ceph.com/r/ceph/wip-jewel-backports/b671230f7f70b620905eb02c6dbd93d051b53fb7/ubuntu/trusty/flavors/notcmalloc/pool/main/c/ceph/ceph-mon_10.2.5-6034-gb671230-1trusty_amd64.deb
--2017-01-31 10:45:25--  https://1.chacra.ceph.com/r/ceph/wip-jewel-backports/b671230f7f70b620905eb02c6dbd93d051b53fb7/ubuntu/trusty/flavors/notcmalloc/pool/main/c/ceph/ceph-mon_10.2.5-6034-gb671230-1trusty_amd64.deb
Resolving 1.chacra.ceph.com (1.chacra.ceph.com)... 158.69.67.31
Connecting to 1.chacra.ceph.com (1.chacra.ceph.com)|158.69.67.31|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3720642 (3.5M) [application/octet-stream]
Saving to: ‘ceph-mon_10.2.5-6034-gb671230-1trusty_amd64.deb’

ceph-mon_10.2.5-6034-gb671230-1trusty_amd64.deb             100%[========================================================================================================================================>]   3.55M  3.95MB/s    in 0.9s    

2017-01-31 10:45:26 (3.95 MB/s) - ‘ceph-mon_10.2.5-6034-gb671230-1trusty_amd64.deb’ saved [3720642/3720642]

dgalloway@w541 asdf ()$ dpkg -x ceph-mon_10.2.5-6034-gb671230-1trusty_amd64.deb .
dgalloway@w541 asdf ()$ ldd usr/bin/ceph-
ceph-mon       ceph-rest-api  
dgalloway@w541 asdf ()$ ldd usr/bin/ceph-mon | grep tcmalloc
    libtcmalloc.so.4 => /lib64/libtcmalloc.so.4 (0x00007f65bde76000)
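
The same `ldd | grep tcmalloc` check can be sketched in Python for use in a test script. The helper names are hypothetical, and the parser only assumes the usual `name => path (address)` layout seen in the output above. The `ldd` output is passed in as text, so nothing is executed here.

```python
def linked_libs(ldd_output):
    """Map each library name in `ldd` output to its resolved path (or None)."""
    libs = {}
    for line in ldd_output.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[1] == "=>":
            # Some entries have no path, only a load address in parentheses.
            path = None if parts[2].startswith("(") else parts[2]
            libs[parts[0]] = path
    return libs


def links_tcmalloc(ldd_output):
    """True if any resolved library name starts with 'libtcmalloc'."""
    return any(name.startswith("libtcmalloc") for name in linked_libs(ldd_output))
```

For the ceph-mon binary extracted above, `links_tcmalloc` would return `True`, which is the failure being reported.
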
Actions #6

Updated by Dan Mick about 7 years ago

It seems like Jenkins is doing what it had been doing. Did someone break cmake?

Actions #7

Updated by Dan Mick about 7 years ago

Wait a minute....jewel? Jewel doesn't even use cmake. I don't remember how this was supposed to be selected for jewel, but the build has changed drastically since then.

edit: yeah, it never worked for jewel on Jenkins. The only attempt affected only the first call to configure, which is only for building the tarball. No one ever tried to fix it for pre-cmake builds.

Actions #8

Updated by Nathan Cutler about 7 years ago

For both hammer and jewel, it should be sufficient to pass --without-tcmalloc to rpmbuild.

I don't know for certain, but I would expect there to be an equivalent knob for the Debian build as well.

Actions #9

Updated by Dan Mick about 7 years ago

Yes. I think there's a relatively-simple fix for CentOS, but I don't know what's wrong with Ubuntu. It ought to be working with debian/rules consuming CEPH_EXTRA_CONFIGURE_ARGS.

Actions #10

Updated by Dan Mick about 7 years ago

Duh. It expects "CEPH_EXTRA_CONFIGURE_ARGS" but the job is passing "CEPH_EXTRA_CMAKE_ARGS". Similar, but different.
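
To illustrate why the mismatch is silent, here is a toy model, not the actual debian/rules logic: a consumer that reads only `CEPH_EXTRA_CONFIGURE_ARGS` never sees a value passed under another name, so the build proceeds with its defaults and reports no error. The function is hypothetical, and using `--without-tcmalloc` as the configure argument is an assumption borrowed from comment #8, where it is given as an rpmbuild option.

```python
def configure_command(env):
    """Toy model of a build step that honours only CEPH_EXTRA_CONFIGURE_ARGS."""
    extra = env.get("CEPH_EXTRA_CONFIGURE_ARGS", "")
    return ["./configure"] + extra.split()


# What the job passed: the variable is set, but under a name nobody reads.
ignored = configure_command({"CEPH_EXTRA_CMAKE_ARGS": "-DALLOCATOR=libc"})

# What the consumer expects (flag value assumed, see above).
honoured = configure_command({"CEPH_EXTRA_CONFIGURE_ARGS": "--without-tcmalloc"})
```

In the first case the command is a bare `./configure`, which matches the "checking for malloc in -ltcmalloc... yes" line seen in the trusty log.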

Actions #11

Updated by Dan Mick about 7 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Dan Mick

I have untested code here: https://github.com/ceph/ceph-build/pull/628
I need advice on the best way to test it without disrupting the world.

Actions #12

Updated by Dan Mick about 7 years ago

https://jenkins.ceph.com/job/ceph-dev/5204/ looks good. Checking the binaries to make sure there's no libtcmalloc dependency.

Actions #13

Updated by Dan Mick about 7 years ago

rpm build is fixed. deb build is not. Currently stumped as to why.

Actions #14

Updated by Dan Mick about 7 years ago

  • Project changed from Infrastructure to CI
  • Status changed from Fix Under Review to Resolved

Ran another build with more debug output and the result was good. Not sure whether it was pilot error in finding the output from 5204 or a glitch in the matrix. Assuming it's good now.
