Bug #15117

closed

hammer: CentOS 7 tcmalloc::ThreadCache valgrind error

Added by Loïc Dachary about 8 years ago. Updated over 7 years ago.

Status: Duplicate
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-deb-trusty-amd64-notcmalloc/#origin/hammer-backports

http://pulpito.ceph.com/loic-2016-03-12_17:38:57-rgw-hammer-backports---basic-smithi/56962

<error>
  <unique>0x7</unique>
  <tid>26</tid>
  <kind>SyscallParam</kind>
  <what>Syscall param msync(start) points to unaddressable byte(s)</what>
  <stack>
    <frame>
      <ip>0x609A90D</ip>
      <obj>/usr/lib64/libpthread-2.17.so</obj>
    </frame>
    <frame>
      <ip>0x78B7F63</ip>
      <obj>/usr/lib64/libunwind.so.8.0.1</obj>
    </frame>
    <frame>
      <ip>0x78BAEAE</ip>
      <obj>/usr/lib64/libunwind.so.8.0.1</obj>
    </frame>
    <frame>
      <ip>0x78BC181</ip>
      <obj>/usr/lib64/libunwind.so.8.0.1</obj>
    </frame>
    <frame>
      <ip>0x78BC518</ip>
      <obj>/usr/lib64/libunwind.so.8.0.1</obj>
    </frame>
    <frame>
      <ip>0x78B8900</ip>
      <obj>/usr/lib64/libunwind.so.8.0.1</obj>
      <fn>_ULx86_64_step</fn>
    </frame>
    <frame>
      <ip>0x58E88CA</ip>
      <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj>
    </frame>
    <frame>
      <ip>0x58E90BD</ip>
      <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj>
      <fn>GetStackTrace(void**, int, int)</fn>
    </frame>
    <frame>
      <ip>0x58DA313</ip>
      <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj>
      <fn>tcmalloc::PageHeap::GrowHeap(unsigned long)</fn>
    </frame>
    <frame>
      <ip>0x58DA632</ip>
      <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj>
      <fn>tcmalloc::PageHeap::New(unsigned long)</fn>
    </frame>
    <frame>
      <ip>0x58D8F63</ip>
      <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj>
      <fn>tcmalloc::CentralFreeList::Populate()</fn>
    </frame>
    <frame>
      <ip>0x58D9147</ip>
      <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj>
      <fn>tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)</fn>
    </frame>
    <frame>
      <ip>0x58D91DC</ip>
      <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj>
      <fn>tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)</fn>
    </frame>
    <frame>
      <ip>0x58DC234</ip>
      <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj>
      <fn>tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)</fn>
    </frame>
    <frame>
      <ip>0x58ED771</ip>
      <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj>
      <fn>posix_memalign</fn>
    </frame>
    <frame>
      <ip>0xC48DCB</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ceph::buffer::create_aligned(unsigned int, unsigned int)</fn>
    </frame>
    <frame>
      <ip>0xC4902A</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ceph::buffer::list::append(char const*, unsigned int)</fn>
    </frame>
    <frame>
      <ip>0x775B74</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>eversion_t::encode(ceph::buffer::list&amp;) const</fn>
    </frame>
    <frame>
      <ip>0x771A61</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>PGLog::_write_log(ObjectStore::Transaction&amp;, pg_log_t&amp;, coll_t const&amp;, ghobject_t const&amp;, std::map&lt;eversion_t, hobject_t, std::less&lt;eversion_t&gt;, std::allocator&lt;std::pair&lt;eversion_t const, hobject_t&gt; &gt; &gt;&amp;, eversion_t, eversion_t, eversion_t, std::set&lt;eversion_t, std::less&lt;eversion_t&gt;, std::allocator&lt;eversion_t&gt; &gt; const&amp;, bool, bool, std::set&lt;std::string, std::less&lt;std::string&gt;, std::allocator&lt;std::string&gt; &gt;*)</fn>
    </frame>
    <frame>
      <ip>0x77208E</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>PGLog::write_log(ObjectStore::Transaction&amp;, coll_t const&amp;, ghobject_t const&amp;)</fn>
    </frame>
    <frame>
      <ip>0x7CA9CC</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>PG::init(int, std::vector&lt;int, std::allocator&lt;int&gt; &gt; const&amp;, int, std::vector&lt;int, std::allocator&lt;int&gt; &gt; const&amp;, int, pg_history_t const&amp;, std::map&lt;unsigned int, pg_interval_t, std::less&lt;unsigned int&gt;, std::allocator&lt;std::pair&lt;unsigned int const, pg_interval_t&gt; &gt; &gt;&amp;, bool, ObjectStore::Transaction*)</fn>
    </frame>
    <frame>
      <ip>0x6A4097</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::_create_lock_pg(std::tr1::shared_ptr&lt;OSDMap const&gt;, spg_t, bool, bool, bool, int, std::vector&lt;int, std::allocator&lt;int&gt; &gt;&amp;, int, std::vector&lt;int, std::allocator&lt;int&gt; &gt;&amp;, int, pg_history_t, std::map&lt;unsigned int, pg_interval_t, std::less&lt;unsigned int&gt;, std::allocator&lt;std::pair&lt;unsigned int const, pg_interval_t&gt; &gt; &gt;&amp;, ObjectStore::Transaction&amp;)</fn>
    </frame>
    <frame>
      <ip>0x6AE711</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::handle_pg_peering_evt(spg_t, pg_info_t const&amp;, std::map&lt;unsigned int, pg_interval_t, std::less&lt;unsigned int&gt;, std::allocator&lt;std::pair&lt;unsigned int const, pg_interval_t&gt; &gt; &gt;&amp;, unsigned int, pg_shard_t, bool, std::tr1::shared_ptr&lt;PG::CephPeeringEvt&gt;)</fn>
    </frame>
    <frame>
      <ip>0x6AFF39</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::handle_pg_notify(std::tr1::shared_ptr&lt;OpRequest&gt;)</fn>
    </frame>
    <frame>
      <ip>0x6B2AEF</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::dispatch_op(std::tr1::shared_ptr&lt;OpRequest&gt;)</fn>
    </frame>
    <frame>
      <ip>0x6B86CD</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::_dispatch(Message*)</fn>
    </frame>
    <frame>
      <ip>0x6B8DB6</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::ms_dispatch(Message*)</fn>
    </frame>
    <frame>
      <ip>0xC8E0D9</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>DispatchQueue::entry()</fn>
    </frame>
    <frame>
      <ip>0xBB0BFC</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>DispatchQueue::DispatchThread::entry()</fn>
    </frame>
    <frame>
      <ip>0x6093DC4</ip>
      <obj>/usr/lib64/libpthread-2.17.so</obj>
      <fn>start_thread</fn>
    </frame>
    <frame>
      <ip>0x75EA21C</ip>
      <obj>/usr/lib64/libc-2.17.so</obj>
      <fn>clone</fn>
    </frame>
  </stack>
  <auxwhat>Address 0x17f86000 is on thread 26's stack</auxwhat>
  <auxwhat>368 bytes below stack pointer</auxwhat>
</error>

Related issues 4 (0 open, 4 closed)

Related to Ceph - Backport #14799: hammer: CentOS 7 tcmalloc::ThreadCache valgrind error libboost_thread-mt.so.1.53 (Resolved, Loïc Dachary)
Has duplicate Ceph - Bug #16638: "saw valgrind issue <kind>SyscallParam</kind>" in hammer integration testing (rgw) (Duplicate, 07/08/2016)
Has duplicate Ceph - Bug #16642: "saw valgrind issue <kind>SyscallParam</kind>" in hammer integration testing (fs) (Duplicate, 07/09/2016)
Is duplicate of Ceph - Bug #17035: "saw valgrind issues" in hammer 0.94.8 release (Resolved, Kefu Chai, 08/15/2016)

Actions #1

Updated by Loïc Dachary about 8 years ago

http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-deb-trusty-amd64-notcmalloc/log.cgi?log=8cc324df4caf0d043208d9abe88a0e217e861dc3

...
+ hostname
+ grep -q ^gitbuilder-
+ hostname
+ grep -q -- -notcmalloc
+ echo hostname has -notcmalloc, will build --without-tcmalloc --without-cryptopp
hostname has -notcmalloc, will build --without-tcmalloc --without-cryptopp
+ export CEPH_EXTRA_CONFIGURE_ARGS= --without-cryptopp --without-tcmalloc
+ hostname
+ grep -q -- -gcov 
...
 ./configure --prefix=/usr --localstatedir=/var \
--sysconfdir=/etc --with-ocf --with-rest-bench --with-nss --with-debug --enable-cephfs-java --with-librocksdb-static=check --build x86_64-linux-gnu \
--without-cryptopp --without-tcmalloc
configure: RPM_RELEASE='0' 
...
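
For readers skimming the trace above, the hostname checks can be sketched in Python. This is only an illustration, not the gitbuilder script itself: the function name `extra_configure_args` is hypothetical, the nesting of the two checks is an assumption (the trace shows only that both `grep` calls ran), and the `-gcov` branch is left out because the trace does not show which flags it adds.

```python
def extra_configure_args(hostname):
    """Return extra ./configure flags implied by a gitbuilder hostname.

    Assumption: the '-notcmalloc' check only matters on hosts whose name
    starts with 'gitbuilder-'. The real script may order or combine the
    checks differently.
    """
    args = []
    if hostname.startswith("gitbuilder-") and "-notcmalloc" in hostname:
        # These two flags are the ones exported in CEPH_EXTRA_CONFIGURE_ARGS.
        args += ["--without-cryptopp", "--without-tcmalloc"]
    return args


if __name__ == "__main__":
    # Hypothetical hostname, modeled on the gitbuilder URL in this ticket.
    print(extra_configure_args("gitbuilder-ceph-deb-trusty-amd64-notcmalloc"))
```

With a hostname that lacks `-notcmalloc`, the sketch returns an empty list, which would correspond to a default build that keeps tcmalloc enabled.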
Actions #2

Updated by Loïc Dachary about 8 years ago

  • Status changed from New to Need More Info

I think this is a weird case of the tcmalloc packages being used instead of the notcmalloc packages. The gitbuilder shown in http://tracker.ceph.com/issues/15117#note-1 is not the one that was actually used. Let's revisit this when we have a run that actually matches the current gitbuilder.

Actions #3

Updated by Nathan Cutler almost 8 years ago

  • Has duplicate Bug #16638: "saw valgrind issue <kind>SyscallParam</kind>" in hammer integration testing (rgw) added
Actions #4

Updated by Nathan Cutler almost 8 years ago

  • Has duplicate Bug #16642: "saw valgrind issue <kind>SyscallParam</kind>" in hammer integration testing (fs) added
Actions #5

Updated by Nathan Cutler almost 8 years ago

Very similar errors are now showing up in hammer-backports, always in notcmalloc jobs. For example:

/a/smithfarm-2016-07-20_00:22:41-rados-hammer-backports---basic-smithi/324338/remote/smithi004/log/valgrind

The teuthology log seems to indicate that the notcmalloc gitbuilder was used:

2016-07-20T02:37:06.297 INFO:teuthology.task.install:Installing packages: ceph-radosgw, ceph-test, ceph-devel, ceph, ceph-fuse, cephfs-java, libcephfs_jni1, libcephfs1, librados2, librbd1, python-ceph, rbd-fuse on remote rpm x86_64
2016-07-20T02:37:06.298 WARNING:teuthology.packaging:More than one of ref, tag, branch, or sha1 supplied; using sha1
2016-07-20T02:37:06.299 INFO:teuthology.orchestra.run.smithi020:Running: 'sudo yum -y install http://gitbuilder.ceph.com/ceph-rpm-centos7-x86_64-notcmalloc/sha1/2ee8cd65a68e1b799d1bfef309cd07a63e3d55da/noarch/ceph-release-1-0.el7.noarch.rpm'
2016-07-20T02:37:06.308 DEBUG:teuthology.misc:System to be installed: CentOS
2016-07-20T02:37:06.310 WARNING:teuthology.packaging:More than one of ref, tag, branch, or sha1 supplied; using sha1
2016-07-20T02:37:06.311 INFO:teuthology.task.install:Pulling from http://gitbuilder.ceph.com/ceph-rpm-centos7-x86_64-notcmalloc/sha1/2ee8cd65a68e1b799d1bfef309cd07a63e3d55da
2016-07-20T02:37:06.312 WARNING:teuthology.packaging:More than one of ref, tag, branch, or sha1 supplied; using sha1
2016-07-20T02:37:06.313 INFO:teuthology.packaging:Looking for package version: http://gitbuilder.ceph.com/ceph-rpm-centos7-x86_64-notcmalloc/sha1/2ee8cd65a68e1b799d1bfef309cd07a63e3d55da/version
2016-07-20T02:37:06.335 INFO:teuthology.packaging:Package found...

So even though the packages are installed from the notcmalloc gitbuilder, the valgrind stack trace indicates that tcmalloc is in use...
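
One way to confirm this mechanically is to scan the valgrind XML for errors whose stack passes through libtcmalloc. The snippet below is a minimal sketch, assuming the log is a well-formed valgrind XML document with `<error>` elements like the one in the description. The helper name `errors_touching` and the trimmed `SAMPLE` (including its `<valgrindoutput>` wrapper) are illustrative and not part of teuthology or ceph-qa-suite.

```python
import xml.etree.ElementTree as ET

# Trimmed sample modeled on the <error> element quoted in the description.
# Only three of the original frames are kept; the wrapper element is assumed.
SAMPLE = """<valgrindoutput>
<error>
  <unique>0x7</unique>
  <tid>26</tid>
  <kind>SyscallParam</kind>
  <what>Syscall param msync(start) points to unaddressable byte(s)</what>
  <stack>
    <frame>
      <ip>0x78B8900</ip>
      <obj>/usr/lib64/libunwind.so.8.0.1</obj>
      <fn>_ULx86_64_step</fn>
    </frame>
    <frame>
      <ip>0x58E90BD</ip>
      <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj>
      <fn>GetStackTrace(void**, int, int)</fn>
    </frame>
    <frame>
      <ip>0xC48DCB</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ceph::buffer::create_aligned(unsigned int, unsigned int)</fn>
    </frame>
  </stack>
</error>
</valgrindoutput>"""


def errors_touching(xml_text, needle="libtcmalloc"):
    """Return (kind, what, objects) for each error with a frame in `needle`."""
    root = ET.fromstring(xml_text)
    hits = []
    for error in root.iter("error"):
        # Collect the shared object path of every frame in this error.
        objs = [f.findtext("obj", default="") for f in error.iter("frame")]
        matching = sorted({o for o in objs if needle in o})
        if matching:
            hits.append((error.findtext("kind"), error.findtext("what"), matching))
    return hits


if __name__ == "__main__":
    for kind, what, objs in errors_touching(SAMPLE):
        print(kind, "|", what, "|", ", ".join(objs))
```

Any hit on a node that was installed from a notcmalloc gitbuilder would point to the same mismatch described in this comment.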

Actions #6

Updated by Nathan Cutler almost 8 years ago

  • Status changed from Need More Info to 12
  • Priority changed from Normal to Urgent
Actions #7

Updated by Nathan Cutler almost 8 years ago

  • Related to Backport #14799: hammer: CentOS 7 tcmalloc::ThreadCache valgrind error libboost_thread-mt.so.1.53 added
Actions #8

Updated by Nathan Cutler over 7 years ago

Kefu, now that #17035 is marked resolved, can this one and #14799 be closed as duplicates?

Actions #9

Updated by Kefu Chai over 7 years ago

@Nathan, yes, let me do this.

Actions #10

Updated by Kefu Chai over 7 years ago

  • Is duplicate of Bug #17035: "saw valgrind issues" in hammer 0.94.8 release added
Actions #11

Updated by Kefu Chai over 7 years ago

  • Status changed from 12 to Duplicate