Bug #15117
closed
hammer: CentOS 7 tcmalloc::ThreadCache valgrind error
Description
http://pulpito.ceph.com/loic-2016-03-12_17:38:57-rgw-hammer-backports---basic-smithi/56962
<error>
  <unique>0x7</unique>
  <tid>26</tid>
  <kind>SyscallParam</kind>
  <what>Syscall param msync(start) points to unaddressable byte(s)</what>
  <stack>
    <frame> <ip>0x609A90D</ip> <obj>/usr/lib64/libpthread-2.17.so</obj> </frame>
    <frame> <ip>0x78B7F63</ip> <obj>/usr/lib64/libunwind.so.8.0.1</obj> </frame>
    <frame> <ip>0x78BAEAE</ip> <obj>/usr/lib64/libunwind.so.8.0.1</obj> </frame>
    <frame> <ip>0x78BC181</ip> <obj>/usr/lib64/libunwind.so.8.0.1</obj> </frame>
    <frame> <ip>0x78BC518</ip> <obj>/usr/lib64/libunwind.so.8.0.1</obj> </frame>
    <frame> <ip>0x78B8900</ip> <obj>/usr/lib64/libunwind.so.8.0.1</obj> <fn>_ULx86_64_step</fn> </frame>
    <frame> <ip>0x58E88CA</ip> <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj> </frame>
    <frame> <ip>0x58E90BD</ip> <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj> <fn>GetStackTrace(void**, int, int)</fn> </frame>
    <frame> <ip>0x58DA313</ip> <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj> <fn>tcmalloc::PageHeap::GrowHeap(unsigned long)</fn> </frame>
    <frame> <ip>0x58DA632</ip> <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj> <fn>tcmalloc::PageHeap::New(unsigned long)</fn> </frame>
    <frame> <ip>0x58D8F63</ip> <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj> <fn>tcmalloc::CentralFreeList::Populate()</fn> </frame>
    <frame> <ip>0x58D9147</ip> <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj> <fn>tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)</fn> </frame>
    <frame> <ip>0x58D91DC</ip> <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj> <fn>tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)</fn> </frame>
    <frame> <ip>0x58DC234</ip> <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj> <fn>tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)</fn> </frame>
    <frame> <ip>0x58ED771</ip> <obj>/usr/lib64/libtcmalloc.so.4.2.6</obj> <fn>posix_memalign</fn> </frame>
    <frame> <ip>0xC48DCB</ip> <obj>/usr/bin/ceph-osd</obj> <fn>ceph::buffer::create_aligned(unsigned int, unsigned int)</fn> </frame>
    <frame> <ip>0xC4902A</ip> <obj>/usr/bin/ceph-osd</obj> <fn>ceph::buffer::list::append(char const*, unsigned int)</fn> </frame>
    <frame> <ip>0x775B74</ip> <obj>/usr/bin/ceph-osd</obj> <fn>eversion_t::encode(ceph::buffer::list&) const</fn> </frame>
    <frame> <ip>0x771A61</ip> <obj>/usr/bin/ceph-osd</obj> <fn>PGLog::_write_log(ObjectStore::Transaction&, pg_log_t&, coll_t const&, ghobject_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, eversion_t, eversion_t, eversion_t, std::set<eversion_t, std::less<eversion_t>, std::allocator<eversion_t> > const&, bool, bool, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*)</fn> </frame>
    <frame> <ip>0x77208E</ip> <obj>/usr/bin/ceph-osd</obj> <fn>PGLog::write_log(ObjectStore::Transaction&, coll_t const&, ghobject_t const&)</fn> </frame>
    <frame> <ip>0x7CA9CC</ip> <obj>/usr/bin/ceph-osd</obj> <fn>PG::init(int, std::vector<int, std::allocator<int> > const&, int, std::vector<int, std::allocator<int> > const&, int, pg_history_t const&, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > >&, bool, ObjectStore::Transaction*)</fn> </frame>
    <frame> <ip>0x6A4097</ip> <obj>/usr/bin/ceph-osd</obj> <fn>OSD::_create_lock_pg(std::tr1::shared_ptr<OSDMap const>, spg_t, bool, bool, bool, int, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, pg_history_t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > >&, ObjectStore::Transaction&)</fn> </frame>
    <frame> <ip>0x6AE711</ip> <obj>/usr/bin/ceph-osd</obj> <fn>OSD::handle_pg_peering_evt(spg_t, pg_info_t const&, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > >&, unsigned int, pg_shard_t, bool, std::tr1::shared_ptr<PG::CephPeeringEvt>)</fn> </frame>
    <frame> <ip>0x6AFF39</ip> <obj>/usr/bin/ceph-osd</obj> <fn>OSD::handle_pg_notify(std::tr1::shared_ptr<OpRequest>)</fn> </frame>
    <frame> <ip>0x6B2AEF</ip> <obj>/usr/bin/ceph-osd</obj> <fn>OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)</fn> </frame>
    <frame> <ip>0x6B86CD</ip> <obj>/usr/bin/ceph-osd</obj> <fn>OSD::_dispatch(Message*)</fn> </frame>
    <frame> <ip>0x6B8DB6</ip> <obj>/usr/bin/ceph-osd</obj> <fn>OSD::ms_dispatch(Message*)</fn> </frame>
    <frame> <ip>0xC8E0D9</ip> <obj>/usr/bin/ceph-osd</obj> <fn>DispatchQueue::entry()</fn> </frame>
    <frame> <ip>0xBB0BFC</ip> <obj>/usr/bin/ceph-osd</obj> <fn>DispatchQueue::DispatchThread::entry()</fn> </frame>
    <frame> <ip>0x6093DC4</ip> <obj>/usr/lib64/libpthread-2.17.so</obj> <fn>start_thread</fn> </frame>
    <frame> <ip>0x75EA21C</ip> <obj>/usr/lib64/libc-2.17.so</obj> <fn>clone</fn> </frame>
  </stack>
  <auxwhat>Address 0x17f86000 is on thread 26's stack</auxwhat>
  <auxwhat>368 bytes below stack pointer</auxwhat>
</error>
Updated by Loïc Dachary about 8 years ago
...
+ hostname
+ grep -q ^gitbuilder-
+ hostname
+ grep -q -- -notcmalloc
+ echo hostname has -notcmalloc, will build --without-tcmalloc --without-cryptopp
hostname has -notcmalloc, will build --without-tcmalloc --without-cryptopp
+ export CEPH_EXTRA_CONFIGURE_ARGS= --without-cryptopp --without-tcmalloc
+ hostname
+ grep -q -- -gcov
...
./configure --prefix=/usr --localstatedir=/var \
    --sysconfdir=/etc --with-ocf --with-rest-bench --with-nss --with-debug --enable-cephfs-java --with-librocksdb-static=check --build x86_64-linux-gnu \
    --without-cryptopp --without-tcmalloc
configure: RPM_RELEASE='0'
...
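The trace above suggests that the builder chooses its extra configure flags from its own hostname. A minimal Python sketch of that branch follows. It is an illustration only: the function name and the example hostname are hypothetical, and the real gitbuilder script is a shell script that is not shown in this ticket.

```python
from typing import List


def extra_configure_args(hostname: str) -> List[str]:
    """Return the extra ./configure flags implied by a builder hostname.

    Hypothetical helper reconstructed from the shell trace; only the
    -notcmalloc branch visible in the trace is modelled here.
    """
    args: List[str] = []
    if "-notcmalloc" in hostname:
        # The trace shows both flags being added together in this branch.
        args += ["--without-cryptopp", "--without-tcmalloc"]
    return args


if __name__ == "__main__":
    # The hostname below is made up for the example.
    print(extra_configure_args("gitbuilder-example-notcmalloc"))
```

If the logic really works this way, a builder whose hostname lacks the -notcmalloc suffix would configure with tcmalloc enabled.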
Updated by Loïc Dachary about 8 years ago
- Status changed from New to Need More Info
I think this is a weird case of the tcmalloc packages being used instead of the notcmalloc packages. The gitbuilder log quoted in http://tracker.ceph.com/issues/15117#note-1 is not from the gitbuilder that was actually used. Let's revisit this when we have a run that actually matches the current gitbuilder.
Updated by Nathan Cutler almost 8 years ago
- Has duplicate Bug #16638: "saw valgrind issue <kind>SyscallParam</kind>" in hammer integration testing (rgw) added
Updated by Nathan Cutler almost 8 years ago
- Has duplicate Bug #16642: "saw valgrind issue <kind>SyscallParam</kind>" in hammer integration testing (fs) added
Updated by Nathan Cutler almost 8 years ago
Very similar errors showing up now in hammer-backports, always in notcmalloc jobs. For example:
/a/smithfarm-2016-07-20_00:22:41-rados-hammer-backports---basic-smithi/324338/remote/smithi004/log/valgrind
The teuthology log seems to indicate that the notcmalloc gitbuilder was used:
2016-07-20T02:37:06.297 INFO:teuthology.task.install:Installing packages: ceph-radosgw, ceph-test, ceph-devel, ceph, ceph-fuse, cephfs-java, libcephfs_jni1, libcephfs1, librados2, librbd1, python-ceph, rbd-fuse on remote rpm x86_64
2016-07-20T02:37:06.298 WARNING:teuthology.packaging:More than one of ref, tag, branch, or sha1 supplied; using sha1
2016-07-20T02:37:06.299 INFO:teuthology.orchestra.run.smithi020:Running: 'sudo yum -y install http://gitbuilder.ceph.com/ceph-rpm-centos7-x86_64-notcmalloc/sha1/2ee8cd65a68e1b799d1bfef309cd07a63e3d55da/noarch/ceph-release-1-0.el7.noarch.rpm'
2016-07-20T02:37:06.308 DEBUG:teuthology.misc:System to be installed: CentOS
2016-07-20T02:37:06.310 WARNING:teuthology.packaging:More than one of ref, tag, branch, or sha1 supplied; using sha1
2016-07-20T02:37:06.311 INFO:teuthology.task.install:Pulling from http://gitbuilder.ceph.com/ceph-rpm-centos7-x86_64-notcmalloc/sha1/2ee8cd65a68e1b799d1bfef309cd07a63e3d55da
2016-07-20T02:37:06.312 WARNING:teuthology.packaging:More than one of ref, tag, branch, or sha1 supplied; using sha1
2016-07-20T02:37:06.313 INFO:teuthology.packaging:Looking for package version: http://gitbuilder.ceph.com/ceph-rpm-centos7-x86_64-notcmalloc/sha1/2ee8cd65a68e1b799d1bfef309cd07a63e3d55da/version
2016-07-20T02:37:06.335 INFO:teuthology.packaging:Package found...
So even though packages are installed from the notcmalloc gitbuilder, the valgrind stacktrace indicates that tcmalloc is used...
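The same cross-check can be scripted. The sketch below is a rough illustration in Python, not an existing teuthology tool: the function names are hypothetical, and the sample inputs are trimmed from this ticket (the log line from the July run, the three frames from the trace in the description) and combined only to show the idea.

```python
import re
import xml.etree.ElementTree as ET

# Captures the flavor suffix of a CentOS 7 gitbuilder URL, e.g. "notcmalloc".
FLAVOR_RE = re.compile(r"gitbuilder\.ceph\.com/ceph-rpm-centos7-x86_64-(\w+)/sha1/")


def gitbuilder_flavors(log_text):
    """Return the set of gitbuilder flavor suffixes found in a teuthology log."""
    return set(FLAVOR_RE.findall(log_text))


def stack_objects(error_xml):
    """Return the <obj> paths of one valgrind <error> element, in stack order."""
    root = ET.fromstring(error_xml)
    return [obj.text for obj in root.iter("obj")]


def uses_tcmalloc(error_xml):
    """True if any frame of the error's stack lives in libtcmalloc."""
    return any("libtcmalloc" in path for path in stack_objects(error_xml))


SAMPLE_LOG = (
    "2016-07-20T02:37:06.311 INFO:teuthology.task.install:Pulling from "
    "http://gitbuilder.ceph.com/ceph-rpm-centos7-x86_64-notcmalloc/sha1/"
    "2ee8cd65a68e1b799d1bfef309cd07a63e3d55da"
)

SAMPLE_ERROR = """<error><kind>SyscallParam</kind><stack>
<frame><ip>0x78B8900</ip><obj>/usr/lib64/libunwind.so.8.0.1</obj><fn>_ULx86_64_step</fn></frame>
<frame><ip>0x58ED771</ip><obj>/usr/lib64/libtcmalloc.so.4.2.6</obj><fn>posix_memalign</fn></frame>
<frame><ip>0xC48DCB</ip><obj>/usr/bin/ceph-osd</obj><fn>ceph::buffer::create_aligned(unsigned int, unsigned int)</fn></frame>
</stack></error>"""

if __name__ == "__main__":
    mismatch = "notcmalloc" in gitbuilder_flavors(SAMPLE_LOG) and uses_tcmalloc(SAMPLE_ERROR)
    print("notcmalloc packages but tcmalloc frames:", mismatch)
```

The parser expects well-formed XML, so function names containing template angle brackets or `&` have to be XML-escaped; the trace as pasted in this ticket would need that escaping restored before it could be parsed.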
Updated by Nathan Cutler almost 8 years ago
- Status changed from Need More Info to 12
- Priority changed from Normal to Urgent
Updated by Nathan Cutler almost 8 years ago
- Related to Backport #14799: hammer: CentOS 7 tcmalloc::ThreadCache valgrind error libboost_thread-mt.so.1.53 added
Updated by Nathan Cutler over 7 years ago
Updated by Kefu Chai over 7 years ago
- Is duplicate of Bug #17035: "saw valgrind issues" in hammer 0.94.8 release added