Bug #16686
closedcephfs.test_volume_client.TestVolumeClient failure (with "src/tcmalloc.cc:283] Attempt to free invalid pointer")
0%
Description
Ran into the following test failure, when running VolumeClient
tests using vstart_runner locally on a vstart Ceph cluster
using Ceph master branch (HEAD commit c5e5ef7179886c39ab826405fb5454e6640efa83),
2016-07-14 18:40:24,328.328 INFO:__main__:run args=['./bin/ceph', 'osd', 'dump', '--format=json-pretty']
2016-07-14 18:40:24,328.328 INFO:__main__:Running ['./bin/ceph', 'osd', 'dump', '--format=json-pretty']
2016-07-14 18:40:24,530.530 INFO:__main__:run args=['./bin/ceph', 'daemon', 'mon.a', 'config', 'get', 'mon_pg_warn_max_per_osd']
2016-07-14 18:40:24,530.530 INFO:__main__:Running ['./bin/ceph', 'daemon', 'mon.a', 'config', 'get', 'mon_pg_warn_max_per_osd']
2016-07-14 18:40:24,660.660 INFO:tasks.cephfs.filesystem:_json_asok output: {
"mon_pg_warn_max_per_osd": "300"
}
2016-07-14 18:40:24,660.660 INFO:tasks.cephfs.test_volume_client:max_per_osd 300
2016-07-14 18:40:24,661.661 INFO:tasks.cephfs.test_volume_client:osd_count 3
2016-07-14 18:40:24,661.661 INFO:tasks.cephfs.test_volume_client:max_overall 900
2016-07-14 18:40:24,661.661 INFO:tasks.cephfs.test_volume_client:existing_pg_count 26
2016-07-14 18:40:24,661.661 INFO:tasks.cephfs.test_volume_client:expected_pg_num 87
2016-07-14 18:40:24,661.661 INFO:__main__:run args=['./bin/ceph', 'osd', 'dump', '--format=json-pretty']
2016-07-14 18:40:24,661.661 INFO:__main__:Running ['./bin/ceph', 'osd', 'dump', '--format=json-pretty']
2016-07-14 18:40:24,845.845 INFO:__main__:run args=['python', '-c', '\nfrom ceph_volume_client import CephFSVolumeClient, VolumePath\nimport logging\nlog = logging.getLogger("ceph_volume_client")\nlog.addHandler(logging.StreamHandler())\nlog.setLevel(logging.DEBUG)\nvc = CephFSVolumeClient("manila", "./ceph.conf", "ceph", None, None)\nvc.connect()\n\nvp = VolumePath("grpid", "volid")\nvc.create_volume(vp, 10, data_isolated=True)\n\nvc.disconnect()\n ']
2016-07-14 18:40:24,846.846 INFO:__main__:Running ['python', '-c', '\nfrom ceph_volume_client import CephFSVolumeClient, VolumePath\nimport logging\nlog = logging.getLogger("ceph_volume_client")\nlog.addHandler(logging.StreamHandler())\nlog.setLevel(logging.DEBUG)\nvc = CephFSVolumeClient("manila", "./ceph.conf", "ceph", None, None)\nvc.connect()\n\nvp = VolumePath("grpid", "volid")\nvc.create_volume(vp, 10, data_isolated=True)\n\nvc.disconnect()\n ']
src/tcmalloc.cc:283] Attempt to free invalid pointer 0x557694b27cc0
2016-07-14 18:40:25,142.142 INFO:__main__:test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient) ... ERROR
2016-07-14 18:40:25,143.143 ERROR:__main__:Traceback (most recent call last):
File "/home/rraja/git/ceph-qa-suite/tasks/cephfs/test_volume_client.py", line 370, in test_data_isolated
guest_entity=guest_entity
File "/home/rraja/git/ceph-qa-suite/tasks/cephfs/test_volume_client.py", line 34, in _volume_client_python
""".format(payload=script, conf_path=client.config_path, vol_prefix=vol_prefix, ns_prefix=ns_prefix))
File "/home/rraja/git/ceph-qa-suite/tasks/cephfs/mount.py", line 134, in run_python
p.wait()
File "/home/rraja/git/ceph-qa-suite/tasks/cephfs/vstart_runner.py", line 113, in wait
raise CommandFailedError(self.args, self.exitstatus)
CommandFailedError: Command failed with status -6: ['python', '-c', '\nfrom ceph_volume_client import CephFSVolumeClient, VolumePath\nimport logging\nlog = logging.getLogger("ceph_volume_client")\nlog.addHandler(logging.StreamHandler())\nlog.setLevel(logging.DEBUG)\nvc = CephFSVolumeClient("manila", "./ceph.conf", "ceph", None, None)\nvc.connect()\n\nvp = VolumePath("grpid", "volid")\nvc.create_volume(vp, 10, data_isolated=True)\n\nvc.disconnect()\n ']
The culprit seems to be,
"src/tcmalloc.cc:283] Attempt to free invalid pointer 0x557694b27cc0"
Files
Updated by Ramana Raja almost 8 years ago
- Subject changed from cephfs.test_volume_client.TestVolumeClient test failure (due to "src/tcmalloc.cc:283] Attempt to free invalid pointer") to cephfs.test_volume_client.TestVolumeClient failure (due to "src/tcmalloc.cc:283] Attempt to free invalid pointer")
Updated by Ramana Raja almost 8 years ago
- Subject changed from cephfs.test_volume_client.TestVolumeClient failure (due to "src/tcmalloc.cc:283] Attempt to free invalid pointer") to cephfs.test_volume_client.TestVolumeClient failure (with "src/tcmalloc.cc:283] Attempt to free invalid pointer")
Updated by Ramana Raja almost 8 years ago
The above issue is reproducible with Ceph master
(HEAD commit bf58aeb6d5a6a07d83374c750d177a78ce027c3c) built using
cmake.
However, I don't run into this crash when using autotools to build
Ceph.
Updated by John Spray almost 8 years ago
- Assignee set to Kefu Chai
Kefu has offered to look at this as a possible link issue (thanks Kefu!)
Updated by Kefu Chai almost 8 years ago
- Status changed from New to Fix Under Review
Updated by Kefu Chai almost 8 years ago
- Project changed from CephFS to Ceph
- Category deleted (
Testing) - Status changed from Fix Under Review to In Progress
moving to ceph subproject, as autotools build passes.
Updated by John Spray almost 8 years ago
Reproduce with a script like this:
from ceph_volume_client import CephFSVolumeClient, VolumePath import logging log = logging.getLogger("ceph_volume_client") log.addHandler(logging.StreamHandler()) log.setLevel(logging.DEBUG) vc = CephFSVolumeClient("admin", "./ceph.conf", "ceph", "_vols", "_ns") vc.connect() vp = VolumePath("grp", "vol") vc.create_volume(vp, 10, data_isolated=False) vc.disconnect()
Which I run like this:
LD_LIBRARY_PATH=`pwd`/lib PYTHONPATH=`pwd`/../src/pybind:`pwd`/lib/cython_modules/lib.linux-x86_64-2.7/ python ./foo.py
from my cmake build/ directory with a vstart cluster.
Updated by Kefu Chai almost 8 years ago
in ceph_volume_client.py, we have
from contextlib import contextmanager
import errno
import fcntl
import json
import logging
import os
import re
import struct
import sys
import threading
import time
import uuid
from ceph_argparse import json_command
import cephfs
import rados
it happens when importing the second extension: "rados".
Updated by Kefu Chai almost 8 years ago
reverting https://github.com/ceph/ceph/pull/10145 fixes this issue.
Updated by Kefu Chai almost 8 years ago
backtrace with libc allocator.
(gdb) bt #0 0x00007ffff6f271c8 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54 #1 0x00007ffff6f2864a in __GI_abort () at abort.c:89 #2 0x00007ffff6f62f4a in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff705bb30 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175 #3 0x00007ffff6f686b6 in malloc_printerr (action=3, str=0x7ffff705bc88 "double free or corruption (fasttop)", ptr=<optimized out>, ar_ptr=<optimized out>) at malloc.c:5004 #4 0x00007ffff6f68e9e in _int_free (av=0x7ffff728fb20 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3865 #5 0x00007ffff6f29b78 in __run_exit_handlers (status=0, listp=0x7ffff728f5f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82 #6 0x00007ffff6f29bc5 in __GI_exit (status=<optimized out>) at exit.c:104 #7 0x00007ffff6f14737 in __libc_start_main (main=0x49d710 <main>, argc=2, argv=0x7fffffffe0a8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe098) at ../csu/libc-start.c:325 #8 0x000000000049d639 in _start ()
Updated by Kefu Chai almost 8 years ago
found it. it's in libcommon, to be specific, it is
const string WorkerPool::name = "AsyncMessenger::WorkerPool";
Updated by Kefu Chai almost 8 years ago
- Status changed from In Progress to Fix Under Review
Updated by Kefu Chai almost 8 years ago
- Status changed from Fix Under Review to Resolved
Updated by Jeff Layton over 7 years ago
I believe I hit this as well on my own branch (wip-jlayton-nlink), which has commit 3173d2ea1. Test results are here:
http://qa-proxy.ceph.com/teuthology/jlayton-2016-07-26_12:03:12-fs-wip-jlayton-nlink---basic-mira/334949/
Interestingly:
$ file 1469612731.18395.core
1469612731.18395.core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), too many program header sections (278)
...so I'm not quite sure what program crashed there.
Updated by Jeff Layton over 7 years ago
Ahh, my home box has a more up to date "file" program:
$ file /tmp/1469612731.18395.core
/tmp/1469612731.18395.core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'python'
Here's the stack trace I see in the core (missing a lot of symbols of course):
(gdb) bt #0 0x00007f91a62dfcc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x00007f91a62e30d8 in __GI_abort () at abort.c:89 #2 0x00007f919ac79114 in tcmalloc::Log(tcmalloc::LogMode, char const*, int, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem) () from /usr/lib/libtcmalloc.so.4 #3 0x00007f919ac6e184 in ?? () from /usr/lib/libtcmalloc.so.4 #4 0x00007f919ac8cdf4 in tc_delete () from /usr/lib/libtcmalloc.so.4 #5 0x00007f919b8d8da6 in ?? () #6 0x0000000000d20b08 in ?? () #7 0x00007fff5ae2caa0 in ?? () #8 0x0000000000e2cb80 in ?? () #9 0x00000000016ea060 in ?? () #10 0x00007f9199e7af18 in vtable for std::ostream () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #11 0x00007f9199e78f50 in vtable for std::ios_base () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #12 0x0000000000000006 in ?? () #13 0x0000000000000000 in ?? () (gdb) info libraries Undefined info command: "libraries". Try "help info". (gdb) info sharedlibrary From To Syms Read Shared Object Library 0x00007f91a66739f0 0x00007f91a66804a1 Yes /lib/x86_64-linux-gnu/libpthread.so.0 0x00007f91a62c84a0 0x00007f91a640d583 Yes /lib/x86_64-linux-gnu/libc.so.6 0x00007f91a60a5ed0 0x00007f91a60a69ce Yes /lib/x86_64-linux-gnu/libdl.so.2 0x00007f91a5ea2f10 0x00007f91a5ea3804 Yes /lib/x86_64-linux-gnu/libutil.so.1 0x00007f91a5c8ae00 0x00007f91a5c9abf8 Yes (*) /lib/x86_64-linux-gnu/libz.so.1 0x00007f91a5988610 0x00007f91a59f71b6 Yes /lib/x86_64-linux-gnu/libm.so.6 0x00007f91a688cae0 0x00007f91a68a74e0 Yes /lib64/ld-linux-x86-64.so.2 0x00007f91a54aef70 0x00007f91a54b7612 Yes (*) /usr/lib/python2.7/lib-dynload/_json.x86_64-linux-gnu.so 0x00007f91a5212f90 0x00007f91a5223642 Yes (*) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so 0x00007f91a5005650 0x00007f91a5009d38 Yes (*) /usr/lib/x86_64-linux-gnu/libffi.so.6 0x00007f91a4dc0610 0x00007f91a4dc1c96 Yes (*) /usr/lib/python2.7/lib-dynload/_hashlib.x86_64-linux-gnu.so 0x00007f91a4a42e00 0x00007f91a4b32cdf Yes (*) /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 0x00007f91a47e0660 0x00007f91a47e1d17 Yes (*) /lib/x86_64-linux-gnu/libuuid.so.1 0x00007f91a45d9470 0x00007f91a45dc935 Yes (*) /usr/lib/python2.7/lib-dynload/_ssl.x86_64-linux-gnu.so 0x00007f91a4389e80 0x00007f91a43bd66d Yes (*) /lib/x86_64-linux-gnu/libssl.so.1.0.0 No /usr/lib/python2.7/dist-packages/cephfs.x86_64-linux-gnu.so No /usr/lib/libcephfs.so.1 0x00007f919b1190c0 0x00007f919b2a7782 Yes (*) /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0 0x00007f919aec7350 0x00007f919aeca33c Yes /lib/x86_64-linux-gnu/librt.so.1 0x00007f919ac6c9e0 0x00007f919ac8ad9c Yes (*) /usr/lib/libtcmalloc.so.4 0x00007f919aa42ad0 0x00007f919aa51e69 Yes /lib/x86_64-linux-gnu/libresolv.so.2 0x00007f919a71aa10 0x00007f919a7fb562 Yes (*) /usr/lib/x86_64-linux-gnu/libnss3.so 0x00007f919a4d36f0 0x00007f919a4f29bf Yes (*) /usr/lib/x86_64-linux-gnu/libnspr4.so 0x00007f919a2bcf40 0x00007f919a2c2883 Yes (*) /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0 0x00007f919a0af250 0x00007f919a0afdc3 Yes (*) /usr/lib/x86_64-linux-gnu/libboost_system.so.1.54.0 0x00007f9199ea1710 0x00007f9199ea7d82 Yes (*) /usr/lib/x86_64-linux-gnu/libboost_iostreams.so.1.54.0 0x00007f9199bec620 0x00007f9199c4f803 Yes (*) /usr/lib/x86_64-linux-gnu/libstdc++.so.6 0x00007f919997dab0 0x00007f919998d985 Yes (*) /lib/x86_64-linux-gnu/libgcc_s.so.1 0x00007f9199761f30 0x00007f91997695c9 Yes (*) /usr/lib/x86_64-linux-gnu/libunwind.so.8 0x00007f919953f2e0 0x00007f919954da53 Yes (*) /usr/lib/x86_64-linux-gnu/libnssutil3.so 0x00007f91993303f0 0x00007f9199331ae6 Yes (*) /usr/lib/x86_64-linux-gnu/libplc4.so 0x00007f919912bf40 0x00007f919912ce78 Yes (*) /usr/lib/x86_64-linux-gnu/libplds4.so 0x00007f9198f1c3c0 0x00007f9198f2808f Yes (*) /lib/x86_64-linux-gnu/libbz2.so.1.0 0x00007f9198cfb0d0 0x00007f9198d0f24d Yes (*) /lib/x86_64-linux-gnu/liblzma.so.5 No /usr/lib/python2.7/dist-packages/rados.x86_64-linux-gnu.so No /usr/lib/librados.so.2 0x00007f918fe48580 0x00007f918fe4e273 Yes (*) /usr/lib/x86_64-linux-gnu/liblttng-ust-tracepoint.so.0 0x00007f918fc41bd0 0x00007f918fc43fbc Yes (*) /usr/lib/liburcu-bp.so.1 0x00007f918f9e9a60 0x00007f918f9f7bcc Yes (*) /usr/lib/python2.7/lib-dynload/datetime.x86_64-linux-gnu.so 0x00007f918ef662d0 0x00007f918ef91f3c Yes (*) /usr/lib/x86_64-linux-gnu/nss/libsoftokn3.so 0x00007f918ecb2d60 0x00007f918ed3da0f Yes (*) /usr/lib/x86_64-linux-gnu/libsqlite3.so.0 ---Type <return> to continue, or q <return> to quit--- 0x00007f918ea2d9c0 0x00007f918ea8232d Yes (*) /usr/lib/x86_64-linux-gnu/nss/libfreebl3.so (*): Shared library is missing debugging information. (gdb)
Updated by Ramana Raja over 7 years ago
Jeff, I couldn't reproduce this issue using your branch, wip-jlayton-nlink, in my local machine. were you able to reproduce the issue using http://tracker.ceph.com/issues/16686#note-7 in your local setup?