Bug #16832
libcephfs failure at shutdown (Attempt to free invalid pointer)
Status: Closed
Description
My fs test run had 3 failures:
1) some binary segfaulted, but I don't seem to be able to track down the core to see what actually failed:
http://pulpito.ceph.com/jlayton-2016-07-26_12:03:12-fs-wip-jlayton-nlink---basic-mira/334899/
2) some failure in ansible:
http://pulpito.ceph.com/jlayton-2016-07-26_12:03:12-fs-wip-jlayton-nlink---basic-mira/334932/
3) failure in the test_volume_client:
http://pulpito.ceph.com/jlayton-2016-07-26_12:03:12-fs-wip-jlayton-nlink---basic-mira/334949/
The last one seems to have failed here:
2016-07-27T02:45:31.139 INFO:teuthology.orchestra.run.mira089.stderr:src/tcmalloc.cc:277] Attempt to free invalid pointer 0x17a8b10
2016-07-27T02:45:31.321 INFO:teuthology.orchestra.run.mira089.stderr:daemon-helper: command crashed with signal 6
That one may be due to the linking bug that jspray mentioned in our standup this morning?
Updated by David Galloway almost 8 years ago
1) some binary segfaulted, but I don't seem to be able to track down the core to see what actually failed:
<dgalloway> jlayton, hopefully this is it... if you take the teuthology.log URL for the segfaulted job and strip the teuthology.log bit, http://qa-proxy.ceph.com/teuthology/jlayton-2016-07-26_12:03:12-fs-wip-jlayton-nlink---basic-mira/334899/
<dgalloway> jlayton, remote -> mira028 -> coredump
<jlayton> got it -- thanks
2) some failure in ansible
Known issue: http://tracker.ceph.com/issues/16826
3) failure in the test_volume_client:
2016-07-27T02:45:32.722 INFO:teuthology.orchestra.run.mira064:Running: 'sudo mkdir -p /etc/ceph && sudo chmod 0755 /etc/ceph && sudo python -c \'import shutil, sys; shutil.copyfileobj(sys.stdin, file(sys.argv[1], "wb"))\' /etc/ceph/ceph.conf && sudo chmod 0644 /etc/ceph/ceph.conf'
2016-07-27T02:45:32.726 INFO:teuthology.orchestra.run.mira089:Running: 'sudo mkdir -p /etc/ceph && sudo chmod 0755 /etc/ceph && sudo python -c \'import shutil, sys; shutil.copyfileobj(sys.stdin, file(sys.argv[1], "wb"))\' /etc/ceph/ceph.conf && sudo chmod 0644 /etc/ceph/ceph.conf'
2016-07-27T02:45:32.818 INFO:teuthology.orchestra.run.mira064:Running: 'sudo mkdir -p /etc/ceph && sudo chmod 0755 /etc/ceph && sudo python -c \'import shutil, sys; shutil.copyfileobj(sys.stdin, file(sys.argv[1], "wb"))\' /etc/ceph/ceph.conf && sudo chmod 0644 /etc/ceph/ceph.conf'
2016-07-27T02:45:32.823 INFO:teuthology.orchestra.run.mira089:Running: 'sudo mkdir -p /etc/ceph && sudo chmod 0755 /etc/ceph && sudo python -c \'import shutil, sys; shutil.copyfileobj(sys.stdin, file(sys.argv[1], "wb"))\' /etc/ceph/ceph.conf && sudo chmod 0644 /etc/ceph/ceph.conf'
2016-07-27T02:45:32.880 INFO:tasks.cephfs_test_runner:
2016-07-27T02:45:32.881 INFO:tasks.cephfs_test_runner:======================================================================
2016-07-27T02:45:32.881 INFO:tasks.cephfs_test_runner:ERROR: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient)
2016-07-27T02:45:32.881 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2016-07-27T02:45:32.882 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2016-07-27T02:45:32.882 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs/test_volume_client.py", line 386, in test_data_isolated
2016-07-27T02:45:32.882 INFO:tasks.cephfs_test_runner:    guest_entity=guest_entity
2016-07-27T02:45:32.882 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs/test_volume_client.py", line 38, in _volume_client_python
2016-07-27T02:45:32.883 INFO:tasks.cephfs_test_runner:    """.format(payload=script, conf_path=client.config_path, vol_prefix=vol_prefix, ns_prefix=ns_prefix))
2016-07-27T02:45:32.883 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs/mount.py", line 134, in run_python
2016-07-27T02:45:32.883 INFO:tasks.cephfs_test_runner:    p.wait()
2016-07-27T02:45:32.884 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 121, in wait
2016-07-27T02:45:32.884 INFO:tasks.cephfs_test_runner:    label=self.label)
2016-07-27T02:45:32.884 INFO:tasks.cephfs_test_runner:CommandFailedError: Command failed on mira089 with status 1: 'sudo adjust-ulimits daemon-helper kill python -c \'\nfrom ceph_volume_client import CephFSVolumeClient, VolumePath\nimport logging\nlog = logging.getLogger("ceph_volume_client")\nlog.addHandler(logging.StreamHandler())\nlog.setLevel(logging.DEBUG)\nvc = CephFSVolumeClient("manila", "/etc/ceph/ceph.conf", "ceph", None, None)\nvc.connect()\n\nvp = VolumePath("grpid", "volid")\nvc.create_volume(vp, 10, data_isolated=True)\n\nvc.disconnect()\n \''
2016-07-27T02:45:32.885 INFO:tasks.cephfs_test_runner:
2016-07-27T02:45:32.885 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2016-07-27T02:45:32.885 INFO:tasks.cephfs_test_runner:Ran 2 tests in 86.670s
2016-07-27T02:45:32.886 INFO:tasks.cephfs_test_runner:
2016-07-27T02:45:32.886 INFO:tasks.cephfs_test_runner:FAILED (errors=1)
2016-07-27T02:45:32.886 INFO:tasks.cephfs_test_runner:
2016-07-27T02:45:32.887 INFO:tasks.cephfs_test_runner:======================================================================
2016-07-27T02:45:32.888 INFO:tasks.cephfs_test_runner:ERROR: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient)
2016-07-27T02:45:32.889 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2016-07-27T02:45:32.889 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2016-07-27T02:45:32.890 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs/test_volume_client.py", line 386, in test_data_isolated
2016-07-27T02:45:32.890 INFO:tasks.cephfs_test_runner:    guest_entity=guest_entity
2016-07-27T02:45:32.891 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs/test_volume_client.py", line 38, in _volume_client_python
2016-07-27T02:45:32.893 INFO:tasks.cephfs_test_runner:    """.format(payload=script, conf_path=client.config_path, vol_prefix=vol_prefix, ns_prefix=ns_prefix))
2016-07-27T02:45:32.895 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs/mount.py", line 134, in run_python
2016-07-27T02:45:32.896 INFO:tasks.cephfs_test_runner:    p.wait()
2016-07-27T02:45:32.899 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 121, in wait
2016-07-27T02:45:32.901 INFO:tasks.cephfs_test_runner:    label=self.label)
2016-07-27T02:45:32.907 INFO:tasks.cephfs_test_runner:CommandFailedError: Command failed on mira089 with status 1: 'sudo adjust-ulimits daemon-helper kill python -c \'\nfrom ceph_volume_client import CephFSVolumeClient, VolumePath\nimport logging\nlog = logging.getLogger("ceph_volume_client")\nlog.addHandler(logging.StreamHandler())\nlog.setLevel(logging.DEBUG)\nvc = CephFSVolumeClient("manila", "/etc/ceph/ceph.conf", "ceph", None, None)\nvc.connect()\n\nvp = VolumePath("grpid", "volid")\nvc.create_volume(vp, 10, data_isolated=True)\n\nvc.disconnect()\n \''
2016-07-27T02:45:32.912 INFO:tasks.cephfs_test_runner:
2016-07-27T02:45:32.915 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 69, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs_test_runner.py", line 186, in task
    raise RuntimeError("Test failure: {0}".format(", ".join(bad_tests)))
RuntimeError: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient)
2016-07-27T02:45:34.789 ERROR:teuthology.run_tasks: Sentry event: http://sentry.ceph.com/sepia/teuthology/?q=c2d975b168fa4071849776dbf74350d7
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 69, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs_test_runner.py", line 186, in task
    raise RuntimeError("Test failure: {0}".format(", ".join(bad_tests)))
RuntimeError: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient)
2016-07-27T02:45:34.789 DEBUG:teuthology.run_tasks:Unwinding manager cephfs_test_runner
2016-07-27T02:45:35.242 DEBUG:teuthology.run_tasks:Unwinding manager ceph-fuse
2016-07-27T02:45:35.777 INFO:tasks.ceph_fuse:Unmounting ceph-fuse clients...
2016-07-27T02:45:35.777 DEBUG:teuthology.run_tasks:Unwinding manager ceph
2016-07-27T02:45:36.233 INFO:teuthology.orchestra.run.mira089:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph pg dump --format=json'
2016-07-27T02:45:36.567 INFO:teuthology.orchestra.run.mira089.stderr:dumped all in format json
2016-07-27T02:45:37.866 INFO:tasks.ceph:Scrubbing osd.0
2016-07-27T02:45:37.867 INFO:teuthology.orchestra.run.mira089:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd deep-scrub 0'
2016-07-27T02:45:38.070 INFO:teuthology.orchestra.run.mira089.stderr:osd.0 instructed to deep-scrub
2016-07-27T02:45:38.082 INFO:tasks.ceph:Scrubbing osd.1
2016-07-27T02:45:38.085 INFO:teuthology.orchestra.run.mira089:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd deep-scrub 1'
2016-07-27T02:45:38.284 INFO:teuthology.orchestra.run.mira089.stderr:osd.1 instructed to deep-scrub
2016-07-27T02:45:38.296 INFO:tasks.ceph:Scrubbing osd.2
2016-07-27T02:45:38.297 INFO:teuthology.orchestra.run.mira089:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd deep-scrub 2'
2016-07-27T02:45:38.515 INFO:teuthology.orchestra.run.mira089.stderr:osd.2 instructed to deep-scrub
2016-07-27T02:45:38.527 INFO:teuthology.orchestra.run.mira089:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph pg dump --format=json'
2016-07-27T02:45:38.769 INFO:teuthology.orchestra.run.mira089.stderr:dumped all in format json
2016-07-27T02:45:38.993 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 32, in nested
    yield vars
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/ceph.py", line 1501, in task
    yield
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 69, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs_test_runner.py", line 186, in task
    raise RuntimeError("Test failure: {0}".format(", ".join(bad_tests)))
RuntimeError: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient)
Updated by David Galloway almost 8 years ago
- Project changed from sepia to Ceph
- Assignee set to Jeff Layton
Updated by Jeff Layton almost 8 years ago
- Category set to 24
The other two test failures -- one was a segfault in ceph_test_libcephfs:
(gdb) bt
#0 0x00007f771c5aaa63 in lockdep_will_lock (name=0x2f9fad8 "CephContextServiceThread::_lock", id=0, force_backtrace=false) at /srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-924-g438b09a/src/common/lockdep.cc:281
#1 0x00007f771c4458a0 in _will_lock (this=0x2f9fa08) at /srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-924-g438b09a/src/common/Mutex.h:56
#2 Mutex::Lock (this=this@entry=0x2f9fa08, no_lockdep=no_lockdep@entry=false) at /srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-924-g438b09a/src/common/Mutex.cc:95
#3 0x00007f771c571a79 in Locker (m=..., this=<synthetic pointer>) at /srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-924-g438b09a/src/common/Mutex.h:115
#4 CephContextServiceThread::entry (this=0x2f9f9e0) at /srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-924-g438b09a/src/common/ceph_context.cc:95
#5 0x00007f771357d184 in start_thread (arg=0x7f770eb17700) at pthread_create.c:312
#6 0x00007f7712b8837d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
jspray says that it's likely a dup of http://tracker.ceph.com/issues/16556.
The other test failure is this one:
2016-07-27T02:45:29.690 INFO:teuthology.orchestra.run.mira089.stderr:Disconnecting cephfs...
2016-07-27T02:45:31.139 INFO:teuthology.orchestra.run.mira089.stderr:src/tcmalloc.cc:277] Attempt to free invalid pointer 0x17a8b10
2016-07-27T02:45:31.321 INFO:teuthology.orchestra.run.mira089.stderr:daemon-helper: command crashed with signal 6
...which was supposed to have been fixed by 3173d2ea1 (cmake: remove util.cc from lib{rados,cephfs},ceph-objectstore-tool), but apparently wasn't, since this branch included that patch. So at this point, I'll repurpose this bug to track that failure.
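For context, tcmalloc's "Attempt to free invalid pointer" check fires when free() receives an address the allocator never handed out, e.g. a pointer into the middle of an allocation, or memory allocated by one allocator and freed by another (the class of bug the util.cc linking fix was aimed at). A hypothetical demonstration of the first case via libc, run in a child process because the invalid free aborts it; the exact behaviour is allocator-specific, but both glibc and tcmalloc abort on this:

```python
import subprocess
import sys

# Child script: allocate 16 bytes with libc malloc, then free a pointer
# one byte past the start. The allocator never returned that address,
# so its sanity check aborts the process, the same way the test run
# above died on "src/tcmalloc.cc:277] Attempt to free invalid pointer".
CHILD = """
import ctypes, ctypes.util
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
p = libc.malloc(16)
libc.free(ctypes.c_void_p(p + 1))  # invalid free: not a pointer malloc returned
"""


def invalid_free_exit_code():
    # Run the crashing snippet in a separate interpreter so the abort
    # does not take down the parent; a signal death is reported as a
    # negative returncode.
    return subprocess.call([sys.executable, "-c", CHILD])
```

In the shutdown bug tracked here the invalid pointer came from library teardown rather than deliberate pointer arithmetic, but the abort mechanism is the same.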
Updated by Jeff Layton almost 8 years ago
- Subject changed from help diagnosing test failure to libcephfs failure at shutdown (Attempt to free invalid pointer)
Updated by Nathan Cutler over 7 years ago
@Jeff Layton: Just in case you don't know it yet, here is a trick for rescheduling failed and dead jobs from a previous run (useful for determining whether the failures are reproducible): http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_run_integration_and_upgrade_tests#Re-scheduling-failed-or-dead-jobs-from-an-existing-suite
Updated by Jeff Layton over 7 years ago
- Status changed from New to Resolved
I haven't seen this in the latest test runs, so I'm going to go ahead and close this out under the assumption that it has been fixed.
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to CephFS
- Category deleted (24)