Bug #16832

closed

libcephfs failure at shutdown (Attempt to free invalid pointer)

Added by Jeff Layton almost 8 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Assignee: Jeff Layton
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

My fs test run had 3 failures:

1) some binary segfaulted, but I don't seem to be able to track down the core to see what actually failed:

http://pulpito.ceph.com/jlayton-2016-07-26_12:03:12-fs-wip-jlayton-nlink---basic-mira/334899/

2) some failure in ansible:

http://pulpito.ceph.com/jlayton-2016-07-26_12:03:12-fs-wip-jlayton-nlink---basic-mira/334932/

3) failure in the test_volume_client:

http://pulpito.ceph.com/jlayton-2016-07-26_12:03:12-fs-wip-jlayton-nlink---basic-mira/334949/

The last one seems to have failed here:

2016-07-27T02:45:31.139 INFO:teuthology.orchestra.run.mira089.stderr:src/tcmalloc.cc:277] Attempt to free invalid pointer 0x17a8b10
2016-07-27T02:45:31.321 INFO:teuthology.orchestra.run.mira089.stderr:daemon-helper: command crashed with signal 6

That one may be due to the linking bug that jspray mentioned in our standup this morning?
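
For reference, the script the test pushes through daemon-helper (it appears escaped inside the CommandFailedError in the log below) boils down to the following; the abort fires during the final disconnect. Comments are mine, not part of the test:

from ceph_volume_client import CephFSVolumeClient, VolumePath
import logging

log = logging.getLogger("ceph_volume_client")
log.addHandler(logging.StreamHandler())
log.setLevel(logging.DEBUG)

# connect to the cluster as the "manila" auth entity
vc = CephFSVolumeClient("manila", "/etc/ceph/ceph.conf", "ceph", None, None)
vc.connect()

# create a volume with its own data pool (data_isolated=True)
vp = VolumePath("grpid", "volid")
vc.create_volume(vp, 10, data_isolated=True)

# teardown is where tcmalloc reports "Attempt to free invalid pointer"
vc.disconnect()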

Actions #1

Updated by David Galloway almost 8 years ago

1) some binary segfaulted, but I don't seem to be able to track down the core to see what actually failed:

<dgalloway> jlayton, hopefully this is it... if you take the teuthology.log URL for the segfaulted job and strip the teuthology.log bit, http://qa-proxy.ceph.com/teuthology/jlayton-2016-07-26_12:03:12-fs-wip-jlayton-nlink---basic-mira/334899/
<dgalloway> jlayton, remote -> mira028 -> coredump
<jlayton> got it -- thanks

2) some failure in ansible

Known issue: http://tracker.ceph.com/issues/16826

3) failure in the test_volume_client:

2016-07-27T02:45:32.722 INFO:teuthology.orchestra.run.mira064:Running: 'sudo mkdir -p /etc/ceph && sudo chmod 0755 /etc/ceph && sudo python -c \'import shutil, sys; shutil.copyfileobj(sys.stdin, file(sys.argv[1], "wb"))\' /etc/ceph/ceph.conf && sudo chmod 0644 /etc/ceph/ceph.conf'
2016-07-27T02:45:32.726 INFO:teuthology.orchestra.run.mira089:Running: 'sudo mkdir -p /etc/ceph && sudo chmod 0755 /etc/ceph && sudo python -c \'import shutil, sys; shutil.copyfileobj(sys.stdin, file(sys.argv[1], "wb"))\' /etc/ceph/ceph.conf && sudo chmod 0644 /etc/ceph/ceph.conf'
2016-07-27T02:45:32.818 INFO:teuthology.orchestra.run.mira064:Running: 'sudo mkdir -p /etc/ceph && sudo chmod 0755 /etc/ceph && sudo python -c \'import shutil, sys; shutil.copyfileobj(sys.stdin, file(sys.argv[1], "wb"))\' /etc/ceph/ceph.conf && sudo chmod 0644 /etc/ceph/ceph.conf'
2016-07-27T02:45:32.823 INFO:teuthology.orchestra.run.mira089:Running: 'sudo mkdir -p /etc/ceph && sudo chmod 0755 /etc/ceph && sudo python -c \'import shutil, sys; shutil.copyfileobj(sys.stdin, file(sys.argv[1], "wb"))\' /etc/ceph/ceph.conf && sudo chmod 0644 /etc/ceph/ceph.conf'
2016-07-27T02:45:32.880 INFO:tasks.cephfs_test_runner:
2016-07-27T02:45:32.881 INFO:tasks.cephfs_test_runner:======================================================================
2016-07-27T02:45:32.881 INFO:tasks.cephfs_test_runner:ERROR: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient)
2016-07-27T02:45:32.881 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2016-07-27T02:45:32.882 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2016-07-27T02:45:32.882 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs/test_volume_client.py", line 386, in test_data_isolated
2016-07-27T02:45:32.882 INFO:tasks.cephfs_test_runner:    guest_entity=guest_entity
2016-07-27T02:45:32.882 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs/test_volume_client.py", line 38, in _volume_client_python
2016-07-27T02:45:32.883 INFO:tasks.cephfs_test_runner:    """.format(payload=script, conf_path=client.config_path, vol_prefix=vol_prefix, ns_prefix=ns_prefix))
2016-07-27T02:45:32.883 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs/mount.py", line 134, in run_python
2016-07-27T02:45:32.883 INFO:tasks.cephfs_test_runner:    p.wait()
2016-07-27T02:45:32.884 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 121, in wait
2016-07-27T02:45:32.884 INFO:tasks.cephfs_test_runner:    label=self.label)
2016-07-27T02:45:32.884 INFO:tasks.cephfs_test_runner:CommandFailedError: Command failed on mira089 with status 1: 'sudo adjust-ulimits daemon-helper kill python -c \'\nfrom ceph_volume_client import CephFSVolumeClient, VolumePath\nimport logging\nlog = logging.getLogger("ceph_volume_client")\nlog.addHandler(logging.StreamHandler())\nlog.setLevel(logging.DEBUG)\nvc = CephFSVolumeClient("manila", "/etc/ceph/ceph.conf", "ceph", None, None)\nvc.connect()\n\nvp = VolumePath("grpid", "volid")\nvc.create_volume(vp, 10, data_isolated=True)\n\nvc.disconnect()\n        \''
2016-07-27T02:45:32.885 INFO:tasks.cephfs_test_runner:
2016-07-27T02:45:32.885 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2016-07-27T02:45:32.885 INFO:tasks.cephfs_test_runner:Ran 2 tests in 86.670s
2016-07-27T02:45:32.886 INFO:tasks.cephfs_test_runner:
2016-07-27T02:45:32.886 INFO:tasks.cephfs_test_runner:FAILED (errors=1)
2016-07-27T02:45:32.886 INFO:tasks.cephfs_test_runner:
2016-07-27T02:45:32.887 INFO:tasks.cephfs_test_runner:======================================================================
2016-07-27T02:45:32.888 INFO:tasks.cephfs_test_runner:ERROR: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient)
2016-07-27T02:45:32.889 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2016-07-27T02:45:32.889 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2016-07-27T02:45:32.890 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs/test_volume_client.py", line 386, in test_data_isolated
2016-07-27T02:45:32.890 INFO:tasks.cephfs_test_runner:    guest_entity=guest_entity
2016-07-27T02:45:32.891 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs/test_volume_client.py", line 38, in _volume_client_python
2016-07-27T02:45:32.893 INFO:tasks.cephfs_test_runner:    """.format(payload=script, conf_path=client.config_path, vol_prefix=vol_prefix, ns_prefix=ns_prefix))
2016-07-27T02:45:32.895 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs/mount.py", line 134, in run_python
2016-07-27T02:45:32.896 INFO:tasks.cephfs_test_runner:    p.wait()
2016-07-27T02:45:32.899 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 121, in wait
2016-07-27T02:45:32.901 INFO:tasks.cephfs_test_runner:    label=self.label)
2016-07-27T02:45:32.907 INFO:tasks.cephfs_test_runner:CommandFailedError: Command failed on mira089 with status 1: 'sudo adjust-ulimits daemon-helper kill python -c \'\nfrom ceph_volume_client import CephFSVolumeClient, VolumePath\nimport logging\nlog = logging.getLogger("ceph_volume_client")\nlog.addHandler(logging.StreamHandler())\nlog.setLevel(logging.DEBUG)\nvc = CephFSVolumeClient("manila", "/etc/ceph/ceph.conf", "ceph", None, None)\nvc.connect()\n\nvp = VolumePath("grpid", "volid")\nvc.create_volume(vp, 10, data_isolated=True)\n\nvc.disconnect()\n        \''
2016-07-27T02:45:32.912 INFO:tasks.cephfs_test_runner:
2016-07-27T02:45:32.915 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 69, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs_test_runner.py", line 186, in task
    raise RuntimeError("Test failure: {0}".format(", ".join(bad_tests)))
RuntimeError: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient)
2016-07-27T02:45:34.789 ERROR:teuthology.run_tasks: Sentry event: http://sentry.ceph.com/sepia/teuthology/?q=c2d975b168fa4071849776dbf74350d7
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 69, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs_test_runner.py", line 186, in task
    raise RuntimeError("Test failure: {0}".format(", ".join(bad_tests)))
RuntimeError: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient)
2016-07-27T02:45:34.789 DEBUG:teuthology.run_tasks:Unwinding manager cephfs_test_runner
2016-07-27T02:45:35.242 DEBUG:teuthology.run_tasks:Unwinding manager ceph-fuse
2016-07-27T02:45:35.777 INFO:tasks.ceph_fuse:Unmounting ceph-fuse clients...
2016-07-27T02:45:35.777 DEBUG:teuthology.run_tasks:Unwinding manager ceph
2016-07-27T02:45:36.233 INFO:teuthology.orchestra.run.mira089:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph pg dump --format=json'
2016-07-27T02:45:36.567 INFO:teuthology.orchestra.run.mira089.stderr:dumped all in format json
2016-07-27T02:45:37.866 INFO:tasks.ceph:Scrubbing osd.0
2016-07-27T02:45:37.867 INFO:teuthology.orchestra.run.mira089:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd deep-scrub 0'
2016-07-27T02:45:38.070 INFO:teuthology.orchestra.run.mira089.stderr:osd.0 instructed to deep-scrub
2016-07-27T02:45:38.082 INFO:tasks.ceph:Scrubbing osd.1
2016-07-27T02:45:38.085 INFO:teuthology.orchestra.run.mira089:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd deep-scrub 1'
2016-07-27T02:45:38.284 INFO:teuthology.orchestra.run.mira089.stderr:osd.1 instructed to deep-scrub
2016-07-27T02:45:38.296 INFO:tasks.ceph:Scrubbing osd.2
2016-07-27T02:45:38.297 INFO:teuthology.orchestra.run.mira089:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd deep-scrub 2'
2016-07-27T02:45:38.515 INFO:teuthology.orchestra.run.mira089.stderr:osd.2 instructed to deep-scrub
2016-07-27T02:45:38.527 INFO:teuthology.orchestra.run.mira089:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph pg dump --format=json'
2016-07-27T02:45:38.769 INFO:teuthology.orchestra.run.mira089.stderr:dumped all in format json
2016-07-27T02:45:38.993 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 32, in nested
    yield vars
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/ceph.py", line 1501, in task
    yield
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 69, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs_test_runner.py", line 186, in task
    raise RuntimeError("Test failure: {0}".format(", ".join(bad_tests)))
RuntimeError: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient)
Actions #2

Updated by David Galloway almost 8 years ago

  • Project changed from sepia to Ceph
  • Assignee set to Jeff Layton
Actions #3

Updated by Jeff Layton almost 8 years ago

  • Category set to 24

Of the other two test failures, one was a segfault in ceph_test_libcephfs:

(gdb) bt
#0 0x00007f771c5aaa63 in lockdep_will_lock (name=0x2f9fad8 "CephContextServiceThread::_lock", id=0, force_backtrace=false) at /srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-924-g438b09a/src/common/lockdep.cc:281
#1 0x00007f771c4458a0 in _will_lock (this=0x2f9fa08) at /srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-924-g438b09a/src/common/Mutex.h:56
#2 Mutex::Lock (this=this@entry=0x2f9fa08, no_lockdep=no_lockdep@entry=false) at /srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-924-g438b09a/src/common/Mutex.cc:95
#3 0x00007f771c571a79 in Locker (m=..., this=<synthetic pointer>) at /srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-924-g438b09a/src/common/Mutex.h:115
#4 CephContextServiceThread::entry (this=0x2f9f9e0) at /srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-924-g438b09a/src/common/ceph_context.cc:95
#5 0x00007f771357d184 in start_thread (arg=0x7f770eb17700) at pthread_create.c:312
#6 0x00007f7712b8837d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

jspray says that it's likely a dup of http://tracker.ceph.com/issues/16556.

The other test failure is this one:

2016-07-27T02:45:29.690 INFO:teuthology.orchestra.run.mira089.stderr:Disconnecting cephfs...
2016-07-27T02:45:31.139 INFO:teuthology.orchestra.run.mira089.stderr:src/tcmalloc.cc:277] Attempt to free invalid pointer 0x17a8b10
2016-07-27T02:45:31.321 INFO:teuthology.orchestra.run.mira089.stderr:daemon-helper: command crashed with signal 6

...which was supposed to have been fixed by 3173d2ea1 (cmake: remove util.cc from lib{rados,cephfs},ceph-objectstore-tool), but apparently wasn't, since this branch already had that patch. So at this point, I'll repurpose this bug to track that failure.
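
If the invalid free really is in libcephfs teardown itself, rather than in anything ceph_volume_client does on top of it, then something as small as the following ought to hit it too. This is only a hypothetical sketch using the python-cephfs bindings (assumed to be installed on the client), not part of the test suite:

import cephfs

# mount and immediately tear down libcephfs; if the double free is in
# library shutdown, no filesystem activity should be needed to trigger it
fs = cephfs.LibCephFS(conffile="/etc/ceph/ceph.conf")
fs.mount()
fs.unmount()
fs.shutdown()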

Actions #4

Updated by Jeff Layton almost 8 years ago

  • Subject changed from help diagnosing test failure to libcephfs failure at shutdown (Attempt to free invalid pointer)
Actions #5

Updated by Nathan Cutler over 7 years ago

@Jeff Layton: Just in case you don't know it yet, here is a trick for rescheduling failed and dead jobs from a previous run (useful for determining whether the failures are reproducible): http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_run_integration_and_upgrade_tests#Re-scheduling-failed-or-dead-jobs-from-an-existing-suite

Actions #6

Updated by Jeff Layton over 7 years ago

  • Status changed from New to Resolved

I haven't seen this in the latest test runs, so I'm going to go ahead and close this out under the assumption that it has been fixed.

Actions #7

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (24)