Bug #8156

closed

Crash in Thread.cc "common/Thread.cc: 110: FAILED assert(ret == 0)" in upgrade:dumpling-x:stress-split-firefly-distro-basic-vps suite

Added by Yuri Weinstein about 10 years ago. Updated almost 10 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-04-17_20:35:01-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/199425/

2014-04-18T00:26:53.925 INFO:teuthology.orchestra.run.out:[10.214.138.91]: successfully deleted pool unique_pool_0
2014-04-18T00:26:53.929 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2014-04-18T00:26:53.929 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2014-04-18T00:26:53.929 DEBUG:teuthology.run_tasks:Unwinding manager thrashosds
2014-04-18T00:26:53.929 INFO:teuthology.task.thrashosds:joining thrashosds
2014-04-18T00:26:54.118 INFO:teuthology.task.thrashosds.thrasher:inning osd
2014-04-18T00:26:54.119 INFO:teuthology.task.thrashosds.thrasher:Adding osd 2
2014-04-18T00:26:54.119 DEBUG:teuthology.orchestra.run:Running [10.214.138.91]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd in 2'
2014-04-18T00:27:10.746 INFO:teuthology.orchestra.run.err:[10.214.138.91]: Thread::try_create(): pthread_create failed with error 11common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fcf27d20700 time 2014-04-18 07:27:10.485014
2014-04-18T00:27:10.746 INFO:teuthology.orchestra.run.err:[10.214.138.91]: common/Thread.cc: 110: FAILED assert(ret == 0)
2014-04-18T00:27:10.746 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  ceph version 0.79-238-gdea7011 (dea701125d78c78e89b8d47092052a1d9d300f4d)
2014-04-18T00:27:10.746 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  1: (Thread::create(unsigned long)+0x8a) [0x7fcf291ddcea]
2014-04-18T00:27:10.746 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  2: (AdminSocket::init(std::string const&)+0x876) [0x7fcf291a05f6]
2014-04-18T00:27:10.746 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  3: (common_init_finish(CephContext*, int)+0x58) [0x7fcf291ce548]
2014-04-18T00:27:10.746 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  4: (librados::RadosClient::connect()+0x1f) [0x7fcf29117a1f]
2014-04-18T00:27:10.747 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  5: (ffi_call_unix64()+0x4c) [0x7fcf2ad54adc]
2014-04-18T00:27:10.747 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  6: (ffi_call()+0x1fc) [0x7fcf2ad5440c]
2014-04-18T00:27:10.747 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  7: (_ctypes_callproc()+0x48e) [0x7fcf2af6b5fe]
2014-04-18T00:27:10.747 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  8: (()+0x15f9e) [0x7fcf2af6cf9e]
2014-04-18T00:27:10.747 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  9: (PyEval_EvalFrameEx()+0x1f36) [0x52e1e6]
2014-04-18T00:27:10.747 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  10: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
2014-04-18T00:27:10.747 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  11: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
2014-04-18T00:27:10.747 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  12: /usr/bin/python() [0x56d0aa]
2014-04-18T00:27:10.748 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  13: /usr/bin/python() [0x4d9854]
2014-04-18T00:27:10.748 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  14: (PyEval_CallObjectWithKeywords()+0x6b) [0x4da20b]
2014-04-18T00:27:10.748 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  15: /usr/bin/python() [0x5872b2]
2014-04-18T00:27:10.748 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  16: (()+0x8182) [0x7fcf2c3c3182]
2014-04-18T00:27:10.748 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  17: (clone()+0x6d) [0x7fcf2c0f030d]
2014-04-18T00:27:10.748 INFO:teuthology.orchestra.run.err:[10.214.138.91]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-04-18T00:27:10.810 INFO:teuthology.orchestra.run.err:[10.214.138.91]: terminate called after throwing an instance of 'ceph::FailedAssertion'
2014-04-18T00:27:11.924 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/run_tasks.py", line 92, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/thrashosds.py", line 172, in task
    thrash_proc.do_join()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph_manager.py", line 153, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 331, in get
    raise self._exception
CommandCrashedError: Command crashed: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd in 2'
2014-04-18T00:27:12.029 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2014-04-18T00:27:12.029 DEBUG:teuthology.run_tasks:Unwinding manager install.upgrade
2014-04-18T00:27:12.029 DEBUG:teuthology.run_tasks:Unwinding manager ceph
2014-04-18T00:27:12.029 DEBUG:teuthology.orchestra.run:Running [10.214.138.145]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format json'
2014-04-18T00:27:13.645 INFO:teuthology.orchestra.run.err:[10.214.138.145]: dumped all in format json
2014-04-18T00:27:14.696 INFO:teuthology.task.ceph:Scrubbing osd osd.3
archive_path: /var/lib/teuthworker/archive/teuthology-2014-04-17_20:35:01-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/199425
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rados_api_tests.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_14.04.yaml}
email: null
job_id: '199425'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-04-17_20:35:01-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps
nuke-on-error: true
os_type: ubuntu
os_version: '14.04'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: dea701125d78c78e89b8d47092052a1d9d300f4d
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: dea701125d78c78e89b8d47092052a1d9d300f4d
  s3tests:
    branch: master
  workunit:
    sha1: dea701125d78c78e89b8d47092052a1d9d300f4d
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - mon.c
- - client.0
targets:
  ubuntu@vpm060.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC9xRVZU0fFbIa9fa5PHGVQ6Uv4ClHM1xUES2g5b0/2luXiARQupaQLjhSPctudc2ZaH+dN4n0sZ9c6TYZsMDHNgtvFTuXJeBWeqdopjbAZRx0qj4QoYROM07OfjRhNgkTDq2rulymTDNlL7KrR6wNARCI94EqAkOwaBqKAy7Z4AM90/78ehXLx2Jm/DtCo/npgEXcFqBiYgZw7cKR01/an/pTox16XvlMxtWz7HmlCxOLWyz1ISKFvQYKzVK64ON2NA2eESRlacrww2gyOkQBsa0zRitI2qJ0A8WMsJL9ZQYLxTmPrxbxSDLwJc/qK+FE0/LFP4lDNXI2wmSHg+weL
  ubuntu@vpm061.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDWwaP/NAbmaFHW4LiIEfCQZqnXz2lTc840AeankDiBlVjPVZqcgHpRitLb1SORfR3sM1R7Lgj56pOkAvBUrKGJeArOmH/4DdBI5ytz/GBey6UJQtDBzecSMMp9YbfCwYiguFzX97DL6gpU/7lB8X+VMruaBOv0vHQlMFp2v90PZyAtXwYzJ/p1Z2EWzybilAtpjez5eYUcv0DywN+O5v9oaFZXPv3/sYb2HQpKULmjxszFOrNa7ky2VYtr3rcnKWrQbx0ntjUgRKIRA/0HYDt+v0BSTicJJf44G4PsX+NridoHUHi8pNG7ykiiCzbeG3pVgXqaqi0ZD1EOX6ESM/fz
  ubuntu@vpm062.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCzS9G/WKOqpqr6HIkBbZr7xs2au3AapnhK4AXl+gs4WhmX+FIIdocnOWZubYV7IEvEUsm4eesZrUGpx+kLmuV81Uksk6yXXN+SNCpCilQexEYIcFhqLL8f0Sf7NTp4ea6amfOkWfe5aNNRmMnrLR5ruLkO801cQZh6nGqQzLe8HIfXg6VmbMEK3ATps4voietDSh3CnQwQ61gFzj5sCgbm4W6n8i98zdEf8kEtYS5K7CxeRwD6+il1i6EkJ5gdgzz27a4Xcp2v1iMGwF/6W7Te/nlQbf5wDjxIRlqYQNi8+SbHgktxKyhKddc5PDZ9oe/15CsskCRd+P5N6+gzTlRR
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- radosbench:
    clients:
    - client.0
    time: 1800
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0:
      idle_timeout: 300
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.30592
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rados_api_tests.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_14.04.yaml}
duration: 13871.453804969788
failure_reason: 'Command crashed: ''adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage
  ceph osd in 2'''
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
#1

Updated by Sage Weil about 10 years ago

  • Status changed from New to Rejected

This appears to be a simple out-of-memory condition (the evidence is a few lines further up in the teuthology log). We either need more memory on the VMs, or there is a memory leak somewhere.
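For context, here is a minimal sketch (not Ceph's actual common/Thread.cc source) of how an out-of-memory failure in pthread_create() surfaces as this assertion: on Linux, pthread_create() returns EAGAIN (numerically 11) when it cannot allocate the new thread's stack or hits a process limit, and asserting that the return value is zero then aborts the process, matching the "pthread_create failed with error 11" and "FAILED assert(ret == 0)" lines in the log above.

// Hedged sketch, not Ceph's Thread.cc: shows how a pthread_create() resource
// failure becomes an assertion abort. Build with: g++ -pthread sketch.cc
#include <cassert>
#include <cstdio>
#include <pthread.h>

static void *entry(void *) { return nullptr; }

static void create_thread_or_assert() {
    pthread_t tid;
    int ret = pthread_create(&tid, nullptr, entry, nullptr);
    if (ret != 0) {
        // Ceph logs a similar diagnostic before asserting (see the log above).
        fprintf(stderr, "pthread_create failed with error %d\n", ret);
    }
    assert(ret == 0);           // under memory pressure ret == EAGAIN (11), so this aborts
    pthread_join(tid, nullptr);
}

int main() {
    create_thread_or_assert();  // succeeds normally; aborts when thread creation fails
    return 0;
}

Running something like this on a host under severe memory pressure (or with very low resource limits) should hit the same abort path as the crash reported here.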

#2

Updated by Yuri Weinstein almost 10 years ago

See #8214
