Bug #8214

Crash in Thread.cc "common/Thread.cc: 110: FAILED assert(ret == 0)" in upgrade:dumpling-x:stress-split-firefly-distro-basic-vps suite

Added by Yuri Weinstein almost 10 years ago. Updated almost 10 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Severity: 2 - major

Description

This looks similar to #8156, but has no out-of-memory problem.
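
For context: error 11 from pthread_create() is EAGAIN ("resource temporarily unavailable"), which pthread_create() returns when the process hits a thread or resource limit (e.g. RLIMIT_NPROC or kernel.threads-max), not when the heap is exhausted, so the lack of an out-of-memory symptom is consistent with a thread-limit failure. A minimal standalone sketch (illustrative only, not Ceph's actual common/Thread.cc) of how that error plus an assert(ret == 0) turns into the SIGABRT seen in the log below:

// Illustrative sketch only: shows how pthread_create() returning
// EAGAIN (errno 11) plus an assert(ret == 0) produces an abort
// (signal 6), matching the FAILED assert at common/Thread.cc:110.
#include <cassert>
#include <cstdio>
#include <cstring>
#include <pthread.h>
#include <unistd.h>

static void *entry(void *) {
  pause();  // park the thread so it keeps counting against the limit
  return nullptr;
}

int main() {
  for (unsigned long n = 0;; ++n) {
    pthread_t t;
    int ret = pthread_create(&t, nullptr, entry, nullptr);
    if (ret != 0) {
      // EAGAIN means "insufficient resources" or a system-imposed
      // thread limit; this is distinct from heap exhaustion (ENOMEM).
      fprintf(stderr, "pthread_create failed after %lu threads: %s\n",
              n, strerror(ret));
      assert(ret == 0);  // aborts with SIGABRT, as the Ceph assert does
    }
  }
}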

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-04-24_20:35:03-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/213371/

2014-04-25T06:33:52.251 DEBUG:teuthology.orchestra.run:Running [10.214.138.144]: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config set filestore_inject_stall 3'
2014-04-25T06:34:11.263 INFO:teuthology.task.thrashosds.thrasher:in_osds:  [5, 2, 3, 4, 0]  out_osds:  [1] dead_osds:  [] live_osds:  [5, 1, 0, 3, 4, 2]
2014-04-25T06:34:11.263 INFO:teuthology.task.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2014-04-25T06:34:11.264 INFO:teuthology.task.thrashosds.thrasher:Growing pool unique_pool_2
2014-04-25T06:34:11.264 DEBUG:teuthology.orchestra.run:Running [10.214.138.144]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format=json'
2014-04-25T06:34:17.736 INFO:teuthology.orchestra.run.err:[10.214.138.144]: dumped all in format json
2014-04-25T06:34:19.659 INFO:teuthology.task.thrashosds.ceph_manager:increase pool size by 10
2014-04-25T06:34:19.660 DEBUG:teuthology.orchestra.run:Running [10.214.138.144]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool set unique_pool_2 pg_num 96'
2014-04-25T06:34:27.951 INFO:teuthology.orchestra.run.err:[10.214.138.144]: set pool 139 pg_num to 96
2014-04-25T06:34:34.207 INFO:teuthology.task.thrashosds.thrasher:in_osds:  [5, 2, 3, 4, 0]  out_osds:  [1] dead_osds:  [] live_osds:  [5, 1, 0, 3, 4, 2]
2014-04-25T06:34:34.207 INFO:teuthology.task.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2014-04-25T06:34:34.207 INFO:teuthology.task.thrashosds.thrasher:Adding osd 1
2014-04-25T06:34:34.207 DEBUG:teuthology.orchestra.run:Running [10.214.138.144]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd in 1'
2014-04-25T06:35:24.476 INFO:teuthology.orchestra.run.err:[10.214.138.144]: Thread::try_create(): pthread_create failed with error 11common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f9537a71700 time 2014-04-25 13:35:24.323605
2014-04-25T06:35:24.476 INFO:teuthology.orchestra.run.err:[10.214.138.144]: common/Thread.cc: 110: FAILED assert(ret == 0)
2014-04-25T06:35:24.493 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  ceph version 0.80-rc1-16-g2708c3c (2708c3c559d99e6f3b557ee1d223efa3745f655c)
2014-04-25T06:35:24.494 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  1: (Thread::create(unsigned long)+0x8a) [0x7f9538f2f53a]
2014-04-25T06:35:24.494 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  2: (SafeTimer::init()+0x7f) [0x7f9538f0723f]
2014-04-25T06:35:24.494 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  3: (librados::RadosClient::connect()+0xcc2) [0x7f9538e69832]
2014-04-25T06:35:24.494 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  4: (ffi_call_unix64()+0x4c) [0x7f953aaa6adc]
2014-04-25T06:35:24.494 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  5: (ffi_call()+0x1fc) [0x7f953aaa640c]
2014-04-25T06:35:24.495 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  6: (_ctypes_callproc()+0x48e) [0x7f953acbd5fe]
2014-04-25T06:35:24.495 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  7: (()+0x15f9e) [0x7f953acbef9e]
2014-04-25T06:35:24.495 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  8: (PyEval_EvalFrameEx()+0x1f36) [0x52e1e6]
2014-04-25T06:35:24.495 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  9: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
2014-04-25T06:35:24.496 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  10: (PyEval_EvalFrameEx()+0xc82) [0x52cf32]
2014-04-25T06:35:24.496 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  11: /usr/bin/python() [0x56d0aa]
2014-04-25T06:35:24.496 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  12: /usr/bin/python() [0x4d9854]
2014-04-25T06:35:24.496 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  13: (PyEval_CallObjectWithKeywords()+0x6b) [0x4da20b]
2014-04-25T06:35:24.497 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  14: /usr/bin/python() [0x5872b2]
2014-04-25T06:35:24.497 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  15: (()+0x8182) [0x7f953c115182]
2014-04-25T06:35:24.497 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  16: (clone()+0x6d) [0x7f953be4230d]
2014-04-25T06:35:24.497 INFO:teuthology.orchestra.run.err:[10.214.138.144]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-04-25T06:35:24.498 INFO:teuthology.orchestra.run.err:[10.214.138.144]: terminate called after throwing an instance of 'ceph::FailedAssertion'
2014-04-25T06:36:36.953 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: finishing write tid 1 to vpm08810175-36
2014-04-25T06:36:36.960 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: finishing write tid 2 to vpm08810175-36
2014-04-25T06:36:41.252 INFO:teuthology.task.ceph.osd.0.err:[10.214.138.144]: daemon-helper: command crashed with signal 6
2014-04-25T06:37:05.450 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: finishing write tid 3 to vpm08810175-36
2014-04-25T06:37:07.245 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: 3266: oids not in use 500
2014-04-25T06:37:07.245 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: Snapping
2014-04-25T06:37:08.809 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: 3267: oids not in use 500
2014-04-25T06:37:08.809 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: Snapping
2014-04-25T06:37:10.823 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: 3268: oids not in use 500
2014-04-25T06:37:10.823 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: RollingBack 194 to 352
2014-04-25T06:37:10.928 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: 3269: oids not in use 500
2014-04-25T06:37:10.928 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: Snapping
2014-04-25T06:37:13.206 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: 3270: oids not in use 500
2014-04-25T06:37:13.206 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: Deleting 62 current snap is 359
2014-04-25T06:37:13.207 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: 3271: oids not in use 500
2014-04-25T06:37:13.207 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: Snapping
2014-04-25T06:37:15.519 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: 3272: oids not in use 500
2014-04-25T06:37:15.519 INFO:teuthology.task.rados.rados.0.out:[10.214.138.133]: Writing 217 current snap is 360
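
For what it's worth, frames 3-14 of the backtrace show the abort happening inside a Python process that calls librados through ctypes, i.e. the 'ceph' CLI the thrasher ran ('ceph osd in 1', per the failure_reason below), while RadosClient::connect() was starting its SafeTimer thread. A hedged sketch of the same entry point via the librados C API (client id, config path, and error handling here are illustrative, not the suite's actual code):

// Illustrative only: rados_connect() starts internal service threads
// (messenger, SafeTimer). In the run above, thread creation failed at
// that point and the assert aborted the whole CLI process rather than
// returning an error code. Build with: g++ sketch.cc -lrados
#include <rados/librados.h>
#include <cstdio>

int main() {
  rados_t cluster;
  if (rados_create(&cluster, NULL) < 0)  // NULL = default client id
    return 1;
  rados_conf_read_file(cluster, NULL);   // default ceph.conf search path
  int ret = rados_connect(cluster);      // SafeTimer::init() runs in here
  if (ret < 0) {
    fprintf(stderr, "rados_connect failed: %d\n", ret);
    rados_shutdown(cluster);
    return 1;
  }
  rados_shutdown(cluster);
  return 0;
}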

archive_path: /var/lib/teuthworker/archive/teuthology-2014-04-24_20:35:03-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/213371
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/snaps-few-objects.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_14.04.yaml}
email: null
job_id: '213371'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-04-24_20:35:03-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps
nuke-on-error: true
os_type: ubuntu
os_version: '14.04'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 2708c3c559d99e6f3b557ee1d223efa3745f655c
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 2708c3c559d99e6f3b557ee1d223efa3745f655c
  s3tests:
    branch: master
  workunit:
    sha1: 2708c3c559d99e6f3b557ee1d223efa3745f655c
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - mon.c
- - client.0
targets:
  ubuntu@vpm086.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCtEDwzUe+qPsDo5zamTc3vPTySPm6Zdp0tF3N13+6P3uNCXChdAQQHm9UwcymTaZjHGr0ou1JKgxQ4bG/GxuOtB5XB0h84h/LZkDboYGzppzYXpxqOPA4fsa7wE/CkB1L1ECvaUbMbd/bu7etC9vX5BInL9fbmQZx/ZwUvJ5uPUAkVXpZHHFxwh0pxHNsWkv3Rjygg6A8PoD0z7ty2ZboKt7Dee5xGcohSI+cJCBXHiO176tI/JasB+ohmMTYqhlgBnvvcBlRnoyCN0b72v4CXPP8DT06QEPKk3ojnP2LNT/7LfxCMp7Yw4xpQ4dyttmhUAN3lxmTqxv3wZW5HxBV7
  ubuntu@vpm087.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCpNfpBMj6XnPPsR4+mIsfICXIjy26Kefpc+8hi9aoY3xpUQUo1OIaepwBFHA+0QnIv0W1OdSbiJm8f1y7JFD1HtVEx6RbxNibN9cr/GVTjEAr8eMwn4y0u6xdZ9JnwINxCkJt0PhbQzMn6x/mWf0M2GcJjF50nfUDBdDiLZ7EI0A6+iYPe5I6lrVM3qjXVrDPO4ox0Wn6P+f87/KOCHlSM5GEFuCOnD1QnEFPwwjV4uidYEbbDWtH+5mzPOWr09Z/WMXK6M/eLSb8cVbNlV+VkA4sJhWJ9URiMftBA+XiFv8ymw/p36bj9Pl3LS0jpG4uFL3EppbsBCLLzwiETjKqp
  ubuntu@vpm088.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDPYGsveQQeLY6gsvgfOqJQLDW7oLHMGBKnFSRTPFRYay8Dza5GaAw8sqayeXKXYIjTS8JCj6stZncK7Enc9du70lJ14jtRQffLatpz4045MqE1S6sTNWKCFZjn8KNUVRsggum1Eb76+4oEt1q7y+a4ZETyTwKv6n8IRXvdk51XL77kZV5tw+Dpz+WwxK6ufERrxfyE2AbbC+USkLAThLonw2P8D2xf3ctlsjrFSYN5aPRD7kjl6H8AM9+ipMjaXR/0yY7oJrdaBuMLZhLeshu3L5IGChV/3uDCnFSDtchAs7a2VcSwtHPLlgsCzMXDvYbGu2Ttd2rg+vYCYwnGHnY5
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- radosbench:
    clients:
    - client.0
    time: 1800
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0:
      idle_timeout: 300
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.30578
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/snaps-few-objects.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_14.04.yaml}
duration: 16598.548552036285
failure_reason: 'Command crashed: ''adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage
  ceph osd in 1'''
flavor: basic
owner: scheduled_teuthology@teuthology
success: false