Bug #9515

closed

"Segmentation fault (ceph_test_rados_api_io)" in upgrade:dumpling-giant-x:parallel-giant-distro-basic-vps run

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-09-17_13:53:14-upgrade:dumpling-giant-x:parallel-giant-distro-basic-vps/493053/

2014-09-17T15:00:26.755 INFO:tasks.workunit.client.0.vpm102.stdout:[       OK ] LibRadosIo.SimpleWrite (1122 ms)
2014-09-17T15:00:26.755 INFO:tasks.workunit.client.0.vpm102.stdout:[ RUN      ] LibRadosIo.ReadTimeout
2014-09-17T15:00:26.839 INFO:tasks.workunit.client.0.vpm102.stderr:/home/ubuntu/cephtest/workunit.client.0/rados/test.sh: line 4: 10785 Segmentation fault      (core dumped) ceph_test_rados_api_io
2014-09-17T15:00:26.840 INFO:tasks.workunit:Stopping ['rados/test.sh', 'cls'] on client.0...
archive_path: /var/lib/teuthworker/archive/teuthology-2014-09-17_13:53:14-upgrade:dumpling-giant-x:parallel-giant-distro-basic-vps/493053
branch: giant
description: upgrade:dumpling-giant-x:parallel/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml}
  3-giant-upgrade/giant.yaml 4-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml
  test_rbd_python.yaml} 5-upgrade-sequence/upgrade-by-daemon.yaml 6-final-workload/{ec-rados-default.yaml
  ec-rados-plugin=jerasure-k=3-m=1.yaml rados-snaps-few-objects.yaml rados_loadgenmix.yaml
  rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_s3tests.yaml rgw_swift.yaml}
  distros/centos_6.5.yaml}
email: ceph-qa@ceph.com
job_id: '493053'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-09-17_13:53:14-upgrade:dumpling-giant-x:parallel-giant-distro-basic-vps
nuke-on-error: true
os_type: centos
os_version: '6.5'
overrides:
  admin_socket:
    branch: giant
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - scrub mismatch
    - ScrubResult
    sha1: cef34f429972267061fc0e730ef976887ccb78a9
  ceph-deploy:
    branch:
      dev: giant
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: cef34f429972267061fc0e730ef976887ccb78a9
  s3tests:
    branch: giant
  workunit:
    sha1: cef34f429972267061fc0e730ef976887ccb78a9
owner: scheduled_yuriw
priority: 1000
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
  - client.1
suite: upgrade:dumpling-giant-x:parallel
suite_branch: giant
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_giant
targets:
  ubuntu@vpm011.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr/IYvRtlEPWibZjdKXBbuFFR/5lZNXdVPDgh55OYnDPELm+M0gVi+jKUSc5uT4AUChQiefPjgPZJ30D2IdoPLKnG5YJ2QbZ4aNPkPiTkx7NhapYWSTtKejtO1m4kmS5b1R2inKt0Nc9NoPsY95v0HYjKfvAnD6mgXcOPujIqKo88/pjbN0FM8v4wYW+CsmpeTjn9eOtl7uX9gIlkn5zC/PshVnarf4MpPLkhgoMNPYFURvgsDxU5TCgWlN1jbVwczRkn90awHKgHA85tFAhNBNabao11A8Szt8Zpy4fbW8ObYZ1ljP3wWZZey6xqpw+Ci2RGc4c3Fx2eRmM2AC0Nzw==
  ubuntu@vpm077.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA7rFLQ1UrMvZ0SoEsklOuM9DgVAJTSe5rrPLCuOIC+w2+VCAzVtc3fQxC43yzXcIB8+0jOUA87z6gGNuah90BTbS4gpc+Vbf60UhR6/qq63VA/ecnOs4eXd1bCiWbFp1Ha8mOPHiF4CXLubEnZtdht3UWeHYEmabC8uwRCpp/p2JhYFeC4+xMD8fPlUpKMQbuMytGykYpbxt1L5e514z+UUZRAfjr+AqIwzza5jwHKlTZNn6SUWIWNUaFHLUkX7av0wZXcc5s9TwgiifzDbllEmK9GsTRNcFccJ6sOCnnR5G2vnBlxIyNEzCC6M8mc4RWIeHPscGVW4HN02PRpFq1uQ==
  ubuntu@vpm102.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAxl2peqVIvfw8CDzNwT+B3HSeLo3DesJK3Znuig21VDRJmMi4/RCg0WJL1eOJVwPLLSOzCS+DV+4xKJFo4t5ZoM8XZbbUWfvJUhGO0jUndotWU46XOsw8bYeBpylpjKOlRW/blSrtqkP8tFF0mAAjForH9AaVpyDfaAujzO7kSCD6yBL93FPRxV7fQGMxBpA+MR0HIRyQrH52Ke07DUYrHBKsWBpqvkXGj8AIbNyZMWqzUvdtnYZE2D2oef5lpX15qJFZzhxtseOlC6PaSI/nYa1IEOJ/CO9NXsuGTMdlAJ+e8w0+OO1k8FKJTx+kCIAOpQSLkcK6ESScubQw9lWxzQ==
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.push_inventory: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- print: '**** done dumpling install'
- ceph:
    fs: xfs
- parallel:
  - workload
- print: '**** done parallel'
- install.upgrade:
    client.0:
      branch: giant
    mon.a:
      branch: giant
    mon.b:
      branch: giant
- print: '**** done install.upgrade'
- ceph.restart: null
- print: '**** done restart'
- parallel:
  - workload2
  - upgrade-sequence
- print: '**** done parallel'
- install.upgrade:
    client.0: null
- print: '**** done install.upgrade client.0 to the version from teuthology-suite
    arg'
- rados:
    clients:
    - client.0
    ec_pool: true
    objects: 50
    op_weights:
      append: 100
      copy_from: 50
      delete: 50
      read: 100
      rmattr: 25
      rollback: 50
      setattr: 25
      snap_create: 50
      snap_remove: 50
      write: 0
    ops: 4000
- rados:
    clients:
    - client.0
    ec_pool: true
    erasure_code_profile:
      k: 3
      m: 1
      name: jerasure31profile
      plugin: jerasure
      ruleset-failure-domain: osd
      technique: reed_sol_van
    objects: 50
    op_weights:
      append: 100
      copy_from: 50
      delete: 50
      read: 100
      rmattr: 25
      rollback: 50
      setattr: 25
      snap_create: 50
      snap_remove: 50
      write: 0
    ops: 4000
- rados:
    clients:
    - client.1
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- workunit:
    clients:
      client.1:
      - rados/load-gen-mix.sh
- sequential:
  - mon_thrash:
      revive_delay: 20
      thrash_delay: 1
  - workunit:
      clients:
        client.1:
        - rados/test.sh
  - print: '**** done rados/test.sh - 6-final-workload'
- workunit:
    clients:
      client.1:
      - cls/test_cls_rbd.sh
- workunit:
    clients:
      client.1:
      - rbd/import_export.sh
    env:
      RBD_CREATE_ARGS: --new-format
- rgw:
  - client.1
- s3tests:
    client.1:
      rgw_server: client.1
- swift:
    client.1:
      rgw_server: client.1
teuthology_branch: master
tube: vps
upgrade-sequence:
  sequential:
  - install.upgrade:
      mon.a: null
  - print: '**** done install.upgrade mon.a to the version from teuthology-suite arg'
  - install.upgrade:
      mon.b: null
  - print: '**** done install.upgrade mon.b to the version from teuthology-suite arg'
  - ceph.restart:
      daemons:
      - mon.a
  - sleep:
      duration: 60
  - ceph.restart:
      daemons:
      - mon.b
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.c
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.0
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.1
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.2
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.3
  - sleep:
      duration: 60
  - ceph.restart:
    - mds.a
  - exec:
      mon.a:
      - ceph osd crush tunables firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.3489
workload:
  sequential:
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rados/test.sh
        - cls
  - print: '**** done rados/test.sh &  cls'
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rados/load-gen-big.sh
  - print: '**** done rados/load-gen-big.sh'
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rbd/test_librbd.sh
  - print: '**** done rbd/test_librbd.sh'
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rbd/test_librbd_python.sh
  - print: '**** done rbd/test_librbd_python.sh'
workload2:
  sequential:
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rados/test.sh
        - cls
  - print: '**** done #rados/test.sh and cls 2'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rados/load-gen-big.sh
  - print: '**** done rados/load-gen-big.sh 2'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rbd/test_librbd.sh
  - print: '**** done rbd/test_librbd.sh 2'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rbd/test_librbd_python.sh
  - print: '**** done rbd/test_librbd_python.sh 2'
description: upgrade:dumpling-giant-x:parallel/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml}
  3-giant-upgrade/giant.yaml 4-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml
  test_rbd_python.yaml} 5-upgrade-sequence/upgrade-by-daemon.yaml 6-final-workload/{ec-rados-default.yaml
  ec-rados-plugin=jerasure-k=3-m=1.yaml rados-snaps-few-objects.yaml rados_loadgenmix.yaml
  rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_s3tests.yaml rgw_swift.yaml}
  distros/centos_6.5.yaml}
duration: 3665.073081970215
failure_reason: 'Command failed on vpm102 with status 139: ''mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp
  && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1
  CEPH_REF=firefly TESTDIR="/home/ubuntu/cephtest" CEPH_ID="0" adjust-ulimits ceph-coverage
  /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/workunit.client.0/rados/test.sh'''
flavor: basic
owner: scheduled_yuriw
success: false

Related issues 2 (0 open, 2 closed)

Is duplicate of Ceph - Bug #9508: objecter: segv on timeout/cancel (LibRadosIo ReadTimeout). Resolved, John Spray, 09/17/2014.

Is duplicate of Ceph - Bug #9582: librados: segmentation fault on timeout. Resolved, Sage Weil, 09/24/2014.

#3

Updated by Loïc Dachary over 9 years ago

The stack trace is:

#0  0x00007f541b95750a in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f541bd41341 in RWLock::get_write(bool) () from /usr/lib/librados.so.2
#2  0x00007f541bd2bbc9 in Objecter::op_cancel(Objecter::OSDSession*, unsigned long, int) () from /usr/lib/librados.so.2
#3  0x00007f541bcf1349 in Context::complete(int) () from /usr/lib/librados.so.2
#4  0x00007f541bdad5ea in RWTimer::timer_thread() () from /usr/lib/librados.so.2
#5  0x00007f541bdb149d in RWTimerThread::entry() () from /usr/lib/librados.so.2
#6  0x00007f541b953e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7  0x00007f541b16a3fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x0000000000000000 in ?? ()

#4

Updated by Loïc Dachary over 9 years ago

Seems to be related to http://tracker.ceph.com/issues/9508, which was recently resolved.

#5

Updated by Loïc Dachary over 9 years ago

  • Status changed from New to Duplicate
#6

Updated by Yuri Weinstein over 9 years ago

  • Status changed from Duplicate to New

Still seeing this in run http://qa-proxy.ceph.com/teuthology/teuthology-2014-09-27_18:40:01-upgrade:dumpling-giant-x:parallel-giant-distro-basic-vps/515312/:

2014-09-27T20:59:08.185 INFO:tasks.workunit.client.0.vpm164.stdout:[ RUN      ] LibRadosIo.SimpleWrite
2014-09-27T20:59:09.813 INFO:tasks.workunit.client.0.vpm164.stdout:[       OK ] LibRadosIo.SimpleWrite (1628 ms)
2014-09-27T20:59:09.813 INFO:tasks.workunit.client.0.vpm164.stdout:[ RUN      ] LibRadosIo.ReadTimeout
2014-09-27T20:59:17.075 INFO:tasks.workunit.client.0.vpm164.stderr:/home/ubuntu/cephtest/workunit.client.0/rados/test.sh: line 4: 18877 Segmentation fault      (core dumped) ceph_test_rados_api_io
2014-09-27T20:59:17.174 INFO:tasks.workunit:Stopping ['rados/test.sh', 'cls'] on client.0...
#7

Updated by Samuel Just over 9 years ago

  • Status changed from New to Duplicate

Ok, hopefully fixed this time.
