Bug #9040 (closed): clients can SEGV during package upgrade

Added by Yuri Weinstein over 9 years ago. Updated almost 3 years ago.

Status: Won't Fix
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-06_16:30:35-upgrade:dumpling-dumpling---basic-vps/404328/

Many other tests also failed in the http://pulpito.front.sepia.ceph.com/teuthology-2014-08-06_16:30:35-upgrade:dumpling-dumpling---basic-vps/ run

2014-08-06T18:39:17.846 INFO:tasks.workunit.client.0.vpm140.stderr:*** Caught signal (Segmentation fault) **
2014-08-06T18:39:17.846 INFO:tasks.workunit.client.0.vpm140.stderr: in thread 7fbf3234d700
2014-08-06T18:39:17.880 INFO:tasks.workunit.client.0.vpm140.stderr: ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
2014-08-06T18:39:17.881 INFO:tasks.workunit.client.0.vpm140.stderr: 1: rbd() [0x41ef9a]
2014-08-06T18:39:17.881 INFO:tasks.workunit.client.0.vpm140.stderr: 2: (()+0xfcb0) [0x7fbf37f4dcb0]
2014-08-06T18:39:17.881 INFO:tasks.workunit.client.0.vpm140.stderr: 3: (PerfCounters::tinc(int, utime_t)+0x28) [0x7fbf38344028]
2014-08-06T18:39:17.882 INFO:tasks.workunit.client.0.vpm140.stderr: 4: (librbd::AioCompletion::complete()+0x210) [0x7fbf38fd2610]
2014-08-06T18:39:17.882 INFO:tasks.workunit.client.0.vpm140.stderr: 5: (librbd::AioCompletion::complete_request(CephContext*, long)+0x1c7) [0x7fbf38fd1a67]
2014-08-06T18:39:17.883 INFO:tasks.workunit.client.0.vpm140.stderr: 6: (librbd::C_AioRead::finish(int)+0x104) [0x7fbf38fd1df4]
2014-08-06T18:39:17.883 INFO:tasks.workunit.client.0.vpm140.stderr: 7: (Context::complete(int)+0xa) [0x7fbf38fd228a]
2014-08-06T18:39:17.884 INFO:tasks.workunit.client.0.vpm140.stderr: 8: (librbd::rados_req_cb(void*, void*)+0x47) [0x7fbf38fe5f77]
2014-08-06T18:39:17.884 INFO:tasks.workunit.client.0.vpm140.stderr: 9: (librados::C_AioComplete::finish(int)+0x1d) [0x7fbf382e3a5d]
2014-08-06T18:39:17.884 INFO:tasks.workunit.client.0.vpm140.stderr: 10: (Context::complete(int)+0xa) [0x7fbf382c450a]
2014-08-06T18:39:17.885 INFO:tasks.workunit.client.0.vpm140.stderr: 11: (Finisher::finisher_thread_entry()+0x1c0) [0x7fbf38367d20]
2014-08-06T18:39:17.885 INFO:tasks.workunit.client.0.vpm140.stderr: 12: (()+0x7e9a) [0x7fbf37f45e9a]
2014-08-06T18:39:17.886 INFO:tasks.workunit.client.0.vpm140.stderr: 13: (clone()+0x6d) [0x7fbf3755873d]
2014-08-06T18:39:17.886 INFO:tasks.workunit.client.0.vpm140.stderr:2014-08-07 01:39:17.877154 7fbf3234d700 -1 *** Caught signal (Segmentation fault) **
2014-08-06T18:39:17.887 INFO:tasks.workunit.client.0.vpm140.stderr: in thread 7fbf3234d700
2014-08-06T18:39:17.887 INFO:tasks.workunit.client.0.vpm140.stderr:
2014-08-06T18:39:17.888 INFO:tasks.workunit.client.0.vpm140.stderr: ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
2014-08-06T18:39:17.888 INFO:tasks.workunit.client.0.vpm140.stderr: 1: rbd() [0x41ef9a]
2014-08-06T18:39:17.888 INFO:tasks.workunit.client.0.vpm140.stderr: 2: (()+0xfcb0) [0x7fbf37f4dcb0]
2014-08-06T18:39:17.889 INFO:tasks.workunit.client.0.vpm140.stderr: 3: (PerfCounters::tinc(int, utime_t)+0x28) [0x7fbf38344028]
2014-08-06T18:39:17.889 INFO:tasks.workunit.client.0.vpm140.stderr: 4: (librbd::AioCompletion::complete()+0x210) [0x7fbf38fd2610]
2014-08-06T18:39:17.890 INFO:tasks.workunit.client.0.vpm140.stderr: 5: (librbd::AioCompletion::complete_request(CephContext*, long)+0x1c7) [0x7fbf38fd1a67]
2014-08-06T18:39:17.890 INFO:tasks.workunit.client.0.vpm140.stderr: 6: (librbd::C_AioRead::finish(int)+0x104) [0x7fbf38fd1df4]
2014-08-06T18:39:17.890 INFO:tasks.workunit.client.0.vpm140.stderr: 7: (Context::complete(int)+0xa) [0x7fbf38fd228a]
2014-08-06T18:39:17.890 INFO:tasks.workunit.client.0.vpm140.stderr: 8: (librbd::rados_req_cb(void*, void*)+0x47) [0x7fbf38fe5f77]
2014-08-06T18:39:17.890 INFO:tasks.workunit.client.0.vpm140.stderr: 9: (librados::C_AioComplete::finish(int)+0x1d) [0x7fbf382e3a5d]
2014-08-06T18:39:17.891 INFO:tasks.workunit.client.0.vpm140.stderr: 10: (Context::complete(int)+0xa) [0x7fbf382c450a]
2014-08-06T18:39:17.891 INFO:tasks.workunit.client.0.vpm140.stderr: 11: (Finisher::finisher_thread_entry()+0x1c0) [0x7fbf38367d20]
2014-08-06T18:39:17.891 INFO:tasks.workunit.client.0.vpm140.stderr: 12: (()+0x7e9a) [0x7fbf37f45e9a]
2014-08-06T18:39:17.891 INFO:tasks.workunit.client.0.vpm140.stderr: 13: (clone()+0x6d) [0x7fbf3755873d]
2014-08-06T18:39:17.892 INFO:tasks.workunit.client.0.vpm140.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-08-06T18:39:17.892 INFO:tasks.workunit.client.0.vpm140.stderr:
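For context on the backtrace: frame 3 is PerfCounters::tinc(int, utime_t), reached from librbd's AIO completion path (frames 4-8) on a Finisher thread. One way a trace of this shape can arise is a late async completion dereferencing perf counters whose owner has already been torn down, e.g. while the package upgrade replaces and restarts components underneath a client with I/O still in flight. The sketch below illustrates that failure shape only; the types and names are hypothetical stand-ins, not the actual dumpling librbd code.

#include <functional>
#include <memory>

struct PerfCounters {
    // Stand-in for PerfCounters::tinc(int, utime_t) from frame 3.
    void tinc(int idx, double latency) { totals[idx] += latency; }
    double totals[16] = {};
};

struct Image {  // stand-in for the librbd image context owning the counters
    std::unique_ptr<PerfCounters> perf = std::make_unique<PerfCounters>();
};

int main() {
    auto img = std::make_unique<Image>();
    PerfCounters* counters = img->perf.get();

    // A pending AIO completion captures a raw pointer to the counters,
    // as a finisher-thread callback would.
    std::function<void()> on_complete = [counters] { counters->tinc(0, 1.0); };

    img.reset();      // image context (and its counters) torn down first...
    on_complete();    // ...late completion: use-after-free, typically SIGSEGV
}

The teuthology job configuration and result summary follow: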
archive_path: /var/lib/teuthworker/archive/teuthology-2014-08-06_16:30:35-upgrade:dumpling-dumpling---basic-vps/404328
branch: dumpling
description: upgrade:dumpling/rbd/{0-cluster/start.yaml 1-dumpling-install/v0.67.3.yaml
  2-workload/rbd.yaml 3-upgrade-sequence/upgrade-osd-mon-mds.yaml 4-final/monthrash.yaml}
email: ceph-qa@ceph.com
job_id: '404328'
last_in_suite: false
machine_type: vps
name: teuthology-2014-08-06_16:30:35-upgrade:dumpling-dumpling---basic-vps
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: dumpling
  ceph:
    conf:
      global:
        osd heartbeat grace: 40
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    fs: xfs
    log-whitelist:
    - slow request
    - scrub
    sha1: 64d5c406995bedbb6a4bc9c851f5d25fe94749ee
  ceph-deploy:
    branch:
      dev: dumpling
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 64d5c406995bedbb6a4bc9c851f5d25fe94749ee
  s3tests:
    branch: dumpling
  workunit:
    sha1: 64d5c406995bedbb6a4bc9c851f5d25fe94749ee
owner: scheduled_yuriw
priority: 1000
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mon.c
  - osd.3
  - osd.4
  - osd.5
  - client.0
suite: upgrade:dumpling
suite_branch: tasks_dumpling
targets:
  ubuntu@vpm075.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDb8Oli2PBVM6e+b6mOMvCIFYz3YUc8bypiC9rZtoUwi7nb69Vav7OEceiPG2166Vck9mMlOSGYmE82b9NsrlfhM2GSLdho2Cllql2GV1XsSmBPZTao4ir7AXnIU8HDwzTgOTEEZDjwzG7PBMV7Cnq8239edCZDVsqfj+OQhMTy28x1ddK9qGc8+y7x4Op7O4Uue2XKXhkGl4vlxcuYlSpRVCg59wBDBjOP/P0/DwKh+HPeI2ZDfMr+6GNIrOWebgR8tS4JisOyTbUmvIQlskt7C86tXyaxgAfza0XwbnsrRPr2bAS9S9kc+nPAnF76OjOi7OEeoACKOJ8oEEyzCRNz
  ubuntu@vpm140.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCso2lViPE+QM90JTNuwGQScjJtJE3jzd6ctl0U7mk/ftjoc7I1/RfUyUjG69xo0ct657hFPTtwJuqYLNcGrd5/p7Bt2HsfQj4wdKmoSBQnP3wPhn3N+WyrjhjtIjKy/GEFxBqZueqNzD1eAizxg/vxBz1G8m4lCV1W6hZP9VffyQNZmx/TEez1x0xg7q/qrUwLCTG2+HjDW9kbGg00SEKkC9XrrvV2oKrm1M6xy0ABP0HmC7NJPXZvIFfZENPJx4JuJRQbTNH6t6RRTvBhou8TZstxy5Lr3cgUZdgjZ6ucoZx84GMEeeAq9SuLYpHmYqJeS5ZRKu4S4Xc6nTDyq9tv
tasks:
- internal.lock_machines:
  - 2
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    tag: v0.67.3
- ceph: null
- parallel:
  - workload
  - upgrade-sequence
- mon_thrash:
    revive_delay: 20
    thrash_delay: 1
- workunit:
    clients:
      client.0:
      - rbd/copy.sh
    env:
      RBD_CREATE_ARGS: --new-format
teuthology_branch: notasks_master
tube: vps
upgrade-sequence:
  sequential:
  - install.upgrade:
      all:
        branch: dumpling
  - ceph.restart:
    - osd.0
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.1
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.2
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.3
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.4
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.5
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.a
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.b
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.c
  - sleep:
      duration: 60
  - ceph.restart:
    - mds.a
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.8665
workload:
  sequential:
  - workunit:
      clients:
        client.0:
        - rbd/import_export.sh
      env:
        RBD_CREATE_ARGS: --new-format
  - workunit:
      clients:
        client.0:
        - cls/test_cls_rbd.sh
description: upgrade:dumpling/rbd/{0-cluster/start.yaml 1-dumpling-install/v0.67.3.yaml
  2-workload/rbd.yaml 3-upgrade-sequence/upgrade-osd-mon-mds.yaml 4-final/monthrash.yaml}
duration: 397.9088749885559
failure_reason: 'Command failed on vpm140 with status 139: ''mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp
  && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1
  CEPH_REF=64d5c406995bedbb6a4bc9c851f5d25fe94749ee TESTDIR="/home/ubuntu/cephtest" 
  CEPH_ID="0" PYTHONPATH="$PYTHONPATH:/home/ubuntu/cephtest/binary/usr/local/lib/python2.7/dist-packages:/home/ubuntu/cephtest/binary/usr/local/lib/python2.6/dist-packages" 
  RBD_CREATE_ARGS=--new-format adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage
  /home/ubuntu/cephtest/workunit.client.0/rbd/import_export.sh'''
flavor: basic
owner: scheduled_yuriw
success: false
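
As a side note, the exit status 139 in the failure_reason above is the shell convention for death by signal: 128 + 11 (SIGSEGV), consistent with the backtrace. A trivial decode, as a sketch:

#include <csignal>
#include <cstdio>

int main() {
    int status = 139;   // exit status reported by teuthology above
    if (status > 128)   // shell convention: 128 + signal number
        std::printf("killed by signal %d (SIGSEGV = %d)\n",
                    status - 128, SIGSEGV);
    return 0;
}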

Related issues (4: 0 open, 4 closed)

Related to Ceph - Bug #9069: rgw tests reported as failed in teuthology-2014-08-11_10:35:04-upgrade:dumpling:rgw-dumpling---basic-vps suite (Resolved, Sage Weil, 08/11/2014)
Related to Ceph - Bug #9288: "Assertion `nlock == 0' failed" in upgrade:firefly-firefly-testing-basic-vps suite (Resolved, Yuri Weinstein, 10/20/2014)
Has duplicate Ceph - Bug #9683: "Segmentation fault" in upgrade:firefly-firefly-distro-basic-multi run (Duplicate, 10/07/2014)
Has duplicate Ceph - Bug #9551: "Segmentation fault" in upgrade:firefly-firefly-testing-basic-vps run (Duplicate, Joao Eduardo Luis, 09/20/2014)
#1 Updated by Sage Weil over 9 years ago

  • Subject changed from "Segmentation fault" in upgrade:dumpling-dumpling---basic-vps to dumpling clients SEGV during package upgrade
#2 Updated by Yuri Weinstein over 9 years ago

https://github.com/ceph/ceph-qa-suite/pull/77 seems to fix this.
Testing now.

#3 Updated by Ian Colle over 9 years ago

  • Status changed from New to 7
#4 Updated by Yuri Weinstein over 9 years ago

I see no segmentation errors in the latest run: /a/teuthology-2014-08-11_12:05:02-upgrade:dumpling-dumpling---basic-vps

Tests seemed to pass at first, but failed later in the process:
http://pulpito.front.sepia.ceph.com/teuthology-2014-08-11_12:05:02-upgrade:dumpling-dumpling---basic-vps/

#5 Updated by Sage Weil over 9 years ago

  • Status changed from 7 to 12
#6 Updated by Sage Weil over 9 years ago

  • Subject changed from dumpling clients SEGV during package upgrade to clients can SEGV during package upgrade
#7 Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New
#8 Updated by Sage Weil almost 3 years ago

  • Status changed from New to Won't Fix