Bug #9273

closed

mon doesn't preload ec plugins; triggers crash in upgrade tests

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Monitor
Target version: -
% Done: 80%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In run: http://pulpito.front.sepia.ceph.com/teuthology-2014-08-28_11:18:18-upgrade:dumpling-firefly-x-master-distro-basic-vps/

7 jobs seem to fail in the same way:

Failure: Command failed on vpm103 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f -i a'
['458043', '458044', '458045', '458046', '458048', '458049', '458050']

Logs are in: http://qa-proxy.ceph.com/teuthology/teuthology-2014-08-28_11:18:18-upgrade:dumpling-firefly-x-master-distro-basic-vps/458044/

2014-08-28T12:34:18.496 INFO:tasks.workunit.client.0.vpm009.stdout:[ RUN      ] LibRadosAioEC.SimpleWrite
2014-08-28T12:34:20.618 INFO:tasks.workunit.client.0.vpm009.stdout:test/librados/aio.cc:1634: Failure
2014-08-28T12:34:20.618 INFO:tasks.workunit.client.0.vpm009.stdout:Value of: test_data.init()
2014-08-28T12:34:20.619 INFO:tasks.workunit.client.0.vpm009.stdout:  Actual: "create_one_ec_pool(test-rados-api-vpm009-10092-33) failed: error rados_mon_command erasure-code-profile set name:testprofile failed with error -5" 
2014-08-28T12:34:20.619 INFO:tasks.workunit.client.0.vpm009.stdout:Expected: "" 
2014-08-28T12:34:20.620 INFO:tasks.workunit.client.0.vpm009.stdout:[  FAILED  ] LibRadosAioEC.SimpleWrite (2125 ms)
2014-08-28T12:34:20.620 INFO:tasks.workunit.client.0.vpm009.stdout:[ RUN      ] LibRadosAioEC.SimpleWritePP
2014-08-28T12:34:22.157 INFO:tasks.ceph.mon.a.vpm006.stderr:*** Caught signal (Segmentation fault) **
2014-08-28T12:34:22.157 INFO:tasks.ceph.mon.a.vpm006.stderr: in thread 7f0824ba5700
2014-08-28T12:34:22.586 INFO:tasks.ceph.mon.a.vpm006.stderr:daemon-helper: command crashed with signal 11
2014-08-28T12:34:48.560 INFO:tasks.workunit.client.0.vpm009.stdout:[       OK ] LibRadosAioEC.SimpleWritePP (27940 ms)
2014-08-28T12:34:48.560 INFO:tasks.workunit.client.0.vpm009.stdout:[ RUN      ] LibRadosAioEC.WaitForSafe
2014-08-28T12:34:55.874 INFO:tasks.workunit.client.0.vpm009.stdout:[       OK ] LibRadosAioEC.WaitForSafe (7314 ms)
2014-08-28T12:34:55.874 INFO:tasks.workunit.client.0.vpm009.stdout:[ RUN      ] LibRadosAioEC.WaitForSafePP
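
To triage the other failed jobs the same way, a log excerpt like the one above can be scanned for failed gtest cases and daemon crash signals. The following is only a minimal sketch: it assumes the "<timestamp> INFO:<source>:<message>" line layout seen in this excerpt, and the function name scan is hypothetical, not part of teuthology.

```python
import re

# Assumed layout, taken from the excerpt: "<timestamp> INFO:<source>:<message>"
LINE_RE = re.compile(r"^(?P<ts>\S+) INFO:(?P<source>[\w.]+):(?P<msg>.*)$")
# gtest marks a failed case as "[  FAILED  ] Suite.Case (N ms)"
FAILED_RE = re.compile(r"\[\s+FAILED\s+\] (\S+)")


def scan(lines):
    """Return (failed gtest case names, log sources that reported a signal)."""
    failed, crashed = [], []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a teuthology INFO line in the assumed layout
        msg = m.group("msg")
        t = FAILED_RE.match(msg)
        if t:
            failed.append(t.group(1))
        elif "Caught signal" in msg:
            crashed.append(m.group("source"))
    return failed, crashed
```

Run against the lines above, this would pair the failed LibRadosAioEC.SimpleWrite case with the crash reported on the mon.a stderr stream, which is the pattern to look for in the remaining jobs.
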
archive_path: /var/lib/teuthworker/archive/teuthology-2014-08-28_11:18:18-upgrade:dumpling-firefly-x-master-distro-basic-vps/458044
branch: master
description: upgrade:dumpling-firefly-x/parallel/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml}
  3-firefly-upgrade/firefly.yaml 4-workload/{rados_api.yaml rados_loadgenbig.yaml
  test_rbd_api.yaml test_rbd_python.yaml} 5-upgrade-sequence/upgrade-by-daemon.yaml
  6-final-workload/{ec-readwrite.yaml rados-snaps-few-objects.yaml rados_loadgenmix.yaml
  rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_s3tests.yaml rgw_swift.yaml}
  distros/rhel_6.5.yaml}
email: ceph-qa@ceph.com
job_id: '458044'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-08-28_11:18:18-upgrade:dumpling-firefly-x-master-distro-basic-vps
nuke-on-error: true
os_type: rhel
os_version: '6.5'
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        osd heartbeat grace: 100
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - scrub mismatch
    - ScrubResult
    sha1: 5b0af4c8aa797f77810729b59dd9d1a70e15ab26
  ceph-deploy:
    branch:
      dev: master
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 5b0af4c8aa797f77810729b59dd9d1a70e15ab26
  rgw:
    default_idle_timeout: 1200
  s3tests:
    branch: master
    idle_timeout: 1200
  workunit:
    sha1: 5b0af4c8aa797f77810729b59dd9d1a70e15ab26
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
  - client.1
suite: upgrade:dumpling-firefly-x
suite_branch: master
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_master
targets:
  ubuntu@vpm006.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA0S8URwVCM5cMCgaNpUddX21Wva4H6e/SIUKDtoQAzMpi7ZK27mlKWd+RpljnUYVZfPjp8q4NfmKy2QXxgB6StXqkHFl9HjWvgf2j15WzbL1JC6zWw3cJleW7YCHsEo6OF7F7fM/SCXkiTWfMrAGXOpDAfxI76o21qQZENs6/G6qgNkzTGzzI5tmRK33IfDA5qPYCQzmsgW0EbJAG1XVMCLlvc/QJkJmJWGhLOfRdUqlSDGh+b68Gc+Xffzld3rW7CWBmqsPSt9XVv3JAcki+7SZtGo0q5s8fWqyDUxBmCO3PKAay3LY6GczePcPqOj8aiCS/a4F7w6YYzcAIbM2dVw==
  ubuntu@vpm009.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA0M9tcw3vpb4PgvHHajYt90z62W4miHi3ilWYhkIFf5rBfygi4ppaCUizWvyQ4ovzz4riigw04abZhy3ATjkCgUaKwQNyn0z9OEBlk3URs8ZrwHscGtMXCNLt/dg6KQXtKJf1F6OHI99hDqn3EwzGnFCcaMzRx2desTZOJkuVAf3+k1RqAgeaj77mG1vHJvkkAPjWS777AIyqh9ftWPZ+HNgb/g+V92w8CEPlsaJ6rl0bLDKJvK3rkxd/b0sNMeDNam20eGZhF8u923fBydE0AHgrqG4Lhi+np+mCeeRJWms4hCo0ZV1uRkpvjkYikIMIoXSW4OE6p8RI95BKCNE04w==
  ubuntu@vpm100.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAqiF20qyBhqrSSTx+ouJl/RCnLOaowyVLD4x549J7oPBkQqOmj/K3Pi3wIai2EaFjpTA8EfOhDuze6i1RuDA2XAu4k2fKWyb0o7oXKIdxjsXfrT5NVOJq02C6yG57F4EKrR9o54BJyeudpFtgiG2SkSRM8BMm57jndXo3VAVpTagIGazmv1mAoe/6ezM+Pspo57tght6l48P3+fnb5Cz4j6YgaI9yRLj+8BtUQFL0MaL56CKqLROXlOgy6O03lCbYQqLe9EYm/deykOVDiZ70gwd+bqYHNL2sTJw+XTj7ykuHWGgtxgfJcX66K306LO25SvnZKpyDnkfwZbHhR8y9OQ==
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- print: '**** done dumpling install'
- ceph:
    fs: xfs
- parallel:
  - workload
- print: '**** done parallel'
- install.upgrade:
    client.0:
      branch: firefly
    mon.a:
      branch: firefly
    mon.b:
      branch: firefly
- print: '**** done install.upgrade'
- ceph.restart: null
- print: '**** done restart'
- parallel:
  - workload2
  - upgrade-sequence
- print: '**** done parallel'
- install.upgrade:
    client.0: null
- print: '**** done install.upgrade client.0 to the version from teuthology-suite
    arg'
- rados:
    clients:
    - client.0
    ec_pool: true
    objects: 500
    op_weights:
      append: 45
      delete: 10
      read: 45
      write: 0
    ops: 4000
- rados:
    clients:
    - client.1
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- workunit:
    clients:
      client.1:
      - rados/load-gen-mix.sh
- sequential:
  - mon_thrash:
      revive_delay: 20
      thrash_delay: 1
  - workunit:
      clients:
        client.1:
        - rados/test.sh
  - print: '**** done rados/test.sh - 6-final-workload'
- workunit:
    clients:
      client.1:
      - cls/test_cls_rbd.sh
- workunit:
    clients:
      client.1:
      - rbd/import_export.sh
    env:
      RBD_CREATE_ARGS: --new-format
- rgw:
  - client.1
- s3tests:
    client.1:
      rgw_server: client.1
- swift:
    client.1:
      rgw_server: client.1
teuthology_branch: master
tube: vps
upgrade-sequence:
  sequential:
  - install.upgrade:
      mon.a: null
  - print: '**** done install.upgrade mon.a to the version from teuthology-suite arg'
  - install.upgrade:
      mon.b: null
  - print: '**** done install.upgrade mon.b to the version from teuthology-suite arg'
  - ceph.restart:
      daemons:
      - mon.a
  - sleep:
      duration: 60
  - ceph.restart:
      daemons:
      - mon.b
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.c
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.0
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.1
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.2
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.3
  - sleep:
      duration: 60
  - ceph.restart:
    - mds.a
  - exec:
      mon.a:
      - ceph osd crush tunables firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.14191
workload:
  sequential:
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rados/test.sh
        - cls
  - print: '**** done rados/test.sh &  cls'
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rados/load-gen-big.sh
  - print: '**** done rados/load-gen-big.sh'
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rbd/test_librbd.sh
  - print: '**** done rbd/test_librbd.sh'
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rbd/test_librbd_python.sh
  - print: '**** done rbd/test_librbd_python.sh'
workload2:
  sequential:
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rados/test.sh
        - cls
  - print: '**** done #rados/test.sh and cls 2'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rados/load-gen-big.sh
  - print: '**** done rados/load-gen-big.sh 2'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rbd/test_librbd.sh
  - print: '**** done rbd/test_librbd.sh 2'
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rbd/test_librbd_python.sh
  - print: '**** done rbd/test_librbd_python.sh 2'
description: upgrade:dumpling-firefly-x/parallel/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml}
  3-firefly-upgrade/firefly.yaml 4-workload/{rados_api.yaml rados_loadgenbig.yaml
  test_rbd_api.yaml test_rbd_python.yaml} 5-upgrade-sequence/upgrade-by-daemon.yaml
  6-final-workload/{ec-readwrite.yaml rados-snaps-few-objects.yaml rados_loadgenmix.yaml
  rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_s3tests.yaml rgw_swift.yaml}
  distros/rhel_6.5.yaml}
duration: 3747.218440055847
failure_reason: 'Command failed on vpm006 with status 1: ''sudo adjust-ulimits ceph-coverage
  /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f -i a'''
flavor: basic
owner: scheduled_teuthology@teuthology
success: false

Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #9153: erasure-code: jerasure_matrix_dotprod segmentation fault due to package upgrade race (Resolved, Loïc Dachary, 08/17/2014)

Actions #1

Updated by Sage Weil over 9 years ago

  • Subject changed from "Segmentation fault"/"erasure-code-profile" in upgrade:dumpling-x-firefly---basic-vps suite to mon doesn't preload ec plugins; triggers crash in upgrade tests
  • Assignee set to Loïc Dachary
  • Priority changed from Normal to Urgent
Actions #2

Updated by Loïc Dachary over 9 years ago

  • Category set to Monitor
  • Status changed from New to Fix Under Review
Actions #3

Updated by Loïc Dachary over 9 years ago

  • Status changed from Fix Under Review to Pending Backport
  • % Done changed from 0 to 80

The firefly backport is being tested.

Actions #4

Updated by Loïc Dachary over 9 years ago

  • Status changed from Pending Backport to Resolved