Bug #9856 (closed)

osd crashed after upgrade from v0.80.5 to firefly

Added by Tamilarasi muthamizhan over 9 years ago. Updated over 9 years ago.

Status: Duplicate
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Severity: 3 - minor

Description

The OSD crashed after upgrading from ceph v0.80.5 to firefly, during thrashing.

logs: ubuntu@teuthology:/a/teuthology-2014-10-20_19:10:01-upgrade:firefly:newer-firefly-distro-basic-vps/561996

2014-10-21T02:32:29.993 INFO:tasks.thrashosds.thrasher:in_osds:  [1, 2, 4, 5]  out_osds:  [3, 0] dead_osds:  [0] live_osds:  [1, 3, 2, 5, 4]
2014-10-21T02:32:29.993 INFO:tasks.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2014-10-21T02:32:29.994 INFO:tasks.thrashosds.thrasher:Reviving osd 0
2014-10-21T02:32:29.994 INFO:tasks.ceph.osd.0:Restarting daemon
2014-10-21T02:32:29.994 INFO:teuthology.orchestra.run.vpm052:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 0'
2014-10-21T02:32:29.996 INFO:tasks.ceph.osd.0:Started
2014-10-21T02:32:29.996 INFO:teuthology.orchestra.run.vpm052:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight'
2014-10-21T02:32:32.642 INFO:teuthology.orchestra.run.vpm052.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2014-10-21T02:32:32.658 INFO:tasks.thrashosds.ceph_manager:waiting on admin_socket for 0, ['dump_ops_in_flight']
2014-10-21T02:32:36.985 INFO:tasks.ceph.osd.0.vpm052.stdout:starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2014-10-21T02:32:37.659 INFO:teuthology.orchestra.run.vpm052:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight'
2014-10-21T02:32:37.748 INFO:teuthology.orchestra.run.vpm052.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2014-10-21T02:32:37.755 INFO:tasks.thrashosds.ceph_manager:waiting on admin_socket for 0, ['dump_ops_in_flight']
2014-10-21T02:32:42.315 INFO:tasks.ceph.osd.0.vpm052.stderr:2014-10-21 09:32:42.315610 7fe5d9f8e800 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2014-10-21T02:32:42.756 INFO:teuthology.orchestra.run.vpm052:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight'
2014-10-21T02:32:44.915 INFO:tasks.ceph.osd.0.vpm052.stderr:common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7fe5d3404700 time 2014-10-21 09:32:44.869899
2014-10-21T02:32:44.915 INFO:tasks.ceph.osd.0.vpm052.stderr:common/Mutex.cc: 93: FAILED assert(r == 0)
2014-10-21T02:32:46.726 INFO:tasks.ceph.osd.0.vpm052.stderr: ceph version 0.80.7-73-g5a10b95 (5a10b95f7968ecac1f2af4abf9fb91347a290544)
2014-10-21T02:32:46.726 INFO:tasks.ceph.osd.0.vpm052.stderr: 1: (Mutex::Lock(bool)+0x111) [0xa1abd1]
2014-10-21T02:32:46.726 INFO:tasks.ceph.osd.0.vpm052.stderr: 2: (FileJournal::write_finish_thread_entry()+0x9c) [0x95c94c]
2014-10-21T02:32:46.726 INFO:tasks.ceph.osd.0.vpm052.stderr: 3: (FileJournal::WriteFinisher::entry()+0xd) [0x88f6dd]
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr: 4: (()+0x8182) [0x7fe5d94b6182]
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr: 5: (clone()+0x6d) [0x7fe5d7c28fbd]
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr:2014-10-21 09:32:46.723831 7fe5d3404700 -1 common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7fe5d3404700 time 2014-10-21 09:32:44.869899
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr:common/Mutex.cc: 93: FAILED assert(r == 0)
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr:
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr: ceph version 0.80.7-73-g5a10b95 (5a10b95f7968ecac1f2af4abf9fb91347a290544)
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr: 1: (Mutex::Lock(bool)+0x111) [0xa1abd1]
2014-10-21T02:32:46.728 INFO:tasks.ceph.osd.0.vpm052.stderr: 2: (FileJournal::write_finish_thread_entry()+0x9c) [0x95c94c]
2014-10-21T02:32:46.728 INFO:tasks.ceph.osd.0.vpm052.stderr: 3: (FileJournal::WriteFinisher::entry()+0xd) [0x88f6dd]
2014-10-21T02:32:46.728 INFO:tasks.ceph.osd.0.vpm052.stderr: 4: (()+0x8182) [0x7fe5d94b6182]
2014-10-21T02:32:46.728 INFO:tasks.ceph.osd.0.vpm052.stderr: 5: (clone()+0x6d) [0x7fe5d7c28fbd]
2014-10-21T02:32:46.728 INFO:tasks.ceph.osd.0.vpm052.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-10-21T02:32:46.728 INFO:tasks.ceph.osd.0.vpm052.stderr:
2014-10-21T02:32:46.728 INFO:tasks.ceph.osd.0.vpm052.stderr: -2421> 2014-10-21 09:32:42.315610 7fe5d9f8e800 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2014-10-21T02:32:46.758 INFO:tasks.ceph.osd.0.vpm052.stderr:     0> 2014-10-21 09:32:46.723831 7fe5d3404700 -1 common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7fe5d3404700 time 2014-10-21 09:32:44.869899
2014-10-21T02:32:46.758 INFO:tasks.ceph.osd.0.vpm052.stderr:common/Mutex.cc: 93: FAILED assert(r == 0)
2014-10-21T02:32:46.758 INFO:tasks.ceph.osd.0.vpm052.stderr:
2014-10-21T02:32:46.758 INFO:tasks.ceph.osd.0.vpm052.stderr: ceph version 0.80.7-73-g5a10b95 (5a10b95f7968ecac1f2af4abf9fb91347a290544)
2014-10-21T02:32:46.759 INFO:tasks.ceph.osd.0.vpm052.stderr: 1: (Mutex::Lock(bool)+0x111) [0xa1abd1]
2014-10-21T02:32:46.759 INFO:tasks.ceph.osd.0.vpm052.stderr: 2: (FileJournal::write_finish_thread_entry()+0x9c) [0x95c94c]
2014-10-21T02:32:46.759 INFO:tasks.ceph.osd.0.vpm052.stderr: 3: (FileJournal::WriteFinisher::entry()+0xd) [0x88f6dd]
2014-10-21T02:32:46.759 INFO:tasks.ceph.osd.0.vpm052.stderr: 4: (()+0x8182) [0x7fe5d94b6182]
2014-10-21T02:32:46.759 INFO:tasks.ceph.osd.0.vpm052.stderr: 5: (clone()+0x6d) [0x7fe5d7c28fbd]
2014-10-21T02:32:46.759 INFO:tasks.ceph.osd.0.vpm052.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-10-21T02:32:46.759 INFO:tasks.ceph.osd.0.vpm052.stderr:
2014-10-21T02:32:47.298 INFO:tasks.ceph.osd.0.vpm052.stderr:terminate called after throwing an instance of 'ceph::FailedAssertion'
2014-10-21T02:32:48.200 INFO:tasks.ceph.osd.0.vpm052.stderr:*** Caught signal (Aborted) **
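
For reference, "FAILED assert(r == 0)" in Mutex::Lock means the underlying pthread_mutex_lock() returned a nonzero error code. Consistent with the journal/filestore shutdown race tracked in the related bug #9851, a plausible trigger is the journal's WriteFinisher thread taking a mutex that the shutdown path has already destroyed. A minimal standalone sketch of that failure mode (illustrative only, not Ceph's actual code):

#include <cassert>
#include <pthread.h>

int main() {
  pthread_mutex_t m;
  pthread_mutex_init(&m, nullptr);

  // Shutdown path tears the mutex down while a worker thread still
  // holds a pointer to it (compressed into one thread for brevity).
  pthread_mutex_destroy(&m);

  // A straggling write-finisher thread then tries to take the lock.
  // Locking a destroyed mutex is undefined behavior; glibc commonly
  // reports EINVAL, so r != 0 and the assert fires, matching
  // "common/Mutex.cc: 93: FAILED assert(r == 0)" in the log above.
  int r = pthread_mutex_lock(&m);
  assert(r == 0);
  return 0;
}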

config file:

ubuntu@teuthology:/a/teuthology-2014-10-20_19:10:01-upgrade:firefly:newer-firefly-distro-basic-vps/561996$ cat orig.config.yaml 
archive_path: /var/lib/teuthworker/archive/teuthology-2014-10-20_19:10:01-upgrade:firefly:newer-firefly-distro-basic-vps/561996
branch: firefly
description: upgrade:firefly:newer/{0-cluster/start.yaml 1-install/v0.80.5.yaml 2-workload/{blogbench.yaml
  rbd.yaml s3tests.yaml testrados.yaml} 3-upgrade-sequence/upgrade-osd-mon-mds.yaml
  4-final/{monthrash.yaml osdthrash.yaml rbd.yaml testrgw.yaml} distros/ubuntu_14.04.yaml}
email: ceph-qa@ceph.com
job_id: '561996'
kernel:
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-10-20_19:10:01-upgrade:firefly:newer-firefly-distro-basic-vps
nuke-on-error: true
os_type: ubuntu
os_version: '14.04'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      global:
        osd heartbeat grace: 100
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    fs: xfs
    log-whitelist:
    - slow request
    - scrub
    - scrub mismatch
    - ScrubResult
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 5a10b95f7968ecac1f2af4abf9fb91347a290544
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 5a10b95f7968ecac1f2af4abf9fb91347a290544
  rgw:
    default_idle_timeout: 1200
  s3tests:
    branch: firefly
    idle_timeout: 1200
  workunit:
    sha1: 5a10b95f7968ecac1f2af4abf9fb91347a290544
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mon.c
  - osd.3
  - osd.4
  - osd.5
- - client.0
  - client.1
suite: upgrade:firefly:newer
suite_branch: firefly
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_firefly
tasks:
- chef: null
- clock.check: null
- install:
    tag: v0.80.5
- ceph: null
- parallel:
  - workload
  - upgrade-sequence
- sequential:
  - mon_thrash:
      revive_delay: 20
      thrash_delay: 1
  - ceph-fuse: null
  - workunit:
      clients:
        client.0:
        - suites/dbench.sh
- sequential:
  - thrashosds:
      chance_pgnum_grow: 1
      chance_pgpnum_fix: 1
      timeout: 1200
  - ceph-fuse:
    - client.0
  - workunit:
      clients:
        client.0:
        - suites/iogen.sh
- sequential:
  - workunit:
      clients:
        client.0:
        - rbd/import_export.sh
      env:
        RBD_CREATE_ARGS: --new-format
  - workunit:
      clients:
        client.0:
        - cls/test_cls_rbd.sh
- sequential:
  - rgw:
    - client.1
  - s3readwrite:
      client.0:
        readwrite:
          bucket: rwtest
          duration: 300
          files:
            num: 10
            size: 2000
            stddev: 500
          readers: 10
          writers: 3
        rgw_server: client.1
teuthology_branch: master
tube: vps
upgrade-sequence:
  sequential:
  - install.upgrade:
      all:
        branch: firefly
  - ceph.restart:
    - osd.0
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.1
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.2
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.3
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.4
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.5
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.a
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.b
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.c
  - sleep:
      duration: 60
  - ceph.restart:
    - mds.a
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.3060
workload:
  sequential:
  - workunit:
      clients:
        client.0:
        - suites/blogbench.sh
  - workunit:
      clients:
        client.0:
        - rbd/import_export.sh
      env:
        RBD_CREATE_ARGS: --new-format
  - workunit:
      clients:
        client.0:
        - cls/test_cls_rbd.sh
  - rgw:
    - client.0
  - s3tests:
      client.0:
        force-branch: firefly-original
        rgw_server: client.0
  - rados:
      clients:
      - client.0
      objects: 50
      op_weights:
        delete: 50
        read: 100
        rollback: 50
        snap_create: 50
        snap_remove: 50
        write: 100
      ops: 2000


Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #9851: crash on journal/filestore shutdown on firefly (Resolved, Loïc Dachary, 10/21/2014)

#1

Updated by Tamilarasi muthamizhan over 9 years ago

more jobs: ubuntu@teuthology:/a/teuthology-2014-10-20_19:10:01-upgrade:firefly:newer-firefly-distro-basic-vps/561999
ubuntu@teuthology:/a/teuthology-2014-10-20_19:10:01-upgrade:firefly:newer-firefly-distro-basic-vps/561997

#2

Updated by Sage Weil over 9 years ago

  • Status changed from New to Duplicate