Bug #9856
osd crashed after upgrade from v0.80.5 to firefly
Status:
Duplicate
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
The osd crashed after upgrading from ceph v0.80.5 to firefly, while the cluster was being thrashed.
logs: ubuntu@teuthology:/a/teuthology-2014-10-20_19:10:01-upgrade:firefly:newer-firefly-distro-basic-vps/561996
2014-10-21T02:32:29.993 INFO:tasks.thrashosds.thrasher:in_osds: [1, 2, 4, 5] out_osds: [3, 0] dead_osds: [0] live_osds: [1, 3, 2, 5, 4]
2014-10-21T02:32:29.993 INFO:tasks.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2014-10-21T02:32:29.994 INFO:tasks.thrashosds.thrasher:Reviving osd 0
2014-10-21T02:32:29.994 INFO:tasks.ceph.osd.0:Restarting daemon
2014-10-21T02:32:29.994 INFO:teuthology.orchestra.run.vpm052:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 0'
2014-10-21T02:32:29.996 INFO:tasks.ceph.osd.0:Started
2014-10-21T02:32:29.996 INFO:teuthology.orchestra.run.vpm052:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight'
2014-10-21T02:32:32.642 INFO:teuthology.orchestra.run.vpm052.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2014-10-21T02:32:32.658 INFO:tasks.thrashosds.ceph_manager:waiting on admin_socket for 0, ['dump_ops_in_flight']
2014-10-21T02:32:36.985 INFO:tasks.ceph.osd.0.vpm052.stdout:starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2014-10-21T02:32:37.659 INFO:teuthology.orchestra.run.vpm052:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight'
2014-10-21T02:32:37.748 INFO:teuthology.orchestra.run.vpm052.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2014-10-21T02:32:37.755 INFO:tasks.thrashosds.ceph_manager:waiting on admin_socket for 0, ['dump_ops_in_flight']
2014-10-21T02:32:42.315 INFO:tasks.ceph.osd.0.vpm052.stderr:2014-10-21 09:32:42.315610 7fe5d9f8e800 -1 journal FileJournal::_open: disabling aio for non-block journal.
Use journal_force_aio to force use of aio anyway
2014-10-21T02:32:42.756 INFO:teuthology.orchestra.run.vpm052:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight'
2014-10-21T02:32:44.915 INFO:tasks.ceph.osd.0.vpm052.stderr:common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7fe5d3404700 time 2014-10-21 09:32:44.869899
2014-10-21T02:32:44.915 INFO:tasks.ceph.osd.0.vpm052.stderr:common/Mutex.cc: 93: FAILED assert(r == 0)
2014-10-21T02:32:46.726 INFO:tasks.ceph.osd.0.vpm052.stderr: ceph version 0.80.7-73-g5a10b95 (5a10b95f7968ecac1f2af4abf9fb91347a290544)
2014-10-21T02:32:46.726 INFO:tasks.ceph.osd.0.vpm052.stderr: 1: (Mutex::Lock(bool)+0x111) [0xa1abd1]
2014-10-21T02:32:46.726 INFO:tasks.ceph.osd.0.vpm052.stderr: 2: (FileJournal::write_finish_thread_entry()+0x9c) [0x95c94c]
2014-10-21T02:32:46.726 INFO:tasks.ceph.osd.0.vpm052.stderr: 3: (FileJournal::WriteFinisher::entry()+0xd) [0x88f6dd]
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr: 4: (()+0x8182) [0x7fe5d94b6182]
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr: 5: (clone()+0x6d) [0x7fe5d7c28fbd]
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr:2014-10-21 09:32:46.723831 7fe5d3404700 -1 common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7fe5d3404700 time 2014-10-21 09:32:44.869899
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr:common/Mutex.cc: 93: FAILED assert(r == 0)
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr:
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr: ceph version 0.80.7-73-g5a10b95 (5a10b95f7968ecac1f2af4abf9fb91347a290544)
2014-10-21T02:32:46.727 INFO:tasks.ceph.osd.0.vpm052.stderr: 1: (Mutex::Lock(bool)+0x111) [0xa1abd1]
2014-10-21T02:32:46.728 INFO:tasks.ceph.osd.0.vpm052.stderr: 2: (FileJournal::write_finish_thread_entry()+0x9c) [0x95c94c]
2014-10-21T02:32:46.728 INFO:tasks.ceph.osd.0.vpm052.stderr: 3: (FileJournal::WriteFinisher::entry()+0xd) [0x88f6dd]
2014-10-21T02:32:46.728 INFO:tasks.ceph.osd.0.vpm052.stderr: 4: (()+0x8182) [0x7fe5d94b6182]
2014-10-21T02:32:46.728 INFO:tasks.ceph.osd.0.vpm052.stderr: 5: (clone()+0x6d) [0x7fe5d7c28fbd]
2014-10-21T02:32:46.728 INFO:tasks.ceph.osd.0.vpm052.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-10-21T02:32:46.728 INFO:tasks.ceph.osd.0.vpm052.stderr:
2014-10-21T02:32:46.728 INFO:tasks.ceph.osd.0.vpm052.stderr: -2421> 2014-10-21 09:32:42.315610 7fe5d9f8e800 -1 journal FileJournal::_open: disabling aio for non-block journal.
Use journal_force_aio to force use of aio anyway
2014-10-21T02:32:46.758 INFO:tasks.ceph.osd.0.vpm052.stderr: 0> 2014-10-21 09:32:46.723831 7fe5d3404700 -1 common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7fe5d3404700 time 2014-10-21 09:32:44.869899
2014-10-21T02:32:46.758 INFO:tasks.ceph.osd.0.vpm052.stderr:common/Mutex.cc: 93: FAILED assert(r == 0)
2014-10-21T02:32:46.758 INFO:tasks.ceph.osd.0.vpm052.stderr:
2014-10-21T02:32:46.758 INFO:tasks.ceph.osd.0.vpm052.stderr: ceph version 0.80.7-73-g5a10b95 (5a10b95f7968ecac1f2af4abf9fb91347a290544)
2014-10-21T02:32:46.759 INFO:tasks.ceph.osd.0.vpm052.stderr: 1: (Mutex::Lock(bool)+0x111) [0xa1abd1]
2014-10-21T02:32:46.759 INFO:tasks.ceph.osd.0.vpm052.stderr: 2: (FileJournal::write_finish_thread_entry()+0x9c) [0x95c94c]
2014-10-21T02:32:46.759 INFO:tasks.ceph.osd.0.vpm052.stderr: 3: (FileJournal::WriteFinisher::entry()+0xd) [0x88f6dd]
2014-10-21T02:32:46.759 INFO:tasks.ceph.osd.0.vpm052.stderr: 4: (()+0x8182) [0x7fe5d94b6182]
2014-10-21T02:32:46.759 INFO:tasks.ceph.osd.0.vpm052.stderr: 5: (clone()+0x6d) [0x7fe5d7c28fbd]
2014-10-21T02:32:46.759 INFO:tasks.ceph.osd.0.vpm052.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-10-21T02:32:46.759 INFO:tasks.ceph.osd.0.vpm052.stderr:
2014-10-21T02:32:47.298 INFO:tasks.ceph.osd.0.vpm052.stderr:terminate called after throwing an instance of 'ceph::FailedAssertion'
2014-10-21T02:32:48.200 INFO:tasks.ceph.osd.0.vpm052.stderr:*** Caught signal (Aborted) **

config file:
ubuntu@teuthology:/a/teuthology-2014-10-20_19:10:01-upgrade:firefly:newer-firefly-distro-basic-vps/561996$ cat orig.config.yaml
archive_path: /var/lib/teuthworker/archive/teuthology-2014-10-20_19:10:01-upgrade:firefly:newer-firefly-distro-basic-vps/561996
branch: firefly
description: upgrade:firefly:newer/{0-cluster/start.yaml 1-install/v0.80.5.yaml 2-workload/{blogbench.yaml
  rbd.yaml s3tests.yaml testrados.yaml} 3-upgrade-sequence/upgrade-osd-mon-mds.yaml
  4-final/{monthrash.yaml osdthrash.yaml rbd.yaml testrgw.yaml} distros/ubuntu_14.04.yaml}
email: ceph-qa@ceph.com
job_id: '561996'
kernel:
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-10-20_19:10:01-upgrade:firefly:newer-firefly-distro-basic-vps
nuke-on-error: true
os_type: ubuntu
os_version: '14.04'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      global:
        osd heartbeat grace: 100
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    fs: xfs
    log-whitelist:
    - slow request
    - scrub
    - scrub mismatch
    - ScrubResult
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 5a10b95f7968ecac1f2af4abf9fb91347a290544
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 5a10b95f7968ecac1f2af4abf9fb91347a290544
  rgw:
    default_idle_timeout: 1200
  s3tests:
    branch: firefly
    idle_timeout: 1200
  workunit:
    sha1: 5a10b95f7968ecac1f2af4abf9fb91347a290544
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mon.c
  - osd.3
  - osd.4
  - osd.5
- - client.0
  - client.1
suite: upgrade:firefly:newer
suite_branch: firefly
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_firefly
tasks:
- chef: null
- clock.check: null
- install:
    tag: v0.80.5
- ceph: null
- parallel:
  - workload
  - upgrade-sequence
- sequential:
  - mon_thrash:
      revive_delay: 20
      thrash_delay: 1
  - ceph-fuse: null
  - workunit:
      clients:
        client.0:
        - suites/dbench.sh
- sequential:
  - thrashosds:
      chance_pgnum_grow: 1
      chance_pgpnum_fix: 1
      timeout: 1200
  - ceph-fuse:
    - client.0
  - workunit:
      clients:
        client.0:
        - suites/iogen.sh
- sequential:
  - workunit:
      clients:
        client.0:
        - rbd/import_export.sh
      env:
        RBD_CREATE_ARGS: --new-format
  - workunit:
      clients:
        client.0:
        - cls/test_cls_rbd.sh
- sequential:
  - rgw:
    - client.1
  - s3readwrite:
      client.0:
        readwrite:
          bucket: rwtest
          duration: 300
          files:
            num: 10
            size: 2000
            stddev: 500
          readers: 10
          writers: 3
        rgw_server: client.1
teuthology_branch: master
tube: vps
upgrade-sequence:
  sequential:
  - install.upgrade:
      all:
        branch: firefly
  - ceph.restart:
    - osd.0
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.1
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.2
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.3
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.4
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.5
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.a
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.b
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.c
  - sleep:
      duration: 60
  - ceph.restart:
    - mds.a
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.3060
workload:
  sequential:
  - workunit:
      clients:
        client.0:
        - suites/blogbench.sh
  - workunit:
      clients:
        client.0:
        - rbd/import_export.sh
      env:
        RBD_CREATE_ARGS: --new-format
  - workunit:
      clients:
        client.0:
        - cls/test_cls_rbd.sh
  - rgw:
    - client.0
  - s3tests:
      client.0:
        force-branch: firefly-original
        rgw_server: client.0
  - rados:
      clients:
      - client.0
      objects: 50
      op_weights:
        delete: 50
        read: 100
        rollback: 50
        snap_create: 50
        snap_remove: 50
        write: 100
      ops: 2000