Bug #8180
osd.3 crashed in upgrade:dumpling-x:stress-split-firefly-distro-basic-vps
Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
other
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Coredump info is in ubuntu@vpm082.front.sepia.ceph.com:.../log/ceph-osd.3.log.gz:
     0> 2014-04-22 12:30:57.556136 7f6db0611700 -1 *** Caught signal (Aborted) **
 in thread 7f6db0611700

 ceph version 0.79-284-g025ab9f (025ab9f47b38959b0af3f9c060a152a215e41a15)
 1: ceph-osd() [0xaa4db2]
 2: (()+0xf030) [0x7f6dc571d030]
 3: (gsignal()+0x35) [0x7f6dc3ec3475]
 4: (abort()+0x180) [0x7f6dc3ec66f0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f6dc471889d]
 6: (()+0x63996) [0x7f6dc4716996]
 7: (()+0x639c3) [0x7f6dc47169c3]
 8: (()+0x63bee) [0x7f6dc4716bee]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x40a) [0xb7dada]
 10: (PG::update_snap_map(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, ObjectStore::Transaction&)+0x44f) [0x8753af]
 11: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, eversion_t, ObjectStore::Transaction&, bool)+0x5f0) [0x8863b0]
 12: (ReplicatedBackend::sub_op_modify(std::tr1::shared_ptr<OpRequest>)+0x6e5) [0x91a285]
 13: (ReplicatedBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x2de) [0xa2abee]
 14: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1a5) [0x8e8005]
 15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x336) [0x747716]
 16: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x1ea) [0x7618ca]
 17: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x7a13ee]
 18: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0xb6f89a]
 19: (ThreadPool::WorkThread::entry()+0x10) [0xb70af0]
 20: (()+0x6b50) [0x7f6dc5714b50]
 21: (clone()+0x6d) [0x7f6dc3f6ba7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
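As the NOTE above says, the frame addresses only become readable against the matching ceph-osd binary. A minimal sketch of how one might resolve them (the binary path is an assumption, and it must be the exact 0.79-284-g025ab9f build with debug symbols):

  # resolve frame 10 (PG::update_snap_map) to a source file and line
  addr2line -Cfe /usr/bin/ceph-osd 0x8753af
  # or, as the NOTE recommends, disassemble with source interleaved
  objdump -rdS /usr/bin/ceph-osd | less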
2014-04-22T06:27:46.305 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/run_tasks.py", line 92, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/thrashosds.py", line 172, in task
    thrash_proc.do_join()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph_manager.py", line 153, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: timed out waiting for admin_socket to appear after osd.3 restart
2014-04-22T06:27:46.380 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2014-04-22T06:27:46.380 DEBUG:teuthology.run_tasks:Unwinding manager install.upgrade
2014-04-22T06:27:46.380 DEBUG:teuthology.run_tasks:Unwinding manager ceph
2014-04-22T06:27:46.380 DEBUG:teuthology.orchestra.run:Running [10.214.138.164]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format json'
2014-04-22T06:27:49.556 INFO:teuthology.orchestra.run.err:[10.214.138.164]: dumped all in format json
2014-04-22T06:27:50.814 INFO:teuthology.task.ceph:Scrubbing osd osd.3
2014-04-22T06:27:50.814 DEBUG:teuthology.orchestra.run:Running [10.214.138.164]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.3'
2014-04-22T06:27:51.588 INFO:teuthology.orchestra.run.err:[10.214.138.164]: Error EAGAIN: osd.3 is not up
2014-04-22T06:27:51.597 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/contextutil.py", line 29, in nested
    yield vars
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph.py", line 1458, in task
    osd_scrub_pgs(ctx, config)
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph.py", line 1090, in osd_scrub_pgs
    'ceph', 'osd', 'scrub', role])
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/remote.py", line 106, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/run.py", line 330, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/run.py", line 326, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.138.164 with status 11: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.3'
2014-04-22T06:27:51.598 INFO:teuthology.misc:Shutting down mds daemons...
2014-04-22T06:27:51.598 DEBUG:teuthology.task.ceph.mds.a:waiting for process to exit
2014-04-22T06:27:51.869 INFO:teuthology.task.ceph.mds.a:Stopped
2014-04-22T06:27:51.870 INFO:teuthology.misc:Shutting down osd daemons...
2014-04-22T06:27:51.870 DEBUG:teuthology.task.ceph.osd.1:waiting for process to exit
2014-04-22T06:27:51.893 INFO:teuthology.task.ceph.osd.1:Stopped
2014-04-22T06:27:51.894 DEBUG:teuthology.task.ceph.osd.0:waiting for process to exit
2014-04-22T06:27:51.958 INFO:teuthology.task.ceph.osd.0:Stopped
2014-04-22T06:27:51.959 DEBUG:teuthology.task.ceph.osd.3:waiting for process to exit
2014-04-22T06:27:51.959 ERROR:teuthology.misc:Saw exception from osd.3
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/misc.py", line 1128, in stop_daemons_of_type
    daemon.stop()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph.py", line 57, in stop
    run.wait([self.proc])
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/run.py", line 356, in wait
    proc.exitstatus.get()
  File "/usr/lib/python2.7/dist-packages/gevent/event.py", line 207, in get
    raise self._exception
CommandFailedError: Command failed on 10.214.138.164 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 3'
2014-04-22T06:27:51.960 DEBUG:teuthology.task.ceph.osd.2:waiting for process to exit
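The thrasher gave up because osd.3 never came back after its restart ("timed out waiting for admin_socket to appear"). For reference, a sketch of the manual check for whether the restarted daemon's admin socket is back (the socket path below is the stock default and an assumption; the teuthology run may place it elsewhere):

  # ask the restarted OSD for its version via its admin socket
  ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok version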
archive_path: /var/lib/teuthworker/archive/teuthology-2014-04-21_20:35:06-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/207826
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/snaps-few-objects.yaml 6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/debian_7.0.yaml}
email: null
job_id: '207826'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-04-21_20:35:06-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps
nuke-on-error: true
os_type: debian
os_version: '7.0'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 025ab9f47b38959b0af3f9c060a152a215e41a15
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 025ab9f47b38959b0af3f9c060a152a215e41a15
  s3tests:
    branch: master
  workunit:
    sha1: 025ab9f47b38959b0af3f9c060a152a215e41a15
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - mon.c
- - client.0
targets:
  ubuntu@vpm074.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCy7Hh+Ig9Z5aNAJl+e3y6u5C1XOuTbcpY5PjUNIYL38+Dc97ouvfFGqTi0XZR2iTBPkUtXninQK4KYKwpUoWp5lhtqp0pBBf7ayjtUX0bEM68QrAQzDp7drA77dGOwxOabCK0TfKwK3Hj3/B1pOnEdUYyf/FYnB9bUcJSINJ9U35p9BJH2a8f9t/iC8+lKL3xSWVAjQxCNeSpmi96vVsqMFc4zSFInsn69g0vsSQ32CpdxsuGhuZNb+OjRAFhTM28sTzIbrR7HMctNH6/R3F1X7UHxpQXB4MgI9uuKRxiQ+Dvfd0R/hqjgUhr/PeAMz39aPvw7A3BlBCiLR+r6lH0B
  ubuntu@vpm082.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCuAV+pDt9p9FGrTMJsG6wqoJo1OL6rD7WTjhsF8B1WjPxZct3QoGdIhz53OYj1Pp8/RYZ9NftbhqqOoR8MVmOj2RRH56Z0EEaGVoSuszmFlgXjWOO30MHULNS2DMtqXpMpdtuOYPING9FkUXn3wx5/+dOoQdZe7q3HzF9twamw/JY/yWIRVtm3Zq5IOBRJ1YTiWt6PioiGBq8qYAm+13JfIyJXWa+Yd+Q8QXmZ7iTwxzX3jmU3gzbms+A4Sq1lztboVDVjjmhMKjdobHKUZG6U7l057YnlydPvmaxpQPGv7/diPWga5WV+w3w79YavXqVXTJOBmhF1xeG0kM+CoSa9
  ubuntu@vpm084.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDSYGuGgNJJeijw5wjSCjiiOu5RIrNMXmf4adjqsuvelVJui0G2uUWUJqbOAxHaciZ+gzt7QqTLoOFIeDoZbqN/GwCD6BSCMzz+BdChFE9KgMmYqcD6lj3dD9C/yzZcbKrfsPdXxL2a0NE28t95cAb1pdAzjCYUaPiL+zJHPS87boAoP8Ofzcl3d9W9lRRys+8VXugSUgAUPQyijtAFIQvSXQt1+u/56Vd3HYZvmCViPamkLeVHNDKlKbSeoISdwBBev7wuDhXJj3WtoDNfpzKcW1i3E7jsRLBAB/CUnjsEnpIL1X+4gNi2EJ43peAdDrR3oK77slWYqsvUWVbzLJIN
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- radosbench:
    clients:
    - client.0
    time: 1800
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0:
      idle_timeout: 300
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.30601
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/snaps-few-objects.yaml 6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/debian_7.0.yaml}
duration: 7904.588927984238
failure_reason: timed out waiting for admin_socket to appear after osd.3 restart
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
Related issues
History
#1 Updated by Yuri Weinstein almost 10 years ago
- Severity changed from 3 - minor to 2 - major
#2 Updated by Yuri Weinstein almost 10 years ago
Logs for one more similar crash are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-04-26_20:35:01-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/216783/
#3 Updated by Samuel Just almost 10 years ago
- Status changed from New to Duplicate