Bug #8180

osd.3 crashed in upgrade:dumpling-x:stress-split-firefly-distro-basic-vps

Added by Yuri Weinstein almost 10 years ago. Updated almost 10 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-04-21_20:35:06-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/207826/

Coredump info is in .../log/ceph-osd.3.log.gz:

     0> 2014-04-22 12:30:57.556136 7f6db0611700 -1 *** Caught signal (Aborted) **
 in thread 7f6db0611700

 ceph version 0.79-284-g025ab9f (025ab9f47b38959b0af3f9c060a152a215e41a15)
 1: ceph-osd() [0xaa4db2]
 2: (()+0xf030) [0x7f6dc571d030]
 3: (gsignal()+0x35) [0x7f6dc3ec3475]
 4: (abort()+0x180) [0x7f6dc3ec66f0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f6dc471889d]
 6: (()+0x63996) [0x7f6dc4716996]
 7: (()+0x639c3) [0x7f6dc47169c3]
 8: (()+0x63bee) [0x7f6dc4716bee]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x40a) [0xb7dada]
 10: (PG::update_snap_map(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, ObjectStore::Transaction&)+0x44f) [0x8753af]
 11: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, eversion_t, ObjectStore::Transaction&, bool)+0x5f0) [0x8863b0]
 12: (ReplicatedBackend::sub_op_modify(std::tr1::shared_ptr<OpRequest>)+0x6e5) [0x91a285]
 13: (ReplicatedBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x2de) [0xa2abee]
 14: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1a5) [0x8e8005]
 15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x336) [0x747716]
 16: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x1ea) [0x7618ca]
 17: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x7a13ee]
 18: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0xb6f89a]
 19: (ThreadPool::WorkThread::entry()+0x10) [0xb70af0]
 20: (()+0x6b50) [0x7f6dc5714b50]
 21: (clone()+0x6d) [0x7f6dc3f6ba7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-04-22T06:27:46.305 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/run_tasks.py", line 92, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/thrashosds.py", line 172, in task
    thrash_proc.do_join()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph_manager.py", line 153, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: timed out waiting for admin_socket to appear after osd.3 restart
2014-04-22T06:27:46.380 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2014-04-22T06:27:46.380 DEBUG:teuthology.run_tasks:Unwinding manager install.upgrade
2014-04-22T06:27:46.380 DEBUG:teuthology.run_tasks:Unwinding manager ceph
2014-04-22T06:27:46.380 DEBUG:teuthology.orchestra.run:Running [10.214.138.164]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format json'
2014-04-22T06:27:49.556 INFO:teuthology.orchestra.run.err:[10.214.138.164]: dumped all in format json
2014-04-22T06:27:50.814 INFO:teuthology.task.ceph:Scrubbing osd osd.3
2014-04-22T06:27:50.814 DEBUG:teuthology.orchestra.run:Running [10.214.138.164]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.3'
2014-04-22T06:27:51.588 INFO:teuthology.orchestra.run.err:[10.214.138.164]: Error EAGAIN: osd.3 is not up
2014-04-22T06:27:51.597 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/contextutil.py", line 29, in nested
    yield vars
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph.py", line 1458, in task
    osd_scrub_pgs(ctx, config)
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph.py", line 1090, in osd_scrub_pgs
    'ceph', 'osd', 'scrub', role])
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/remote.py", line 106, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/run.py", line 330, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/run.py", line 326, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.138.164 with status 11: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.3'
2014-04-22T06:27:51.598 INFO:teuthology.misc:Shutting down mds daemons...
2014-04-22T06:27:51.598 DEBUG:teuthology.task.ceph.mds.a:waiting for process to exit
2014-04-22T06:27:51.869 INFO:teuthology.task.ceph.mds.a:Stopped
2014-04-22T06:27:51.870 INFO:teuthology.misc:Shutting down osd daemons...
2014-04-22T06:27:51.870 DEBUG:teuthology.task.ceph.osd.1:waiting for process to exit
2014-04-22T06:27:51.893 INFO:teuthology.task.ceph.osd.1:Stopped
2014-04-22T06:27:51.894 DEBUG:teuthology.task.ceph.osd.0:waiting for process to exit
2014-04-22T06:27:51.958 INFO:teuthology.task.ceph.osd.0:Stopped
2014-04-22T06:27:51.959 DEBUG:teuthology.task.ceph.osd.3:waiting for process to exit
2014-04-22T06:27:51.959 ERROR:teuthology.misc:Saw exception from osd.3
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/misc.py", line 1128, in stop_daemons_of_type
    daemon.stop()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph.py", line 57, in stop
    run.wait([self.proc])
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/run.py", line 356, in wait
    proc.exitstatus.get()
  File "/usr/lib/python2.7/dist-packages/gevent/event.py", line 207, in get
    raise self._exception
CommandFailedError: Command failed on 10.214.138.164 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 3'
2014-04-22T06:27:51.960 DEBUG:teuthology.task.ceph.osd.2:waiting for process to exit
archive_path: /var/lib/teuthworker/archive/teuthology-2014-04-21_20:35:06-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps/207826
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/snaps-few-objects.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/debian_7.0.yaml}
email: null
job_id: '207826'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-04-21_20:35:06-upgrade:dumpling-x:stress-split-firefly-distro-basic-vps
nuke-on-error: true
os_type: debian
os_version: '7.0'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 025ab9f47b38959b0af3f9c060a152a215e41a15
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 025ab9f47b38959b0af3f9c060a152a215e41a15
  s3tests:
    branch: master
  workunit:
    sha1: 025ab9f47b38959b0af3f9c060a152a215e41a15
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - mon.c
- - client.0
targets:
  ubuntu@vpm074.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCy7Hh+Ig9Z5aNAJl+e3y6u5C1XOuTbcpY5PjUNIYL38+Dc97ouvfFGqTi0XZR2iTBPkUtXninQK4KYKwpUoWp5lhtqp0pBBf7ayjtUX0bEM68QrAQzDp7drA77dGOwxOabCK0TfKwK3Hj3/B1pOnEdUYyf/FYnB9bUcJSINJ9U35p9BJH2a8f9t/iC8+lKL3xSWVAjQxCNeSpmi96vVsqMFc4zSFInsn69g0vsSQ32CpdxsuGhuZNb+OjRAFhTM28sTzIbrR7HMctNH6/R3F1X7UHxpQXB4MgI9uuKRxiQ+Dvfd0R/hqjgUhr/PeAMz39aPvw7A3BlBCiLR+r6lH0B
  ubuntu@vpm082.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCuAV+pDt9p9FGrTMJsG6wqoJo1OL6rD7WTjhsF8B1WjPxZct3QoGdIhz53OYj1Pp8/RYZ9NftbhqqOoR8MVmOj2RRH56Z0EEaGVoSuszmFlgXjWOO30MHULNS2DMtqXpMpdtuOYPING9FkUXn3wx5/+dOoQdZe7q3HzF9twamw/JY/yWIRVtm3Zq5IOBRJ1YTiWt6PioiGBq8qYAm+13JfIyJXWa+Yd+Q8QXmZ7iTwxzX3jmU3gzbms+A4Sq1lztboVDVjjmhMKjdobHKUZG6U7l057YnlydPvmaxpQPGv7/diPWga5WV+w3w79YavXqVXTJOBmhF1xeG0kM+CoSa9
  ubuntu@vpm084.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDSYGuGgNJJeijw5wjSCjiiOu5RIrNMXmf4adjqsuvelVJui0G2uUWUJqbOAxHaciZ+gzt7QqTLoOFIeDoZbqN/GwCD6BSCMzz+BdChFE9KgMmYqcD6lj3dD9C/yzZcbKrfsPdXxL2a0NE28t95cAb1pdAzjCYUaPiL+zJHPS87boAoP8Ofzcl3d9W9lRRys+8VXugSUgAUPQyijtAFIQvSXQt1+u/56Vd3HYZvmCViPamkLeVHNDKlKbSeoISdwBBev7wuDhXJj3WtoDNfpzKcW1i3E7jsRLBAB/CUnjsEnpIL1X+4gNi2EJ43peAdDrR3oK77slWYqsvUWVbzLJIN
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- radosbench:
    clients:
    - client.0
    time: 1800
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0:
      idle_timeout: 300
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.30601
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/snaps-few-objects.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/debian_7.0.yaml}
duration: 7904.588927984238
failure_reason: timed out waiting for admin_socket to appear after osd.3 restart
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
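
For triage, the backtrace in the description can be reduced to its symbolized frames so that the function that hit the assert is easy to spot. The following is a minimal sketch, not part of Ceph or teuthology: `parse_frames` and `assert_caller` are hypothetical helper names, and the regular expressions assume the frame layout shown above (`N: (symbol+0xoffset) [0xaddress]`), optionally preceded by a `<digits>-` prefix such as `grep -b` context output produces.

```python
import re

# One backtrace frame line, e.g.
#   " 9: (ceph::__ceph_assert_fail(char const*, ...)+0x40a) [0xb7dada]"
# An optional "<digits>-" prefix before the frame is tolerated (assumption).
FRAME_RE = re.compile(r"^(?:\d+-)?\s*(\d+): (.*) \[(0x[0-9a-f]+)\]\s*$")

# Frame body of the form "(symbol+0xoffset)"; group 1 is the symbol.
OFFSET_RE = re.compile(r"^\((.*)\+0x[0-9a-f]+\)$")


def parse_frames(text):
    """Return a list of (index, symbol, address) tuples for each frame line."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if not m:
            continue  # not a frame line (header, version, NOTE, ...)
        index, body, addr = int(m.group(1)), m.group(2), m.group(3)
        sym = OFFSET_RE.match(body)
        # Frames without a "+0x..." offset (e.g. "ceph-osd()") are kept as-is.
        symbol = sym.group(1) if sym else body
        frames.append((index, symbol, addr))
    return frames


def assert_caller(frames):
    """Return the symbol of the frame that called ceph::__ceph_assert_fail,
    or None if the backtrace contains no assert frame."""
    for pos, (_, symbol, _) in enumerate(frames):
        if symbol.startswith("ceph::__ceph_assert_fail(") and pos + 1 < len(frames):
            return frames[pos + 1][1]
    return None
```

Applied to the backtrace above, `assert_caller(parse_frames(log_text))` should return the `PG::update_snap_map(...)` frame, which is the caller of `ceph::__ceph_assert_fail` in this crash. Frames whose symbol is `()` are unresolved library addresses and still need the executable or `objdump -rdS <executable>` to interpret.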

Related issues

Related to Ceph - Bug #8162: osd: dumpling advances last_backfill prematurely Resolved 04/19/2014
