Bug #8367

osd crashed in upgrade:dumpling-x:stress-split-firefly---basic-plana

Added by Yuri Weinstein almost 10 years ago. Updated almost 10 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags: -
Backport: -
Regression: -
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-05-14_19:55:03-upgrade:dumpling-x:stress-split-firefly---basic-plana/254850/

Exceptions from ubuntu@teuthology:/a/teuthology-2014-05-14_19:55:03-upgrade:dumpling-x:/log/ceph-osd.2.log.gz:

1783232041:     0> 2014-05-14 22:54:53.372658 7fde03d1a700 -1 *** Caught signal (Aborted) **
1783232123- in thread 7fde03d1a700
1783232147-
1783232148- ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1783232212- 1: ceph-osd() [0x98baba]
1783232238- 2: (()+0xfcb0) [0x7fde18dc9cb0]
1783232271- 3: (gsignal()+0x35) [0x7fde172c4425]
1783232309- 4: (abort()+0x17b) [0x7fde172c7b8b]
1783232346- 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fde17c1769d]
1783232416- 6: (()+0xb5846) [0x7fde17c15846]
1783232450- 7: (()+0xb5873) [0x7fde17c15873]
1783232484- 8: (()+0xb596e) [0x7fde17c1596e]
1783232518- 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0xa6d2ef]
1783232610- 10: (PG::update_snap_map(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, ObjectStore::Transaction&)+0x495) [0x75da15]
1783232744- 11: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, eversion_t, ObjectStore::Transaction&, bool)+0x4ac) [0x75dfdc]
1783232891- 12: (ReplicatedBackend::sub_op_modify(std::tr1::shared_ptr<OpRequest>)+0xaa0) [0x7dd470]
1783232981- 13: (ReplicatedBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x55c) [0x911d9c]
1783233072- 14: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1ee) [0x7beb6e]
1783233177- 15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x34a) [0x61ac4a]
1783233299- 16: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x628) [0x6366a8]
1783233392- 17: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x67cfac]
1783233583- 18: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0xa5db06]
1783233651- 19: (ThreadPool::WorkThread::entry()+0x10) [0xa5f910]
1783233706- 20: (()+0x7e9a) [0x7fde18dc1e9a]
1783233740- 21: (clone()+0x6d) [0x7fde173823fd]
1783233777- NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
1783233870-
1783233871---- logging levels ---
1783233894-   0/ 5 none
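
For reference, a minimal sketch (not part of the run itself; the local filename and the use of Python 3 are assumptions) of pulling the crash backtrace back out of a compressed ceph-osd log such as the one quoted above:

import gzip

LOG = "ceph-osd.2.log.gz"  # assumed to have been downloaded from the qa-proxy link above

def extract_backtrace(path):
    """Collect the lines from the 'Caught signal' marker through the
    'NOTE: a copy of the executable' footer, if present."""
    lines = []
    capturing = False
    with gzip.open(path, "rt", errors="replace") as f:
        for line in f:
            if "Caught signal" in line:
                capturing = True
            if capturing:
                lines.append(line.rstrip())
                if "NOTE: a copy of the executable" in line:
                    break
    return lines

if __name__ == "__main__":
    for entry in extract_backtrace(LOG):
        print(entry)

The teuthology log from the same run then shows the thrasher failing when osd.2 does not come back:
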
2014-05-14T23:12:41.108 INFO:teuthology.task.thrashosds:joining thrashosds
2014-05-14T23:12:41.108 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/run_tasks.py", line 92, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/thrashosds.py", line 178, in task
    thrash_proc.do_join()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph_manager.py", line 165, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: timed out waiting for admin_socket to appear after osd.2 restart
2014-05-14T23:12:41.158 DEBUG:teuthology.run_tasks:Unwinding manager ceph.restart
2014-05-14T23:12:41.159 DEBUG:teuthology.run_tasks:Unwinding manager install.upgrade
2014-05-14T23:12:41.159 DEBUG:teuthology.run_tasks:Unwinding manager ceph
2014-05-14T23:12:41.159 DEBUG:teuthology.orchestra.run:Running [10.214.132.34]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format json'
2014-05-14T23:12:41.593 INFO:teuthology.orchestra.run.err:[10.214.132.34]: dumped all in format json
2014-05-14T23:12:42.638 INFO:teuthology.task.ceph:Scrubbing osd osd.0
2014-05-14T23:12:42.638 DEBUG:teuthology.orchestra.run:Running [10.214.132.34]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.0'
2014-05-14T23:12:43.038 INFO:teuthology.orchestra.run.err:[10.214.132.34]: osd.0 instructed to scrub
2014-05-14T23:12:43.050 INFO:teuthology.task.ceph:Scrubbing osd osd.1
2014-05-14T23:12:43.050 DEBUG:teuthology.orchestra.run:Running [10.214.132.34]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.1'
2014-05-14T23:12:43.281 INFO:teuthology.orchestra.run.err:[10.214.132.34]: osd.1 instructed to scrub
2014-05-14T23:12:43.292 INFO:teuthology.task.ceph:Scrubbing osd osd.2
2014-05-14T23:12:43.292 DEBUG:teuthology.orchestra.run:Running [10.214.132.34]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.2'
2014-05-14T23:12:43.523 INFO:teuthology.orchestra.run.err:[10.214.132.34]: Error EAGAIN: osd.2 is not up
2014-05-14T23:12:43.535 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/contextutil.py", line 29, in nested
    yield vars
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph.py", line 1458, in task
    osd_scrub_pgs(ctx, config)
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph.py", line 1090, in osd_scrub_pgs
    'ceph', 'osd', 'scrub', role])
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/remote.py", line 106, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/run.py", line 330, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/run.py", line 326, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.132.34 with status 11: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.2'
2014-05-14T23:12:43.562 INFO:teuthology.misc:Shutting down mds daemons...
2014-05-14T23:12:43.563 DEBUG:teuthology.task.ceph.mds.a:waiting for process to exit
2014-05-14T23:12:44.959 INFO:teuthology.task.ceph.mds.a:Stopped
2014-05-14T23:12:44.959 INFO:teuthology.misc:Shutting down osd daemons...
2014-05-14T23:12:44.960 DEBUG:teuthology.task.ceph.osd.1:waiting for process to exit
2014-05-14T23:12:44.979 INFO:teuthology.task.ceph.osd.1:Stopped
2014-05-14T23:12:44.980 DEBUG:teuthology.task.ceph.osd.0:waiting for process to exit
2014-05-14T23:12:45.039 INFO:teuthology.task.ceph.osd.0:Stopped
2014-05-14T23:12:45.039 DEBUG:teuthology.task.ceph.osd.3:waiting for process to exit
2014-05-14T23:12:45.059 INFO:teuthology.task.ceph.osd.3:Stopped
2014-05-14T23:12:45.059 DEBUG:teuthology.task.ceph.osd.2:waiting for process to exit
2014-05-14T23:12:45.059 ERROR:teuthology.misc:Saw exception from osd.2
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/misc.py", line 1128, in stop_daemons_of_type
    daemon.stop()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/ceph.py", line 57, in stop
    run.wait([self.proc])
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/run.py", line 356, in wait
    proc.exitstatus.get()
  File "/usr/lib/python2.7/dist-packages/gevent/event.py", line 207, in get
    raise self._exception
CommandFailedError: Command failed on 10.214.132.34 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 2'
2014-05-14T23:12:45.071 DEBUG:teuthology.task.ceph.osd.5:waiting for process to exit
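
The thrasher gave up because osd.2's admin socket never reappeared after the restart (the ceph-osd process had aborted on the assert shown above). As a rough illustration only (the socket path and timeout below are assumptions, not the values teuthology used), a wait of this kind can be sketched as:

import os
import subprocess
import time

OSD_ID = 2
SOCK = "/var/run/ceph/ceph-osd.{}.asok".format(OSD_ID)  # assumed default path; teuthology runs may place it elsewhere
TIMEOUT = 300  # seconds; the timeout the run actually used may differ

def wait_for_admin_socket(sock=SOCK, timeout=TIMEOUT):
    """Poll until the daemon answers a 'version' request over its admin socket,
    or give up once the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(sock):
            try:
                # 'ceph --admin-daemon <sock> version' talks to the daemon directly,
                # without going through the monitors.
                subprocess.check_output(["ceph", "--admin-daemon", sock, "version"])
                return True
            except subprocess.CalledProcessError:
                pass  # socket exists but the daemon is not answering yet
        time.sleep(1)
    return False

if __name__ == "__main__":
    if not wait_for_admin_socket():
        raise SystemExit("timed out waiting for admin_socket after osd.{} restart".format(OSD_ID))

The job configuration and summary for the run follow:
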
archive_path: /var/lib/teuthworker/archive/teuthology-2014-05-14_19:55:03-upgrade:dumpling-x:stress-split-firefly---basic-plana/254850
branch: firefly
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/readwrite.yaml
  6-next-mon/monb.yaml 7-workload/rbd_api.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_14.04.yaml}
email: null
job_id: '254850'
last_in_suite: false
machine_type: plana
name: teuthology-2014-05-14_19:55:03-upgrade:dumpling-x:stress-split-firefly---basic-plana
nuke-on-error: true
os_type: ubuntu
os_version: '14.04'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: a38fe1169b6d2ac98b427334c12d7cf81f809b74
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: a38fe1169b6d2ac98b427334c12d7cf81f809b74
  s3tests:
    branch: master
  workunit:
    sha1: a38fe1169b6d2ac98b427334c12d7cf81f809b74
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - mon.c
- - client.0
suite: upgrade:dumpling-x:stress-split
targets:
  ubuntu@plana44.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCuAuZz+2oyq/3xquKfwZzdK3TBJGelKO4bQ9KIbqRmy2GCqT80FVXC59ynpd7buoY8sqDdEvjF6+E/OowXVg1kIN3uNGntXVMZQc1b89O7i4LkaUVwS4QBT/m5h49nAjxem4Jyq11iNOM06G4NFHZeRHuHZupfz0sj3W0qIB/fBOT7MX48Iwpak7gbvWn1gTzAP42vweSp/9cAamb6IWPiKUqTMoDmFiYCQlKkfhovIjDBgeKsh/9umSi1qYLGrCOpq9ZSgV7OVga/H27odrFIGc4IAXY7t4kLobixODboLSbhQIVkLxW6FQxyWN4MBnJHSeVz+UO8RzigqLUgVFEJ
  ubuntu@plana90.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD4N7cFcxX6jr5S1YNRUVfhy/Zm8RJMn7qF6T1hvbrHokB9LFktHd1AGxBLVVJeedpcWcmx0tiudcoILCW6LCTOv1r6Ne5bOoCeLdJI/DBvezHfzj7e740zBir5IxGuMwV5c9vLn2JPTGQtlLLDjbaqT/7ghFmEuPQeqRDu2BIB2K/46+XFXmiryVss3+cHWW5ApS2p4za7MpKKVPsq5HPvFnNJcAh50AotOJHaTx7BNt7RSMaB4tPt6nOr0mddjG/pA0bDk+r6Xtrrq/zHx4ArcvsGu2wzzHMLmmNVhq6vG88iUPItqdiE588O1CjietucjF6cGbm2QNW/J3tcUB3R
  ubuntu@plana92.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC9RMgEoaLLIr68XvX+KvTsR8kcn+XvJEt8tjNTMK8T6TjE3LzajFS8jQTne260jPurEjRRnsICio9Sb6odFuZiiErQB7/p3fa/Rgs1PpJpmQ6HluQrfmiq4OY9t8u/OcvkJiVW9CXLfzKQhZDfV1ifrMz73FI1wD11oC77qMOVFF1jCH5XZ9vHoWd+xOwBRRglAKWeJQJNJcSk+bJFs/i65BuLEDMis8b7FeLPrM+qXCasCh+Aimb0Ny6/+izsrxjsIJ/VGFT24USakTJZY/MAYPHdHhB6cG2xXAJBp1P3npaZZHje+2rk2Co02lmpZjRX+p3btuYNoffultMtX2Xl
tasks:
- internal.lock_machines:
  - 3
  - plana
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 10
      read: 45
      write: 45
    ops: 4000
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd.sh
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0:
      idle_timeout: 300
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.plana.19245
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/readwrite.yaml
  6-next-mon/monb.yaml 7-workload/rbd_api.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_14.04.yaml}
duration: 4766.220067024231
failure_reason: timed out waiting for admin_socket to appear after osd.2 restart
flavor: basic
owner: scheduled_teuthology@teuthology
success: false

Related issues (0 open, 1 closed)

Is duplicate of Ceph - Bug #8162: osd: dumpling advances last_backfill prematurely (Resolved, Samuel Just, 04/19/2014)

#1 Updated by Yuri Weinstein almost 10 years ago

  • Source changed from other to Q/A

#2 Updated by Samuel Just almost 10 years ago

  • Status changed from New to Duplicate