Bug #9472 (closed): osd crash in -upgrade:dumpling-dumpling-distro-basic-vps suite

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-09-14_17:00:01-upgrade:dumpling-dumpling-distro-basic-vps/483716

Coredump in */ceph-osd.0.log.gz:

ceph-osd.0.log.gz:     0> 2014-09-15 01:27:03.177613 7f2717c0c700 -1 *** Caught signal (Aborted) **
ceph-osd.0.log.gz: in thread 7f2717c0c700
ceph-osd.0.log.gz:
ceph-osd.0.log.gz: ceph version 0.67.10-12-gd3e880a (d3e880af5f3ae71d13159514c33c6b41fc648d54)
ceph-osd.0.log.gz: 1: ceph-osd() [0x7fd46a]
ceph-osd.0.log.gz: 2: (()+0xfcb0) [0x7f272c0f3cb0]
ceph-osd.0.log.gz: 3: (gsignal()+0x35) [0x7f272a31d0d5]
ceph-osd.0.log.gz: 4: (abort()+0x17b) [0x7f272a32083b]
ceph-osd.0.log.gz: 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f272ac6f69d]
ceph-osd.0.log.gz: 6: (()+0xb5846) [0x7f272ac6d846]
ceph-osd.0.log.gz: 7: (()+0xb5873) [0x7f272ac6d873]
ceph-osd.0.log.gz: 8: (()+0xb596e) [0x7f272ac6d96e]
ceph-osd.0.log.gz: 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x8c696f]
ceph-osd.0.log.gz: 10: (Watch::get_delayed_cb()+0xc3) [0x6d3dc3]
ceph-osd.0.log.gz: 11: (ReplicatedPG::handle_watch_timeout(std::tr1::shared_ptr<Watch>)+0x9e9) [0x5e5029]
ceph-osd.0.log.gz: 12: (ReplicatedPG::check_blacklisted_obc_watchers(ObjectContext*)+0x3ba) [0x5e569a]
ceph-osd.0.log.gz: 13: (ReplicatedPG::populate_obc_watchers(ObjectContext*)+0x60b) [0x5e5f7b]
ceph-osd.0.log.gz: 14: (ReplicatedPG::get_object_context(hobject_t const&, bool)+0x1b1) [0x5e6911]
ceph-osd.0.log.gz: 15: (ReplicatedPG::prep_object_replica_pushes(hobject_t const&, eversion_t, int, std::map<int, std::vector<PushOp, std::allocator<PushOp> >, std::less<int>, std::allocator<std::pair<int const, std::vector<PushOp, std::allocator<PushOp> > > > >*)+0x10e) [0x5f6b4e]
ceph-osd.0.log.gz: 16: (ReplicatedPG::wait_for_degraded_object(hobject_t const&, std::tr1::shared_ptr<OpRequest>)+0x1cc) [0x5f874c]
ceph-osd.0.log.gz: 17: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0xb80) [0x60a830]
ceph-osd.0.log.gz: 18: (PG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x619) [0x6feb19]
ceph-osd.0.log.gz: 19: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x330) [0x651ad0]
ceph-osd.0.log.gz: 20: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x478) [0x6688b8]
ceph-osd.0.log.gz: 21: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x6a405c]
ceph-osd.0.log.gz: 22: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8b6f76]
ceph-osd.0.log.gz: 23: (ThreadPool::WorkThread::entry()+0x10) [0x8b8f90]
ceph-osd.0.log.gz: 24: (()+0x7e9a) [0x7f272c0ebe9a]
ceph-osd.0.log.gz: 25: (clone()+0x6d) [0x7f272a3db31d]
ceph-osd.0.log.gz: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-09-14T18:29:27.309 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 113, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_dumpling/tasks/thrashosds.py", line 167, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_dumpling/tasks/ceph_manager.py", line 106, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: timed out waiting for admin_socket to appear after osd.0 restart
2014-09-14T18:29:27.372 DEBUG:teuthology.run_tasks:Unwinding manager install.upgrade
2014-09-14T18:29:27.373 DEBUG:teuthology.run_tasks:Unwinding manager ceph
2014-09-14T18:29:27.373 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 29, in nested
    yield vars
  File "/var/lib/teuthworker/src/ceph-qa-suite_dumpling/tasks/ceph.py", line 1037, in task
    yield
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 113, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_dumpling/tasks/thrashosds.py", line 167, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_dumpling/tasks/ceph_manager.py", line 106, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: timed out waiting for admin_socket to appear after osd.0 restart
archive_path: /var/lib/teuthworker/archive/teuthology-2014-09-14_17:00:01-upgrade:dumpling-dumpling-distro-basic-vps/483716
branch: dumpling
description: upgrade:dumpling/rbd/{0-cluster/start.yaml 1-dumpling-install/v0.67.5.yaml
  2-workload/rbd.yaml 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final/osdthrash.yaml}
email: ceph-qa@ceph.com
job_id: '483716'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-09-14_17:00:01-upgrade:dumpling-dumpling-distro-basic-vps
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: dumpling
  ceph:
    conf:
      global:
        osd heartbeat grace: 100
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    fs: xfs
    log-whitelist:
    - slow request
    - scrub
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: d3e880af5f3ae71d13159514c33c6b41fc648d54
  ceph-deploy:
    branch:
      dev: dumpling
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: d3e880af5f3ae71d13159514c33c6b41fc648d54
  rgw:
    default_idle_timeout: 1200
  s3tests:
    branch: dumpling
    idle_timeout: 1200
  workunit:
    sha1: d3e880af5f3ae71d13159514c33c6b41fc648d54
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mon.c
  - osd.3
  - osd.4
  - osd.5
  - client.0
suite: upgrade:dumpling
suite_branch: dumpling
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_dumpling
targets:
  ubuntu@vpm011.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDYxQt0awmM/ra9iOAMOSIbmHfkJjdplDBy993PmC6+LPd5ZDEqWbudP64zsw8NOCNlDOcSvZUQ1kUi/r864+jhB4x0jblef5ChKodeWUUnBWOpeRi2ntKTia+/DFovFhdqWVRYwytsIXM8OPB/VMNcEW7jxwHve02U8N+0lln1QrGZL58MG/Jh1xVPb+m2qzJh7drzbag/+KLcY7a3hq6y25FPN0kJKWTWNApZbvoiHfUx0Pdn4edQXuD/hGGi8q6aGid+ll8jtmQal1nlKSiALMp3ova8dVmi5vPR9FfMpmUz5zIlBfWHbhqPUJl86Wz+MyZ7fnsV2/HuIkYYhS2p
  ubuntu@vpm185.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDHB/Nn14xx+YmseC626Wykock1kcEsM4CihNrrIjrSJ41ddJ2B2MiSlVCt/j0smQH6e1nfzFR5jxT9m95E3jKDp+GOEO/whs5I2Yv3RBM3x9LIVflFBJvkZc0djZ2B+M8XknmtS4juEdY7U5znl/6gQmPWf2t/UsvjONz8+fMoVqUqe/yQ8tUXyP6mskiyES1SqGR/VDO80aD8thiEDtOM1pfHdGhok5OU7GUNyCk+eokuDjQy9Yn0jNPVVYkKnC3HZQCyN3ojhwOt7U3nP1pocOkpImE0XRD14gtYSgJ37SfFs0CnYdRpowpKuWkR/KTk+mGhiejRX1tir8WL5nF5
tasks:
- internal.lock_machines:
  - 2
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.push_inventory: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    tag: v0.67.5
- ceph: null
- install.upgrade:
    all:
      branch: dumpling
- parallel:
  - workload
  - upgrade-sequence
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- workunit:
    clients:
      client.0:
      - rbd/test_lock_fence.sh
teuthology_branch: master
tube: vps
upgrade-sequence:
  sequential:
  - ceph.restart:
    - mon.a
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.b
  - sleep:
      duration: 60
  - ceph.restart:
    - mon.c
  - sleep:
      duration: 60
  - ceph.restart:
    - mds.a
  - sleep:
      duration: 60
  - ceph.restart:
    - osd.0
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.1
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.2
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.3
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.4
  - sleep:
      duration: 30
  - ceph.restart:
    - osd.5
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.4023
workload:
  sequential:
  - workunit:
      clients:
        client.0:
        - rbd/import_export.sh
      env:
        RBD_CREATE_ARGS: --new-format
  - workunit:
      clients:
        client.0:
        - cls/test_cls_rbd.sh

description: upgrade:dumpling/rbd/{0-cluster/start.yaml 1-dumpling-install/v0.67.5.yaml
2-workload/rbd.yaml 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final/osdthrash.yaml}
duration: 1341.5205171108246
failure_reason: timed out waiting for admin_socket to appear after osd.0 restart
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
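
For context on the failure_reason above ("timed out waiting for admin_socket to appear after osd.0 restart"): after the thrasher restarts osd.0 it polls for the daemon's admin socket, which by default is created at /var/run/ceph/ceph-osd.<id>.asok once the OSD is up, and gives up after a timeout; presumably the socket never appeared here because osd.0 hit the assert in the backtrace above. Below is a minimal Python sketch of that kind of wait, not the actual teuthology/ceph_manager code; the socket path and timeout are assumptions for illustration.

# Minimal sketch (assumptions: default admin socket path, 300 s timeout);
# not the teuthology implementation.
import os
import time

def wait_for_admin_socket(osd_id, timeout=300):
    sock = '/var/run/ceph/ceph-osd.%d.asok' % osd_id
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(sock):
            # Once the socket exists the daemon can be queried, e.g.:
            #   ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok version
            return sock
        time.sleep(1)
    raise Exception(
        'timed out waiting for admin_socket to appear after osd.%d restart'
        % osd_id)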


Related issues: 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #8315: osd: watch callback vs callback funky (Resolved, Samuel Just, 05/08/2014)

#1 - Updated by Samuel Just over 9 years ago

  • Status changed from New to Duplicate