Bug #8393 (closed)

osd crashed in rbd-master-testing-basic-plana suite

Added by Yuri Weinstein almost 10 years ago. Updated almost 10 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-05-18_23:00:04-rbd-master-testing-basic-plana/261772/

2014-05-19T07:47:52.190 INFO:teuthology.orchestra.run.plana59:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.4'
2014-05-19T07:47:52.457 INFO:teuthology.orchestra.run.plana59.stderr:Error EAGAIN: osd.4 is not up
2014-05-19T07:47:52.468 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-master/teuthology/contextutil.py", line 29, in nested
    yield vars
  File "/home/teuthworker/teuthology-master/teuthology/task/ceph.py", line 1458, in task
    osd_scrub_pgs(ctx, config)
  File "/home/teuthworker/teuthology-master/teuthology/task/ceph.py", line 1090, in osd_scrub_pgs
    'ceph', 'osd', 'scrub', role])
  File "/home/teuthworker/teuthology-master/teuthology/orchestra/remote.py", line 114, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/teuthology-master/teuthology/orchestra/run.py", line 385, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/teuthworker/teuthology-master/teuthology/orchestra/run.py", line 381, in _check_status
    command=r.command, exitstatus=status, node=name)
CommandFailedError: Command failed on plana59 with status 11: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.4'
archive_path: /var/lib/teuthworker/archive/teuthology-2014-05-18_23:00:04-rbd-master-testing-basic-plana/261772
branch: master
description: rbd/librbd/{cache/none.yaml cachepool/small.yaml clusters/fixed-3.yaml
  fs/btrfs.yaml msgr-failures/few.yaml workloads/qemu_xfstests.yaml}
email: null
exclude_arch: armv7l
job_id: '261772'
kernel: &id001
  kdb: true
  sha1: 335cb91ce950ce0e12294af671c64a468d89194c
last_in_suite: false
machine_type: plana
name: teuthology-2014-05-18_23:00:04-rbd-master-testing-basic-plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
        osd op thread timeout: 60
        osd sloppy crc: true
    fs: btrfs
    log-whitelist:
    - slow request
    - wrongly marked me down
    sha1: 991f7f15a6e107b33a24bbef1169f21eb7fcce2c
  ceph-deploy:
    branch:
      dev: master
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 991f7f15a6e107b33a24bbef1169f21eb7fcce2c
  s3tests:
    branch: master
  workunit:
    sha1: 991f7f15a6e107b33a24bbef1169f21eb7fcce2c
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
suite: rbd
targets:
  ubuntu@plana32.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDmERfjlurKX631Ys98uSfL1mMJkRRZRRV5Hhen56sub04bFDz7W9zjh3Zs9pNMfdc1dWLf8IcpbdfcbR7cmkyfxQlLl+KmCwvRED+ZCR8P5HlkMFb+HnTdvyLAbu/4pvQRxrjy1GyQdNRUpxA8WWbfHrlz8leZPz3u3+hsHaCt8W0Y8cBpqmdTUtSgaGa9JTo/GWSkavF81o5xuVD+A4TGwNwTqIbb1f/HXAytffUwKr5fHHs1+hm1aT9GzQSumDHVCf9ykbcvO2uR70JZl3lZW2pVeFwQmq0AwmD5SetofuQK4ykVweONstnPwNGBqZJ/1A8jbxcby94RhDztzTqb
  ubuntu@plana59.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDaHG83zRXo6ydv6IGWDFTf6YNjWG9M5LRbbYIpPXKOqCg9zfI/4ZjymLpznESFIACVrqe06jqD7uvsQPOlbcm3W/H44su70C21KrzMs77IpskMT7tYgCzY75uxbwg949qYIRf1SEY2RW0Bf2zldbOeKAY/TcnGIkLtc4NCIDPfCxMG0rAJJgUAwbvbKVUqLKe/jcyu3RiiAxV3TGjTAzTz+XHwT46gDXB5Fxt49Sfx+AgpILHk7DvN/HILtU3gRT9ac0D2WlQi1sJLDgjeTAZxyfpRR5iZH4tWYBFIS7C4ugHYye95zUYTc/3Jt364Jl/giUherGjE5od7p65VjxRJ
  ubuntu@plana81.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDlJFZzGyQQUsAwKLFwKdofbzaj9hARIwmyH7nJWNlnwI++pPIPaKpFfnoOZnByS9xjb7Mmm3/Kg8YVg0diIAfapqpM/wTNPSyeYsjChLZDEJdPpo/0YlBxiEiy1655kOOdnPz67mr5YnNQJw1un16SRPVaXkMX4USSwrf0wOj685m7vo2dQOUzcfrC8liIrqBd/OU1LnRPtOdieTlJTOHyxNVW8ge8Q+y///lF7+pXkaLK743CZSU4PV/5P6NsahCD584A86Ucf1++F1w2BcNOBmLRplNIvoOycnCHeLfyzYfwmU1bY6vI4Kqjg2UmJ0lcJLKUu2FH1ert8+D+ufUL
tasks:
- internal.lock_machines:
  - 3
  - plana
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install: null
- ceph:
    conf:
      client:
        rbd cache: false
- exec:
    client.0:
    - ceph osd pool create cache 4
    - ceph osd tier add rbd cache
    - ceph osd tier cache-mode cache writeback
    - ceph osd tier set-overlay rbd cache
    - ceph osd pool set cache hit_set_type bloom
    - ceph osd pool set cache hit_set_count 8
    - ceph osd pool set cache hit_set_period 60
    - ceph osd pool set cache target_max_objects 250
- qemu:
    all:
      num_rbd: 2
      test: https://ceph.com/git/?p=ceph.git;a=blob_plain;f=qa/run_xfstests_qemu.sh
      type: block
teuthology_branch: master
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.plana.19208
client.0-kernel-sha1: 335cb91ce950ce0e12294af671c64a468d89194c
description: rbd/librbd/{cache/none.yaml cachepool/small.yaml clusters/fixed-3.yaml
  fs/btrfs.yaml msgr-failures/few.yaml workloads/qemu_xfstests.yaml}
duration: 25796.29017996788
failure_reason: 'Command failed on plana59 with status 1: ''sudo adjust-ulimits ceph-coverage
  /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 4'''
flavor: basic
mon.a-kernel-sha1: 335cb91ce950ce0e12294af671c64a468d89194c
mon.b-kernel-sha1: 335cb91ce950ce0e12294af671c64a468d89194c
owner: scheduled_teuthology@teuthology
success: false
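
For anyone re-running this by hand: the teardown failure above comes from teuthology's osd_scrub_pgs step, which issues `ceph osd scrub <osd.N>` for each OSD and gets EAGAIN here because osd.4 had already crashed. A minimal sketch of the manual check before retrying the scrub (standard ceph CLI; osd.4 is the daemon from this run):

# confirm which OSDs the cluster reports as up/in
ceph osd tree
# overall health, including the count of up/in OSDs
ceph -s
# the per-OSD command osd_scrub_pgs runs; returns EAGAIN while osd.4 is down
ceph osd scrub osd.4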

Related issues: 1 (0 open, 1 closed)

Related to Ceph - Bug #8380: osd/ReplicatedPG.cc: 1849: FAILED assert(0 == "out of order op") (Resolved, 05/17/2014)

Actions #1

Updated by Yuri Weinstein almost 10 years ago

  • Project changed from teuthology to Ceph
  • Severity changed from 3 - minor to 2 - major

Moved to ceph

Actions #2

Updated by Yuri Weinstein almost 10 years ago

I was able to reproduce this issue on a manual re-run.
Logs can be found on the 'yw' box (ssh from teuthology) in /home/ubuntu/logs/261772

Error after osd.1 crashed:

2014-05-19T15:11:12.965 INFO:teuthology.orchestra.run.plana70.stderr:osd.0 instructed to scrub
2014-05-19T15:11:12.976 INFO:teuthology.task.ceph:Scrubbing osd osd.1
2014-05-19T15:11:12.977 INFO:teuthology.orchestra.run.plana70:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.1'
2014-05-19T15:11:13.236 INFO:teuthology.orchestra.run.plana70.stderr:Error EAGAIN: osd.1 is not up
2014-05-19T15:11:13.248 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/ubuntu/bkup/teuthology/teuthology/contextutil.py", line 29, in nested
    yield vars
  File "/home/ubuntu/bkup/teuthology/teuthology/task/ceph.py", line 1458, in task
    osd_scrub_pgs(ctx, config)
  File "/home/ubuntu/bkup/teuthology/teuthology/task/ceph.py", line 1090, in osd_scrub_pgs
    'ceph', 'osd', 'scrub', role])
  File "/home/ubuntu/bkup/teuthology/teuthology/orchestra/remote.py", line 114, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/ubuntu/bkup/teuthology/teuthology/orchestra/run.py", line 385, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/ubuntu/bkup/teuthology/teuthology/orchestra/run.py", line 381, in _check_status
    command=r.command, exitstatus=status, node=name)
CommandFailedError: Command failed on plana70 with status 11: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd scrub osd.1'
2014-05-19T15:11:13.250 INFO:teuthology.misc:Shutting down mds daemons...
Actions #3

Updated by Yuri Weinstein almost 10 years ago

Coredump backtrace from ceph-osd.4.log under ubuntu@teuthology:/a/teuthology-2014-05-18_23:00:04-rbd-master-testing-basic-plana/261772/remote/plana59/log:

     0> 2014-05-19 07:31:11.905597 7fe3c0d43700 -1 *** Caught signal (Aborted) **
 in thread 7fe3c0d43700

 ceph version 0.80-469-g991f7f1 (991f7f15a6e107b33a24bbef1169f21eb7fcce2c)
 1: ceph-osd() [0x99479a]
 2: (()+0xfcb0) [0x7fe3d8df8cb0]
 3: (gsignal()+0x35) [0x7fe3d72f3425]
 4: (abort()+0x17b) [0x7fe3d72f6b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fe3d7c4669d]
 6: (()+0xb5846) [0x7fe3d7c44846]
 7: (()+0xb5873) [0x7fe3d7c44873]
 8: (()+0xb596e) [0x7fe3d7c4496e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0xa75c3f]
 10: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x1dbf) [0x820fbf]
 11: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x2d15) [0x82ab15]
 12: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x692) [0x7c6142]
 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1ca) [0x651e0a]
 14: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x628) [0x6528b8]
 15: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x6831fc]
 16: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0xa66456]
 17: (ThreadPool::WorkThread::entry()+0x10) [0xa68260]
 18: (()+0x7e9a) [0x7fe3d8df0e9a]
 19: (clone()+0x6d) [0x7fe3d73b13fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
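
Following the NOTE at the end of the backtrace, a minimal sketch for decoding the raw frames, assuming a local copy of the exact ceph-osd binary from this run (0.80-469-g991f7f1) with debug symbols; the addresses below are the execute_ctx/do_op/assert frames from the dump:

# full disassembly with interleaved source, as the NOTE suggests
objdump -rdS ceph-osd > ceph-osd.dis
# or resolve individual return addresses to demangled function and file:line
addr2line -Cfe ceph-osd 0x820fbf 0x82ab15 0xa75c3f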
Actions #4

Updated by Sage Weil almost 10 years ago

  • Status changed from New to Duplicate

dup of #8380
