Bug #6400

osd crashed in dumpling due to unexpected error (EEXIST?)

Added by Tamilarasi muthamizhan over 10 years ago. Updated over 10 years ago.

Status: Duplicate
Priority: Urgent
Assignee: Samuel Just
Category: -
Target version: -
% Done: 0%
Source: Q/A
Severity: 3 - minor

Description

logs: ubuntu@teuthology:/a/teuthology-2013-09-25_01:35:02-upgrade-small-next-testing-basic-vps/17000

2013-09-25 06:52:05.289073 7f3cb3bd8700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)' thread 7f3cb3bd8700 time 2013-09-25 06:52:04.184366
os/FileStore.cc: 2816: FAILED assert(0 == "unexpected error")

 ceph version 0.67.3-32-g94548b4 (94548b4b67cca37366c7d8719209a6d2e7956811)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int)+0xc50) [0x7ba0c0]
 2: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x71) [0x7c0cb1]
 3: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x274) [0x7c0f44]
 4: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0x8f5f41]
 5: (ThreadPool::WorkThread::entry()+0x10) [0x8f8f70]
 6: (()+0x7851) [0x7f3cbfb8b851]
 7: (clone()+0x6d) [0x7f3cbe45294d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

2013-09-25 06:52:05.300705 7f3cb6ddd700  1 journal check_for_full at 69533696 : JOURNAL FULL 69533696 >= 5058559 (max_size 104857600 start 74592256)
2013-09-25 06:52:05.327396 7f3cb8be0700  5 osd.0 14 tick

ubuntu@teuthology:/a/teuthology-2013-09-25_01:35:02-upgrade-small-next-testing-basic-vps/17000$ cat config.yaml 
archive_path: /var/lib/teuthworker/archive/teuthology-2013-09-25_01:35:02-upgrade-small-next-testing-basic-vps/17000
description: upgrade-small/rados/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-workload/loadgenbig.yaml 3-upgrade-sequence/upgrade.yaml 4-restart/restart.yaml
  5-next-workload/next.yaml distro/centos_6.4.yaml}
email: null
job_id: 17000
kernel:
  kdb: true
  sha1: 4c3a1682ce22b9b734087e5a96586e9570aec185
last_in_suite: false
machine_type: vps
name: teuthology-2013-09-25_01:35:02-upgrade-small-next-testing-basic-vps
nuke-on-error: true
os_type: centos
os_version: '6.4'
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug ms: 1
        debug osd: 5
    log-whitelist:
    - slow request
    sha1: 3de32562b55c6ece3a6ed783c36f8b9f21460339
  ceph-deploy:
    branch:
      dev: next
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: 3de32562b55c6ece3a6ed783c36f8b9f21460339
  s3tests:
    branch: next
  workunit:
    sha1: 3de32562b55c6ece3a6ed783c36f8b9f21460339
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
targets:
  ubuntu@vpm016.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAs6hB+ercE0EEYIzgx+boB7DS4ylKndFtwioYi9f9y3dD5mhAdzH/81fmPPfCyhAafCzQ1HRLyIIqiAUg8nU1v8FKxL+jUDkuLUjdKe1a8Wf1Qb6OgeSd4PjmJd0jK10BqnVWc+eJYQ+sUXfBaLERNEeSpivA/amTV7NzOy2ELc/O+32YeOJAiPWHf5Q2GdH8BVzpZ34wGkrBcOIwee6459UYsiPRTfHpjDl+WLQvRRBeCymBmS8AIl0W1EOZhh+IGALuv1AWJBp/CeAK0Ml8Hy5Zm3O87o6Z6FfAmTRCUZ/8HTG+7GJTej0B8HnCs8qAtXkrUl8oXn5IoTa0rNHRfw==
  ubuntu@vpm017.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAv8MXBY4n+IZQFZo/5MDs+/9SYW0ocNUu/lh2fpMxrr7VW8lrZ97BxPpfPeMeWYqLTEMrSBEt+74yCdp7QpOvkYoWPcECYz9PwVsQEVVBbfOgR6IEsftsygHN0Iuug1Yw3OAUXL4iQdHO+HkfP2mhBTugMDsmdUPtEHwoVeCmLYnX+Vaj+drP/Ebk02opmsHQl/j6APoyKshbXqQ9l37Mt/Moc5nwTBWrIIvg5PvlWhhMoa8JNTSwqPkutaHzu8MNzBksPJ1VrahnXaQVlVS/E20vHI/g6IvlJKp5jHwIFCo7HvsHEUCLQXe7gQzeacU7WH3BgZV/rTc68UY8xk/ztw==
  ubuntu@vpm018.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA3orZTx4/sJx9mNCB+dhTT+znZMLjWvuQBI/qdj80ZOQ8SrdiDRBVM0Hlc/bbxDgUdx24Xd/2A3p9WXa/Bv1lHpEX8vqYZTicpl7wlejK1fS3MNWPvG3PM4a+qMK65/0k9M1qoX5iiqQreplClBYON+Gw5krdZEii0pQbRc/eyz6D/Sr9IN64g1ygOWQMgJ7+Ya8ASpYyZtYEL481O0qq1rOJOYUuBQJSkhWd2/7fX3Y67U1eKMSWlbGKwox+PAvZVOFsaMSgL0PPdKlNHZF7eTJ5erAowm+PwQtwQNTBOPLiMKEFkZLgkMQaO/mnL1hbD/oJbhx6uVjmyUkGubq8pQ==
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph: null
- workunit:
    branch: dumpling
    clients:
      all:
      - rados/load-gen-big.sh
- install.upgrade:
    all:
      branch: next
- ceph.restart:
  - mon.a
  - mon.b
  - mon.c
  - mds.a
  - osd.0
  - osd.1
  - osd.2
  - osd.3
- workunit:
    branch: next
    clients:
      client.0:
      - rados/test.sh
teuthology_branch: next
verbose: true


Files

osd.0.log (3.49 KB), Tengwei Cai, 10/29/2013 01:42 AM
#1 - Updated by Tamilarasi muthamizhan over 10 years ago

  • Priority changed from High to Urgent
#2 - Updated by Ian Colle over 10 years ago

  • Assignee set to Samuel Just
#3 - Updated by Samuel Just over 10 years ago

  • Subject changed from "osd crashed in dumpling at JOURNAL_FULL" to "osd crashed in dumpling due to unexpected error (EEXIST?)"

This isn't due to the JOURNAL_FULL message; that's just noise. The assert appears to be the one that triggers when we get a non-whitelisted error. My money is on EEXIST.
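
For context, a compilable sketch of the pattern being described here, assuming (this is not the actual Ceph source; the op subset and whitelist shown are illustrative) that _do_transaction() tolerates a short list of errno values for specific ops and asserts on anything else:

// Sketch of the error-whitelist pattern behind os/FileStore.cc:2816.
// Not the dumpling source; op names and whitelist are illustrative.
#include <cassert>
#include <cerrno>

enum Op { OP_WRITE, OP_CLONE, OP_CLONERANGE };

// True when a negative return value r is acceptable for this op.
static bool error_whitelisted(int r, Op op) {
  // ENOENT can be expected (e.g. the object is already gone),
  // but never for clone-style ops that need their source to exist.
  if (r == -ENOENT && op != OP_CLONE && op != OP_CLONERANGE)
    return true;
  return false;  // EEXIST, among others, is never tolerated here
}

static void do_transaction_op(int r, Op op) {
  if (r < 0 && !error_whitelisted(r, op))
    assert(0 == "unexpected error");  // the assert in the trace above
}

int main() {
  do_transaction_op(-ENOENT, OP_WRITE);  // tolerated
  do_transaction_op(-EEXIST, OP_WRITE);  // fires, matching the EEXIST theory
}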

#4 - Updated by Samuel Just over 10 years ago

  • Status changed from New to Duplicate

No logs; probably a duplicate of #5951.

#5 - Updated by Tengwei Cai over 10 years ago

I can reproduce this issue on a CentOS 6.4 machine every time.

My steps are (see the sketch after this list):

  1. Build Ceph from the dumpling release code
  2. Start a development cluster with "vstart.sh -d -n mon osd"
  3. Run rados bench to write some data into Ceph

After writing about 70 MB of data, ceph-osd crashed and printed very similar logs.
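
For reference, a minimal command sequence along those lines might look like the following. This is a sketch under assumptions: the pool name, bench duration, and the mkpool step are illustrative, not taken from the report.

cd src
./vstart.sh -d -n mon osd                        # development cluster, as in step 2
./rados -c ./ceph.conf mkpool test               # "test" is a hypothetical pool name
./rados -c ./ceph.conf -p test bench 60 write    # keep writing until the osd asserts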

