Bug #5799

closed

SIGABRT in build_push_op -> object_info_t::decode

Added by Sage Weil over 10 years ago. Updated over 10 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

     0> 2013-07-29 12:00:59.227297 7f228c4c9700 -1 *** Caught signal (Aborted) **
 in thread 7f228c4c9700

 ceph version 0.67-rc2-106-g12c1f11 (12c1f1157c7b9513a3d9f716a8ec62fce00d28f5)
 1: ceph-osd() [0x80248a]
 2: (()+0xfcb0) [0x7f229fb88cb0]
 3: (gsignal()+0x35) [0x7f229dc56425]
 4: (abort()+0x17b) [0x7f229dc59b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f229e5a869d]
 6: (()+0xb5846) [0x7f229e5a6846]
 7: (()+0xb5873) [0x7f229e5a6873]
 8: (()+0xb596e) [0x7f229e5a696e]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x127) [0x8ca067]
 10: (object_info_t::decode(ceph::buffer::list::iterator&)+0x73) [0x95e483]
 11: (ReplicatedPG::build_push_op(ObjectRecoveryInfo const&, ObjectRecoveryProgress const&, ObjectRecoveryProgress*, PushOp*)+0x87f) [0x5fe1cf]
 12: (ReplicatedPG::handle_pull(int, PullOp&, PushOp*)+0xc1) [0x6015a1]
 13: (ReplicatedPG::do_pull(std::tr1::shared_ptr<OpRequest>)+0x4f4) [0x602344]
 14: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x348) [0x710678]
 15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>)+0x323) [0x664423]
 16: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>)+0x49b) [0x67ab2b]
 17: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x31) [0x6b68d1]
 18: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x6b6c3c]
 19: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8bbed6]
 20: (ThreadPool::WorkThread::entry()+0x10) [0x8bdd70]
 21: (()+0x7e9a) [0x7f229fb80e9a]
 22: (clone()+0x6d) [0x7f229dd13ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
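
Frames 5 through 10 above suggest that an exception thrown from the buffer iterator's copy() during object_info_t::decode was never caught, so the C++ runtime's terminate handler called abort(). The sketch below illustrates that general pattern with a toy reader. It is not Ceph's code: ByteReader, ToyInfo, decode_info, and try_decode_info are hypothetical names invented for illustration, and std::out_of_range stands in for whatever exception the real iterator throws.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a buffer iterator (not ceph::buffer::list::iterator).
class ByteReader {
public:
    explicit ByteReader(const std::vector<unsigned char>& data)
        : data_(data), pos_(0) {}

    // Copy exactly `len` bytes into `dest`; throw if fewer bytes remain.
    void copy(std::size_t len, char* dest) {
        if (len > data_.size() - pos_) {
            throw std::out_of_range("end of buffer");
        }
        if (len > 0) {
            std::memcpy(dest, data_.data() + pos_, len);
            pos_ += len;
        }
    }

private:
    const std::vector<unsigned char>& data_;
    std::size_t pos_;
};

// Hypothetical fixed-layout record standing in for an on-disk object attribute.
struct ToyInfo {
    std::uint8_t version = 0;
    std::uint32_t size = 0;
};

// Decodes field by field and lets any exception propagate to the caller.
// If no caller catches it, the runtime calls std::terminate(), whose default
// handler aborts the process (SIGABRT).
ToyInfo decode_info(ByteReader& r) {
    ToyInfo info;
    r.copy(sizeof(info.version), reinterpret_cast<char*>(&info.version));
    r.copy(sizeof(info.size), reinterpret_cast<char*>(&info.size));
    return info;
}

// Defensive wrapper: reports a short or empty buffer as a failed decode
// instead of letting the exception escape.
bool try_decode_info(const std::vector<unsigned char>& raw, ToyInfo* out) {
    try {
        ByteReader r(raw);
        *out = decode_info(r);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

Calling decode_info directly on an empty or truncated buffer with no surrounding try block would end in abort(), which matches the shape of the trace above. try_decode_info returns false for the same input. Whether the real fix should catch the exception or prevent the short read in the first place is a separate question that this sketch does not answer.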

job was
ubuntu@teuthology:/a/teuthology-2013-07-29_09:44:34-rados-next-testing-basic-plana/88588$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 88b7f22bc0e44db48a24af23e4de3653bc44b2d2
machine_type: plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      global:
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject delay type: osd
        ms inject internal delays: 0.002
        ms inject socket failures: 2500
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
    fs: ext4
    log-whitelist:
    - slow request
    sha1: 12c1f1157c7b9513a3d9f716a8ec62fce00d28f5
  ceph-deploy:
    branch:
      dev: next
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: 12c1f1157c7b9513a3d9f716a8ec62fce00d28f5
  s3tests:
    branch: next
  workunit:
    sha1: 12c1f1157c7b9513a3d9f716a8ec62fce00d28f5
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 2
    chance_pgpnum_fix: 1
    timeout: 1200
- radosbench:
    clients:
    - client.0
    time: 1800
teuthology_branch: next


Related issues: 2 (0 open, 2 closed)

Has duplicate: Ceph - Bug #5873: osd: unfound object from thrashing when all osds are up (Duplicate, Samuel Just, 08/04/2013)
Has duplicate: Ceph - Bug #5749: osd: unfound objects on cuttlefish (Duplicate, Samuel Just, 07/25/2013)

#1

Updated by Samuel Just over 10 years ago

  • Status changed from New to In Progress
  • Assignee set to Samuel Just

#2

Updated by Samuel Just over 10 years ago

I think this may have been an ext4 xattr error. Either way, we'll have to reproduce it.

#3

Updated by Samuel Just over 10 years ago

Scheduled run: samuelj-5799-0

Job scheduled with ID 92924
Job scheduled with ID 92925
Job scheduled with ID 92926
Job scheduled with ID 92927
Job scheduled with ID 92928

#4

Updated by Samuel Just over 10 years ago

  • Status changed from In Progress to 7

Just kidding, ext4 just happens to be slow enough to trigger it. wip-5799. ubuntu@teuthology:/a/samuelj-5799-1/92938

#5

Updated by Sage Weil over 10 years ago

  • Status changed from 7 to 15

#6

Updated by Sage Weil over 10 years ago

  • Status changed from 15 to Fix Under Review

#7

Updated by Samuel Just over 10 years ago

  • Status changed from Fix Under Review to Resolved