Bug #5631

closed

osd/ReplicatedPG.cc: 3036: FAILED assert(iter)

Added by Sage Weil almost 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

     0> 2013-07-15 02:19:36.389077 7f20138b8700 -1 osd/ReplicatedPG.cc: In function 'int ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp>&)' thread 7f20138b8700 time 2013-07-15 02:19:36.387113
osd/ReplicatedPG.cc: 3036: FAILED assert(iter)

 ceph version 0.66-588-g9baa668 (9baa66801ab02854c344eb2fd1a8da8c5806125b)
 1: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x92ab) [0x61550b]
 2: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x6f) [0x61781f]
 3: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3590) [0x61fb40]
 4: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x619) [0x70b189]
 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>)+0x323) [0x662373]
 6: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>)+0x49b) [0x67855b]
 7: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x31) [0x6b2ec1]
 8: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x6b322c]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8b8576]
 10: (ThreadPool::WorkThread::entry()+0x10) [0x8ba3a0]
 11: (()+0x7e9a) [0x7f2026253e9a]
 12: (clone()+0x6d) [0x7f20243e6ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

job was
ubuntu@teuthology:/a/teuthology-2013-07-15_01:00:16-rados-next-testing-basic/67493$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 365b57b1317524bb0cdd15859a224ba1ab58d1d7
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      global:
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject delay type: osd
        ms inject socket failures: 2500
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: 9baa66801ab02854c344eb2fd1a8da8c5806125b
  install:
    ceph:
      sha1: 9baa66801ab02854c344eb2fd1a8da8c5806125b
  s3tests:
    branch: next
  workunit:
    sha1: 9baa66801ab02854c344eb2fd1a8da8c5806125b
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000

Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #5269: osd: EEXIST on mkcoll (Resolved, 06/06/2013)

Actions #1

Updated by Samuel Just almost 11 years ago

  • Status changed from New to In Progress
  • Assignee set to Samuel Just
Actions #2

Updated by Samuel Just almost 11 years ago

  • Status changed from In Progress to 7

get_omap_iterator relies on lfn_find, while getattr relies on lfn_open. If the 5269 bug occurred, lfn_open might return attrs from an hobject_t residing in a parent collection, while get_omap_iterator fails to find an object at the correct path and so presumably returns a null iterator, tripping assert(iter). This one is probably also due to the 5269 bug.
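
The mismatch described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Ceph code: ToyStore, its strict and lenient lookup helpers, and the "parent_pg"/"child_pg" collection names are all hypothetical stand-ins. The strict helper plays the role of the lfn_find-style path (exact collection only), and the lenient helper plays the role of the lfn_open-style path (may resolve an object left behind in a parent collection).

```python
# Toy model only: method names echo the comment above, but nothing here
# is taken from the Ceph source. All classes and names are hypothetical.

class ToyStore:
    def __init__(self):
        # collection name -> {object name -> {"attrs": {...}, "omap": {...}}}
        self.collections = {}
        # child collection -> parent collection (assumed relationship)
        self.parent = {}

    def _strict_lookup(self, coll, name):
        """Stand-in for an lfn_find-style path: exact collection only."""
        return self.collections.get(coll, {}).get(name)

    def _lenient_lookup(self, coll, name):
        """Stand-in for an lfn_open-style path: may resolve via a parent."""
        while coll is not None:
            obj = self.collections.get(coll, {}).get(name)
            if obj is not None:
                return obj
            coll = self.parent.get(coll)
        return None

    def getattr(self, coll, name, key):
        obj = self._lenient_lookup(coll, name)
        return None if obj is None else obj["attrs"].get(key)

    def get_omap_iterator(self, coll, name):
        obj = self._strict_lookup(coll, name)
        return None if obj is None else iter(sorted(obj["omap"].items()))


def build_inconsistent_store():
    """Object was left in the parent collection; the child is empty."""
    store = ToyStore()
    store.collections["parent_pg"] = {
        "obj": {"attrs": {"_": "oi"}, "omap": {"k": "v"}}
    }
    store.collections["child_pg"] = {}
    store.parent["child_pg"] = "parent_pg"
    return store


store = build_inconsistent_store()
attr = store.getattr("child_pg", "obj", "_")
omap_iter = store.get_omap_iterator("child_pg", "obj")
# The attr lookup succeeds while the omap iterator lookup does not.
print(attr is not None, omap_iter is None)
```

In this model the attribute read succeeds, so a caller would proceed as if the object existed, and only then discover that the iterator is missing. A caller that asserts on the iterator at that point would fail in the same shape as the assert(iter) in the trace.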

Actions #3

Updated by Samuel Just almost 11 years ago

  • Assignee deleted (Samuel Just)
Actions #4

Updated by Samuel Just almost 11 years ago

  • Status changed from 7 to Resolved