Bug #5630

closed

osd/ReplicatedPG.cc: 7089: FAILED assert(r >= 0) in scan_range, cuttlefish

Added by Sage Weil almost 11 years ago. Updated almost 11 years ago.

Status: Duplicate
Priority: Urgent
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

     0> 2013-07-14 21:51:21.215768 7fa5c67da700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::scan_range(hobject_t, int, int, PG::BackfillInterval*)' thread 7fa5c67da700 time 2013-07-14 21:51:21.213145
osd/ReplicatedPG.cc: 7089: FAILED assert(r >= 0)

 ceph version 0.61.4-72-g6af0ed9 (6af0ed9bc4cc955f8c30ad9dc6e9095599f323d0)
 1: (ReplicatedPG::scan_range(hobject_t, int, int, PG::BackfillInterval*)+0xde1) [0x59f251]
 2: (ReplicatedPG::do_scan(std::tr1::shared_ptr<OpRequest>)+0x5d8) [0x59f868]
 3: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x4c2) [0x6b16f2]
 4: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>)+0x323) [0x606be3]
 5: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>)+0x49b) [0x61da4b]
 6: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x31) [0x65a7b1]
 7: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x65aacc]
 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x83b3e6]
 9: (ThreadPool::WorkThread::entry()+0x10) [0x83d210]
 10: (()+0x7e9a) [0x7fa5d7171e9a]
 11: (clone()+0x6d) [0x7fa5d5304ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

ubuntu@teuthology:/a/teuthology-2013-07-14_20:00:13-rados-cuttlefish-testing-basic/67131$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 365b57b1317524bb0cdd15859a224ba1ab58d1d7
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: cuttlefish
  ceph:
    conf:
      global:
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject delay type: osd
        ms inject socket failures: 2500
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: 6af0ed9bc4cc955f8c30ad9dc6e9095599f323d0
  install:
    ceph:
      sha1: 6af0ed9bc4cc955f8c30ad9dc6e9095599f323d0
  s3tests:
    branch: cuttlefish
  workunit:
    sha1: 6af0ed9bc4cc955f8c30ad9dc6e9095599f323d0
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
ubuntu@teuthology:/a/teuthology-2013-07-14_20:00:13-rados-cuttlefish-testi

Related issues 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #5154: osd/SnapMapper.cc: 270: FAILED assert(check(oid)) (Resolved, 05/23/2013)

Actions #1

Updated by Samuel Just almost 11 years ago

  • Assignee set to Samuel Just
Actions #2

Updated by Samuel Just almost 11 years ago

The core dump shows r was -61 (ENODATA).
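
For readers unfamiliar with the negative-errno convention, here is a minimal Python sketch of how a return value such as -61 maps to a symbolic name. The helper `describe_return_code` is hypothetical and not part of Ceph; the numeric value of ENODATA is platform-dependent (61 on Linux).

```python
import errno
import os


def describe_return_code(r: int) -> str:
    """Map a negative-errno style return value to its symbolic name and message."""
    if r >= 0:
        return f"{r}: success"
    code = -r
    # errno.errorcode maps numeric codes to names for the current platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{r}: {name} ({os.strerror(code)})"


# Use the platform's own ENODATA constant rather than hard-coding 61.
print(describe_return_code(-errno.ENODATA))
```

On a Linux host, passing -61 directly should give the same result as passing `-errno.ENODATA`.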

Actions #3

Updated by Samuel Just almost 11 years ago

Might indicate a missing xattr on one of the collection subdirs.

Actions #4

Updated by Samuel Just almost 11 years ago

LFNIndex::list_subdirs and LFNIndex::list_objects cannot return -ENODATA (no hashed names)
=> HashIndex::get_path_contents_by_hash cannot return -ENODATA
=> HashIndex::list_by_hash cannot return -ENODATA
=> HashIndex::_collection_list_partial cannot return -ENODATA
=> FileStore::get_index must have returned -ENODATA

Actions #5

Updated by Samuel Just almost 11 years ago

  • Status changed from New to Need More Info
Actions #6

Updated by Samuel Just almost 11 years ago

  • Assignee deleted (Samuel Just)
Actions #7

Updated by Samuel Just almost 11 years ago

/a/dzafman-5154/70692

Actions #8

Updated by Samuel Just almost 11 years ago

My earlier comments were incorrect: the assert was after a getattr. Also, the second set of logs shows that this is #5154 again.
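
As a rough illustration of the failure mode (an assert immediately after a getattr that returns -ENODATA), here is a toy Python model. It is only a sketch under assumptions: `ToyStore`, `scan_objects`, and the attribute name `info` are hypothetical stand-ins, not Ceph's FileStore or the real scan_range code.

```python
import errno


class ToyStore:
    """Hypothetical in-memory stand-in for an object store; not Ceph's FileStore."""

    def __init__(self):
        self.attrs = {}  # (object name, attribute name) -> bytes

    def getattr(self, oid, name):
        """Return (r, value): r >= 0 on success, a negative errno on failure."""
        value = self.attrs.get((oid, name))
        if value is None:
            # The attribute is missing, as with an absent xattr.
            return -errno.ENODATA, b""
        return len(value), value


def scan_objects(store, oids, attr="info"):
    """Collect one attribute per object, asserting that every lookup succeeds."""
    found = {}
    for oid in oids:
        r, value = store.getattr(oid, attr)
        # Mirrors the shape of the failed check: any negative r aborts the scan.
        assert r >= 0, f"getattr({oid!r}, {attr!r}) returned {r}"
        found[oid] = value
    return found


store = ToyStore()
store.attrs[("obj1", "info")] = b"payload"  # obj2 deliberately has no attribute

try:
    scan_objects(store, ["obj1", "obj2"])
except AssertionError as exc:
    print("assert fired:", exc)
```

In this model, one object with a missing attribute is enough to abort the whole scan, which matches the shape of the crash reported here; the underlying cause in Ceph is tracked in #5154.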

Actions #9

Updated by Samuel Just almost 11 years ago

  • Status changed from Need More Info to Duplicate