Bug #4806

closed

os/FileStore.cc: In function 'void FileStore::_set_replay_guard()' failure

Added by Sage Weil almost 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: High
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

    -8> 2013-04-24 12:58:29.297190 7f6ca6373780  3 journal journal_replay: applying op seq 9691
    -7> 2013-04-24 12:58:29.297385 7f6ca6373780  3 journal journal_replay: r = 0, op_seq now 9691
    -6> 2013-04-24 12:58:29.297407 7f6ca6373780  2 journal read_entry 33112064 : seq 9692 742 bytes
    -5> 2013-04-24 12:58:29.297411 7f6ca6373780  3 journal journal_replay: applying op seq 9692
    -4> 2013-04-24 12:58:29.297566 7f6ca6373780  3 journal journal_replay: r = 0, op_seq now 9692
    -3> 2013-04-24 12:58:29.297601 7f6ca6373780  2 journal read_entry 33120256 : seq 9693 7838 bytes
    -2> 2013-04-24 12:58:29.297604 7f6ca6373780  3 journal journal_replay: applying op seq 9693
    -1> 2013-04-24 12:58:29.417308 7f6ca6373780 -1 filestore(/var/lib/ceph/osd/ceph-3) _set_replay_guard 3.19_head error -1
     0> 2013-04-24 12:58:29.420332 7f6ca6373780 -1 os/FileStore.cc: In function 'void FileStore::_set_replay_guard(coll_t, const SequencerPosition&, bool)' thread 7f6ca6373780 time 2013-04-24 12:58:29.417331
os/FileStore.cc: 2167: FAILED assert(0 == "_set_replay_guard failed")

 ceph version 0.60-644-ga9791da (a9791dae1b64cab4eca6e70d165d8a51eafc86cc)
 1: (FileStore::_set_replay_guard(coll_t, SequencerPosition const&, bool)+0x1ef) [0x73169f]
 2: (FileStore::_split_collection(coll_t, unsigned int, unsigned int, coll_t, SequencerPosition const&)+0x21a) [0x733ffa]
 3: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int)+0x3e3) [0x74a153]
 4: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x71) [0x750831]
 5: (JournalingObjectStore::journal_replay(unsigned long)+0x8be) [0x75f89e]
 6: (FileStore::mount()+0x3974) [0x743c54]
 7: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x60b47a]
 8: (OSD::convertfs(std::string const&, std::string const&)+0x47) [0x60bee7]
 9: (main()+0x2239) [0x57eb59]
 10: (__libc_start_main()+0xed) [0x7f6ca407b76d]
 11: ceph-osd() [0x58131d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The job was:
ubuntu@teuthology:/a/sage-e2/371$ cat orig.config.yaml 
kernel:
  branch: testing
  kdb: true
nuke-on-error: true
overrides:
  ceph:
    branch: wip-4748-b
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
    fs: ext4
    log-whitelist:
    - slow request
  s3tests:
    branch: next
  workunit:
    branch: next
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000

Actions #1

Updated by Samuel Just almost 11 years ago

  • Assignee set to Samuel Just
Actions #2

Updated by Samuel Just almost 11 years ago

  • Status changed from 12 to Pending Backport
  • Priority changed from Urgent to High
Actions #3

Updated by Sage Weil almost 11 years ago

  • Status changed from Pending Backport to Resolved

Non-trivial to backport; people who use PG splitting should upgrade to cuttlefish.
