Bug #5677

closed

osd/OSD.cc: 5517: FAILED assert(_get_map_bl(epoch, bl))

Added by Sage Weil almost 11 years ago. Updated over 10 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: cuttlefish
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

    -2> 2013-07-19 03:36:48.647677 7f9a4e55c700  1 -- 10.214.133.28:6804/3750 <== mon.0 10.214.132.36:6789/0 11 ==== osd_map(2090..2090 src has 2090..2839) v3 ==== 5285+0+0 (1231217973 0 0) 0x230a240 con 0x1c3b2c0
    -1> 2013-07-19 03:36:48.647713 7f9a4e55c700  3 osd.0 1807 handle_osd_map epochs [2090,2090], i have 1807, src has [2090,2839]
     0> 2013-07-19 03:36:48.650527 7f9a4e55c700 -1 osd/OSD.cc: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f9a4e55c700 time 2013-07-19 03:36:48.648065
osd/OSD.cc: 5517: FAILED assert(_get_map_bl(epoch, bl))

 ceph version 0.66-712-gc9ba933 (c9ba933b0b2fdb012ccf8a8535d09381c943144d)
 1: (OSDService::get_map(unsigned int)+0x428) [0x694978]
 2: (OSDService::init_splits_between(pg_t, std::tr1::shared_ptr<OSDMap const>, std::tr1::shared_ptr<OSDMap const>)+0x1c9) [0x69c319]
 3: (OSD::consume_map()+0x5ec) [0x69cc0c]
 4: (OSD::handle_osd_map(MOSDMap*)+0x101f) [0x6a8aaf]
 5: (OSD::_dispatch(Message*)+0x2fb) [0x6ab48b]
 6: (OSD::ms_dispatch(Message*)+0x1d6) [0x6abb96]
 7: (DispatchQueue::entry()+0x549) [0x9898f9]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x8adedd]
 9: (()+0x7e9a) [0x7f9a5aeebe9a]
 10: (clone()+0x6d) [0x7f9a5907eccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
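
One way to read the handle_osd_map line above: the OSD holds epoch 1807, while the monitor's oldest available map is 2090, so the epochs in between can no longer be fetched from that source. The sketch below only parses that log line and computes the gap. It is an illustration, not the OSD's actual logic, and it assumes (the log suggests this but does not state it) that the failed get_map() call was for an epoch inside that gap. The names LINE, PATTERN and missing_epochs are hypothetical.

```python
import re

# Log line copied from the trace above (timestamp and thread id dropped).
LINE = ("osd.0 1807 handle_osd_map epochs [2090,2090], "
        "i have 1807, src has [2090,2839]")

PATTERN = re.compile(
    r"handle_osd_map epochs \[(\d+),(\d+)\], "
    r"i have (\d+), src has \[(\d+),(\d+)\]"
)


def missing_epochs(line):
    """Return (first, last) of the epochs the source cannot supply, or None."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a handle_osd_map line")
    _msg_first, _msg_last, have, src_first, _src_last = map(int, m.groups())
    # Epochs newer than the OSD's current map but older than the oldest
    # map the source still holds are the ones that cannot be fetched.
    first, last = have + 1, src_first - 1
    return (first, last) if first <= last else None


print(missing_epochs(LINE))
```

For the line in this report the gap runs from epoch 1808 through 2089.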

The job was:
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-07-19_01:00:14-rados-next-testing-basic/72838$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 77c8bf2f972a9d6ff446c49a41678bf931bbee44
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      global:
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject delay type: osd
        ms inject internal delays: 0.002
        ms inject socket failures: 2500
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: c9ba933b0b2fdb012ccf8a8535d09381c943144d
  ceph-deploy:
    conf:
      client:
        debug monc: 20
        debug ms: 1
        debug objecter: 20
        debug rados: 20
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: c9ba933b0b2fdb012ccf8a8535d09381c943144d
  s3tests:
    branch: next
  workunit:
    sha1: c9ba933b0b2fdb012ccf8a8535d09381c943144d
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: next

Related issues 1 (0 open, 1 closed)

Has duplicate: Ceph - Bug #6430: OSD crashes on startup in function 'OSDMapRef OSDService::get_map(epoch_t) (Duplicate, Samuel Just, 09/27/2013)

Actions #1

Updated by Ian Colle almost 11 years ago

  • Assignee set to Samuel Just
Actions #2

Updated by Sage Weil almost 11 years ago

  • Status changed from New to In Progress
Actions #3

Updated by Samuel Just almost 11 years ago

Fix merged as 6951d2345a5d837c3b14103bd4d8f5ee4407c937. Still working on getting the test to be reliable so I can add it to the suite.

Actions #4

Updated by Samuel Just almost 11 years ago

  • Status changed from In Progress to Fix Under Review

Added wip-5677 to ceph-qa-suite and teuthology gits.

Actions #5

Updated by Sage Weil almost 11 years ago

Samuel Just wrote:

Added wip-5677 to ceph-qa-suite and teuthology gits.

For the teuthology.git change, let's have a non-zero probability of running this, or else we'll forget to run it when it is important to do so. Also, let's set mon min osdmap epochs = 25, or something similarly small, in the ceph.conf.template?

With that, there probably isn't a need to make any qa suite changes?
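
For a single job, the same setting could presumably also be expressed as a conf override in the style of the job YAML in the description. This is only a sketch: the suggestion above targets ceph.conf.template rather than a per-job override, and placing the option under the mon section is an assumption.

```yaml
overrides:
  ceph:
    conf:
      mon:
        # Small value suggested in the comment above, so that the monitors
        # keep few old osdmap epochs and the failure is easier to hit.
        mon min osdmap epochs: 25
```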

Actions #6

Updated by Samuel Just almost 11 years ago

  • Status changed from Fix Under Review to Resolved
Actions #7

Updated by Ian Colle over 10 years ago

  • Backport set to cuttlefish, dumpling
Actions #8

Updated by Ian Colle over 10 years ago

  • Backport changed from cuttlefish, dumpling to cuttlefish