Bug #5270

closed

osd: crash in PG::peek_map_epoch()

Added by Sage Weil almost 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

    -3> 2013-06-06 22:11:43.947436 7f69f8cda780 10 osd.1 245 pgid 76.5 coll 76.5_head
    -2> 2013-06-06 22:11:43.947446 7f69f8cda780 15 filestore(/var/lib/ceph/osd/ceph-1) collection_getattr /var/lib/ceph/osd/ceph-1/current/76.5_head 'info'
    -1> 2013-06-06 22:11:43.947468 7f69f8cda780 10 filestore(/var/lib/ceph/osd/ceph-1) collection_getattr /var/lib/ceph/osd/ceph-1/current/76.5_head 'info' = -61
     0> 2013-06-06 22:11:43.950057 7f69f8cda780 -1 *** Caught signal (Aborted) **
 in thread 7f69f8cda780

 ceph version 0.63-393-g08923eb (08923eb842a9768ff556939221e63b983724e9bf)
 1: ceph-osd() [0x7a825a]
 2: (()+0xfcb0) [0x7f69f86bdcb0]
 3: (gsignal()+0x35) [0x7f69f678b425]
 4: (abort()+0x17b) [0x7f69f678eb8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f69f70dd69d]
 6: (()+0xb5846) [0x7f69f70db846]
 7: (()+0xb5873) [0x7f69f70db873]
 8: (()+0xb596e) [0x7f69f70db96e]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x127) [0x85ab37]
 10: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::buffer::list*)+0x112) [0x6c1e82]
 11: (OSD::load_pgs()+0x14dd) [0x64cd6d]
 12: (OSD::init()+0xf85) [0x64f1c5]
 13: (main()+0x1cff) [0x5836cf]
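
Reading the trace: collection_getattr for 'info' returned -61 (ENODATA on Linux), and frames 5-9 show an exception thrown from ceph::buffer::list::iterator::copy() that nothing caught, which ends in abort(). That is consistent with decoding from an empty buffer. Below is a minimal C++ sketch of that failure mode, not Ceph's code: FakeStore, Cursor, and both peek_epoch_* functions are hypothetical stand-ins, the single 32-bit field is a simplification of the real encoding, and the checked variant is only one possible way to handle a missing attribute, not necessarily the fix applied for this bug.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an attribute store: returns 0 and fills `out`
// when the attribute exists, or -ENODATA when it is missing.
struct FakeStore {
    std::map<std::string, std::vector<unsigned char>> attrs;

    int getattr(const std::string& name, std::vector<unsigned char>& out) const {
        auto it = attrs.find(name);
        if (it == attrs.end())
            return -ENODATA;
        out = it->second;
        return 0;
    }
};

// Hypothetical stand-in for a buffer iterator: copy() throws when asked for
// more bytes than remain in the buffer.
struct Cursor {
    const std::vector<unsigned char>& buf;
    std::size_t pos;

    explicit Cursor(const std::vector<unsigned char>& b) : buf(b), pos(0) {}

    void copy(std::size_t len, void* dest) {
        if (len > buf.size() - pos)
            throw std::out_of_range("end of buffer");
        std::memcpy(dest, buf.data() + pos, len);
        pos += len;
    }
};

// Ignores the getattr result, so a missing attribute becomes an exception
// during decode. If no caller catches it, the process aborts.
std::uint32_t peek_epoch_unchecked(const FakeStore& store) {
    std::vector<unsigned char> bl;
    store.getattr("info", bl);
    Cursor c(bl);
    std::uint32_t epoch = 0;
    c.copy(sizeof(epoch), &epoch);
    return epoch;
}

// Checks the getattr result and the buffer length before decoding, and
// reports a negative errno instead of throwing.
int peek_epoch_checked(const FakeStore& store, std::uint32_t& epoch) {
    std::vector<unsigned char> bl;
    int r = store.getattr("info", bl);
    if (r < 0)
        return r;
    if (bl.size() < sizeof(epoch))
        return -EINVAL;
    Cursor c(bl);
    c.copy(sizeof(epoch), &epoch);
    return 0;
}
```

With an empty FakeStore, peek_epoch_checked() returns -ENODATA and the caller can decide what to do with the collection, while peek_epoch_unchecked() throws from copy(), mirroring frames 9 and 10 above.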

Several instances on current master. One test was:

ubuntu@teuthology:/a/sage-2013-06-06_17:56:44-rados-wip-mon-testing-basic/32596$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f
machine_type: plana
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject delay type: osd
        ms inject socket failures: 2500
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: 08923eb842a9768ff556939221e63b983724e9bf
  install:
    ceph:
      sha1: 08923eb842a9768ff556939221e63b983724e9bf
  s3tests:
    branch: master
  workunit:
    sha1: 08923eb842a9768ff556939221e63b983724e9bf
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- ceph-fuse: null
- workunit:
    clients:
      client.0:
      - rados/test.sh

Looking at the store, the 'info' xattr is not there, which matches the -61 (ENODATA) returned by collection_getattr above:

root@plana24:/var/lib/ceph/osd/ceph-1/current/76.5_head# getfattr -d .
# file: .
user.cephos.collection_version=0sAwAAAA==
user.cephos.phash.contents=0sAQAAAAAAAAAAAAAAAAAAAAA=
user.cephos.seq=0sAQEQAAAAHRMAAAAAAAAAAAAABQAAAAA=
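
For reading the values in that listing: getfattr prints binary values base64-encoded with a 0s prefix. The sketch below decodes such a value into raw bytes. The helper names decode_getfattr_value and le32 are hypothetical, and interpreting the bytes as a little-endian 32-bit integer is an assumption about how the value was encoded, not something the listing itself states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Decode a value as printed by `getfattr -d`: a "0s" prefix followed by
// standard base64. Decoding stops at the first '=' padding character.
std::vector<unsigned char> decode_getfattr_value(const std::string& text) {
    if (text.compare(0, 2, "0s") != 0)
        throw std::invalid_argument("expected a 0s (base64) prefix");

    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::vector<unsigned char> out;
    std::uint32_t acc = 0;  // bit accumulator, newest bits in the low end
    int bits = 0;           // number of not-yet-emitted bits in acc
    for (std::size_t i = 2; i < text.size(); ++i) {
        char ch = text[i];
        if (ch == '=')
            break;
        std::size_t v = alphabet.find(ch);
        if (v == std::string::npos)
            throw std::invalid_argument("invalid base64 character");
        acc = (acc << 6) | static_cast<std::uint32_t>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<unsigned char>((acc >> bits) & 0xFF));
        }
    }
    return out;
}

// Assumption: the first four bytes are a little-endian unsigned 32-bit value.
std::uint32_t le32(const std::vector<unsigned char>& b) {
    if (b.size() < 4)
        throw std::out_of_range("need at least 4 bytes");
    return static_cast<std::uint32_t>(b[0]) |
           (static_cast<std::uint32_t>(b[1]) << 8) |
           (static_cast<std::uint32_t>(b[2]) << 16) |
           (static_cast<std::uint32_t>(b[3]) << 24);
}
```

For example, decode_getfattr_value("0sAwAAAA==") yields the four bytes 03 00 00 00, which le32() reads as 3 under the little-endian assumption.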

Other jobs that hit this, also with PG splitting:

ubuntu@teuthology:/a/sage-2013-06-06_17:56:44-rados-wip-mon-testing-basic/32578
ubuntu@teuthology:/a/sage-2013-06-06_17:56:44-rados-wip-mon-testing-basic/32584
ubuntu@teuthology:/a/sage-2013-06-06_17:56:44-rados-wip-mon-testing-basic/32590


Related issues: 1 (0 open, 1 closed)

Has duplicate: Ceph - Bug #5269: osd: EEXIST on mkcoll (Resolved, 06/06/2013)
