Bug #4538

os/FileStore.h: 191: FAILED assert(q.empty()) on shutdown

Added by Sage Weil about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

    -6> 2013-03-24 09:17:29.295705 93c2700 20 osd.5 18  kicking pg 0.8
    -5> 2013-03-24 09:17:29.295822 93c2700 30 osd.5 pg_epoch: 18 pg[0.8( v 10'126 (0'0,10'126] local-les=18 n=39 ec=1 les/c 18/18 17/17/17) [5] r=0 lpr=17 lcod 10'125 mlcod 0'0 active+degraded] lock
    -4> 2013-03-24 09:17:29.296365 93c2700 10 osd.5 pg_epoch: 18 pg[0.8( v 10'126 (0'0,10'126] local-les=18 n=39 ec=1 les/c 18/18 17/17/17) [5] r=0 lpr=17 lcod 10'125 mlcod 0'0 active+degraded] on_shutdown
    -3> 2013-03-24 09:17:29.296950 93c2700 10 osd.5 pg_epoch: 18 pg[0.8( v 10'126 (0'0,10'126] local-les=18 n=39 ec=1 les/c 18/18 17/17/17) [5] r=0 lpr=17 lcod 10'125 mlcod 0'0 active+degraded] clear_primary_state
    -2> 2013-03-24 09:17:29.297526 93c2700 10 osd.5 pg_epoch: 18 pg[0.8( v 10'126 (0'0,10'126] local-les=18 n=39 ec=1 les/c 18/18 17/17/17) [5] r=0 lpr=17 luod=0'0 lcod 10'125 mlcod 0'0 active+degraded] cancel_recovery
    -1> 2013-03-24 09:17:29.298100 93c2700 10 osd.5 pg_epoch: 18 pg[0.8( v 10'126 (0'0,10'126] local-les=18 n=39 ec=1 les/c 18/18 17/17/17) [5] r=0 lpr=17 luod=0'0 lcod 10'125 mlcod 0'0 active+degraded] clear_recovery_state
     0> 2013-03-24 09:17:29.446610 93c2700 -1 os/FileStore.h: In function 'virtual FileStore::OpSequencer::~OpSequencer()' thread 93c2700 time 2013-03-24 09:17:29.299718
os/FileStore.h: 191: FAILED assert(q.empty())

 ceph version 0.59-478-g8befbca (8befbca77aa50a1188969892aabedaf11d8f8ce7)
 1: (FileStore::OpSequencer::~OpSequencer()+0xc3) [0x74b533]
 2: (std::tr1::_Sp_counted_base_impl<ObjectStore::Sequencer*, SharedPtrRegistry<pg_t, ObjectStore::Sequencer>::OnRemoval, (__gnu_cxx::_Lock_policy)2>::_M_dispose()+0xa0) [0x6703b0]
 3: (std::tr1::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()+0x49) [0x5dba19]
 4: (PG::~PG()+0xad) [0x6c201d]
 5: (ReplicatedPG::~ReplicatedPG()+0x9) [0x5ebe39]
 6: (OSD::shutdown()+0xfed) [0x61521d]
 7: (OSD::handle_signal(int)+0x118) [0x6157a8]
 8: (SignalHandler::entry()+0x1ac) [0x78a0cc]
 9: (()+0x7e9a) [0x503be9a]
 10: (clone()+0x6d) [0x6edb4bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

on job

ubuntu@teuthology:/a/sage-2013-03-24_08:29:36-fs-master-testing-basic/2422$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 06fb6a9f87bb1377a6549602fff230d4b352afe9
machine_type: plana
nuke-on-error: true
overrides:
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 8befbca77aa50a1188969892aabedaf11d8f8ce7
    valgrind:
      mds:
      - --tool=memcheck
      mon:
      - --tool=memcheck
      - --leak-check=full
      - --show-reachable=yes
      osd:
      - --tool=memcheck
  ceph-fuse:
    client.0:
      valgrind:
      - --tool=memcheck
      - --leak-check=full
      - --show-reachable=yes
  s3tests:
    branch: master
  workunit:
    sha1: 8befbca77aa50a1188969892aabedaf11d8f8ce7
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- install: null
- ceph:
    conf:
      client:
        debug client: 1/20
        debug ms: 0/10
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/dbench.sh

(and many others)

Associated revisions

Revision 91a8d93c (diff)
Added by Samuel Just about 11 years ago

OSD: flush pg osr on shutdown prior to put()

Fixes: #4538
Signed-off-by: Samuel Just <>
Reviewed-by: Sage Weil <>
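
The ordering named in the commit title (flush the PG's sequencer, then drop the reference) can be illustrated with a toy model. This is only a sketch under assumptions, not the actual Ceph change: `ToySequencer`, `ToyPG`, and `shutdown_pg` are hypothetical names, and `flush()` here runs queued ops synchronously, whereas the real sequencer presumably waits on work handled by other threads.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for an op sequencer: its destructor enforces the
// same invariant as the failed assert in the report (queue must be empty).
class ToySequencer {
public:
    ~ToySequencer() { assert(q.empty()); }

    void queue_op(std::function<void()> op) { q.push_back(std::move(op)); }

    // Drain every queued op; returns how many were applied.
    std::size_t flush() {
        std::size_t applied = 0;
        while (!q.empty()) {
            std::function<void()> op = std::move(q.front());
            q.pop_front();
            op();
            ++applied;
        }
        return applied;
    }

    std::size_t pending() const { return q.size(); }

private:
    std::deque<std::function<void()>> q;
};

// Hypothetical stand-in for a PG that holds a shared reference to its sequencer.
struct ToyPG {
    std::shared_ptr<ToySequencer> osr = std::make_shared<ToySequencer>();
};

// Shutdown order sketched from the commit title: flush first, and only then
// release the PG (and with it the last reference to the sequencer).
std::size_t shutdown_pg(std::unique_ptr<ToyPG> pg) {
    std::size_t applied = pg->osr->flush();
    pg.reset();  // the destructor's assert(q.empty()) now holds
    return applied;
}
```

In this model, releasing the PG without the `flush()` call while ops are still queued would trip the destructor assertion, which is the same shape of failure as the backtrace in the description.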

History

#1 Updated by Sage Weil about 11 years ago

Seems to happen more often when valgrind is running; it probably changes the timing.

#2 Updated by Samuel Just about 11 years ago

  • Status changed from 12 to Resolved
