Bug #4538
os/FileStore.h: 191: FAILED assert(q.empty()) on shutdown
% Done: 0%
Source: Q/A
Regression: No
Severity: 3 - minor
Description
    -6> 2013-03-24 09:17:29.295705 93c2700 20 osd.5 18 kicking pg 0.8
    -5> 2013-03-24 09:17:29.295822 93c2700 30 osd.5 pg_epoch: 18 pg[0.8( v 10'126 (0'0,10'126] local-les=18 n=39 ec=1 les/c 18/18 17/17/17) [5] r=0 lpr=17 lcod 10'125 mlcod 0'0 active+degraded] lock
    -4> 2013-03-24 09:17:29.296365 93c2700 10 osd.5 pg_epoch: 18 pg[0.8( v 10'126 (0'0,10'126] local-les=18 n=39 ec=1 les/c 18/18 17/17/17) [5] r=0 lpr=17 lcod 10'125 mlcod 0'0 active+degraded] on_shutdown
    -3> 2013-03-24 09:17:29.296950 93c2700 10 osd.5 pg_epoch: 18 pg[0.8( v 10'126 (0'0,10'126] local-les=18 n=39 ec=1 les/c 18/18 17/17/17) [5] r=0 lpr=17 lcod 10'125 mlcod 0'0 active+degraded] clear_primary_state
    -2> 2013-03-24 09:17:29.297526 93c2700 10 osd.5 pg_epoch: 18 pg[0.8( v 10'126 (0'0,10'126] local-les=18 n=39 ec=1 les/c 18/18 17/17/17) [5] r=0 lpr=17 luod=0'0 lcod 10'125 mlcod 0'0 active+degraded] cancel_recovery
    -1> 2013-03-24 09:17:29.298100 93c2700 10 osd.5 pg_epoch: 18 pg[0.8( v 10'126 (0'0,10'126] local-les=18 n=39 ec=1 les/c 18/18 17/17/17) [5] r=0 lpr=17 luod=0'0 lcod 10'125 mlcod 0'0 active+degraded] clear_recovery_state
     0> 2013-03-24 09:17:29.446610 93c2700 -1 os/FileStore.h: In function 'virtual FileStore::OpSequencer::~OpSequencer()' thread 93c2700 time 2013-03-24 09:17:29.299718
os/FileStore.h: 191: FAILED assert(q.empty())

 ceph version 0.59-478-g8befbca (8befbca77aa50a1188969892aabedaf11d8f8ce7)
 1: (FileStore::OpSequencer::~OpSequencer()+0xc3) [0x74b533]
 2: (std::tr1::_Sp_counted_base_impl<ObjectStore::Sequencer*, SharedPtrRegistry<pg_t, ObjectStore::Sequencer>::OnRemoval, (__gnu_cxx::_Lock_policy)2>::_M_dispose()+0xa0) [0x6703b0]
 3: (std::tr1::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()+0x49) [0x5dba19]
 4: (PG::~PG()+0xad) [0x6c201d]
 5: (ReplicatedPG::~ReplicatedPG()+0x9) [0x5ebe39]
 6: (OSD::shutdown()+0xfed) [0x61521d]
 7: (OSD::handle_signal(int)+0x118) [0x6157a8]
 8: (SignalHandler::entry()+0x1ac) [0x78a0cc]
 9: (()+0x7e9a) [0x503be9a]
 10: (clone()+0x6d) [0x6edb4bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
on job
ubuntu@teuthology:/a/sage-2013-03-24_08:29:36-fs-master-testing-basic/2422$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 06fb6a9f87bb1377a6549602fff230d4b352afe9
machine_type: plana
nuke-on-error: true
overrides:
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 8befbca77aa50a1188969892aabedaf11d8f8ce7
    valgrind:
      mds:
      - --tool=memcheck
      mon:
      - --tool=memcheck
      - --leak-check=full
      - --show-reachable=yes
      osd:
      - --tool=memcheck
  ceph-fuse:
    client.0:
      valgrind:
      - --tool=memcheck
      - --leak-check=full
      - --show-reachable=yes
  s3tests:
    branch: master
  workunit:
    sha1: 8befbca77aa50a1188969892aabedaf11d8f8ce7
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- install: null
- ceph:
    conf:
      client:
        debug client: 1/20
        debug ms: 0/10
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/dbench.sh
(and many others)
Associated revisions
OSD: flush pg osr on shutdown prior to put()
Fixes: #4538
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
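The backtrace shows the sequencer's destructor running when PG::~PG() drops the last shared reference, and the commit title says the fix is to flush the PG's sequencer before that reference is released. The following is a minimal sketch of that ordering, not the actual Ceph code: ToySequencer and shutdown_pg are hypothetical names, the queue is drained synchronously in a single thread (the real one is presumably drained asynchronously), and the destructor records the leftover op count instead of asserting so that both orderings can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for a per-PG op sequencer; not the Ceph class.
class ToySequencer {
public:
    explicit ToySequencer(std::size_t* leftover) : leftover_(leftover) {}

    // The code in the report asserts q.empty() at this point. This sketch
    // records the number of leftover ops instead, so the failing ordering
    // can be observed without aborting the process.
    ~ToySequencer() { *leftover_ = q_.size(); }

    // Queue an op for later application.
    void queue(std::function<void()> op) { q_.push_back(std::move(op)); }

    // Apply every queued op in order, leaving the queue empty.
    void flush() {
        while (!q_.empty()) {
            std::function<void()> op = std::move(q_.front());
            q_.pop_front();
            op();
        }
    }

private:
    std::deque<std::function<void()>> q_;
    std::size_t* leftover_;
};

// Simulate shutting down one PG that still has `pending` queued ops.
// Returns how many ops were still queued when the sequencer was destroyed.
std::size_t shutdown_pg(std::size_t pending, bool flush_first) {
    std::size_t leftover = 0;
    int applied = 0;
    auto osr = std::make_shared<ToySequencer>(&leftover);
    for (std::size_t i = 0; i < pending; ++i) {
        osr->queue([&applied] { ++applied; });
    }
    if (flush_first) {
        osr->flush();  // drain before releasing the reference
    }
    osr.reset();       // last reference dropped: the destructor runs here
    return leftover;
}
```

Calling shutdown_pg(3, false) models the failing path, where ops are still queued at destruction; shutdown_pg(3, true) models the ordering named in the commit title, where the queue is already empty when the destructor runs.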
History
#1 Updated by Sage Weil about 11 years ago
Seems to happen more often when valgrind is running; it probably changes the timing?
#2 Updated by Samuel Just about 11 years ago
- Status changed from 12 to Resolved