Bug #4662


osd/OSD.h: 809: FAILED assert(peering_queue.empty()) on shutdown

Added by Sage Weil about 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

   -28> 2013-04-04 23:49:09.256969 9340700 -1 osd/OSD.h: In function 'virtual void OSD::PeeringWQ::_clear()' thread 9340700 time 2013-04-04 23:49:09.132571
osd/OSD.h: 809: FAILED assert(peering_queue.empty())

 ceph version 0.60-402-g3c0debf (3c0debf99d51a8ec1cbd76d96c436674d56dfc6e)
 1: ceph-osd() [0x64f27c]
 2: (ThreadPool::stop(bool)+0x1ed) [0x83409d]
 3: (OSD::shutdown()+0x6cc) [0x61670c]
 4: (OSD::handle_signal(int)+0x118) [0x6176e8]
 5: (SignalHandler::entry()+0x1ac) [0x7900ac]
 6: (()+0x7e9a) [0x503be9a]
 7: (clone()+0x6d) [0x6c71cbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
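For context, the assert fires because PeeringWQ's `_clear()` requires the queue to already be drained when `ThreadPool::stop()` tears the work queues down during shutdown. A minimal sketch of that invariant (hypothetical names, not the actual Ceph code):

```cpp
#include <cassert>
#include <deque>

// Stand-in for OSD::PeeringWQ: _clear() is invoked from the ThreadPool
// stop path and asserts that shutdown already drained the queue.
struct PeeringQueueSketch {
  std::deque<int> peering_queue;  // stand-in for queued PG peering events

  bool clear_is_safe() const { return peering_queue.empty(); }

  void _clear() {
    // Analogue of osd/OSD.h:809 — fails if anything was queued after
    // (or survived) the shutdown drain.
    assert(peering_queue.empty());
    peering_queue.clear();
  }
};
```

The backtrace above matches this shape: `OSD::shutdown()` calls `ThreadPool::stop(bool)`, which clears the work queues, and the peering queue was not empty at that point.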

The job was:
ubuntu@teuthology:/a/teuthology-2013-04-04_19:47:50-fs-next-testing-basic/9305$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 85b6aabe740024f9f6aaa54afc3195940e5fa12c
nuke-on-error: true
overrides:
  ceph:
    conf:
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 3c0debf99d51a8ec1cbd76d96c436674d56dfc6e
    valgrind:
      mds:
      - --tool=memcheck
      mon:
      - --tool=memcheck
      - --leak-check=full
      - --show-reachable=yes
      osd:
      - --tool=memcheck
  ceph-fuse:
    client.0:
      valgrind:
      - --tool=memcheck
      - --leak-check=full
      - --show-reachable=yes
  s3tests:
    branch: next
  workunit:
    sha1: 3c0debf99d51a8ec1cbd76d96c436674d56dfc6e
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/fsstress.sh

Actions #1

Updated by Ian Colle about 11 years ago

  • Assignee set to Samuel Just
Actions #2

Updated by Samuel Just about 11 years ago

-43> 2013-04-11 15:36:50.728192 ef4b700 10 filestore hoid: 16ef7597/infos/head//-1 not skipping op, *spos 13058.0.0
-42> 2013-04-11 15:36:50.728292 ef4b700 10 filestore > header.spos 0.0.0
-41> 2013-04-11 15:36:50.729870 ef4b700 10 journal op_apply_finish 13058 open_ops 1 -> 0, max_applied_seq 13057 -> 13058
-40> 2013-04-11 15:36:50.730058 ef4b700 10 filestore(/var/lib/ceph/osd/ceph-3) _do_op 0x1e65ff50 seq 13058 r = 0, finisher 0x1c427380 0
-39> 2013-04-11 15:36:50.730545 ef4b700 10 filestore(/var/lib/ceph/osd/ceph-3) _finish_op 0x1e65ff50 seq 13058 osr(1.4 0x1a80ffb0)/0x1a80ffb0
-38> 2013-04-11 15:36:50.732396 9340700 20 osd.3 331 kicking pg 0.2
-37> 2013-04-11 15:36:50.732653 9340700 30 osd.3 pg_epoch: 331 pg[0.2( empty local-les=331 n=0 ec=1 les/c 331/331 329/329/329) [3] r=0 lpr=329 mlcod 0'0 active+degraded] lock
-36> 2013-04-11 15:36:50.733371 9340700 10 osd.3 pg_epoch: 331 pg[0.2( empty local-les=331 n=0 ec=1 les/c 331/331 329/329/329) [3] r=0 lpr=329 mlcod 0'0 active+degraded] on_shutdown
-35> 2013-04-11 15:36:50.733745 9340700 10 osd.3 pg_epoch: 331 pg[0.2( empty local-les=331 n=0 ec=1 les/c 331/331 329/329/329) [3] r=0 lpr=329 mlcod 0'0 active+degraded] clear_primary_state
-34> 2013-04-11 15:36:50.734085 9340700 10 osd.3 pg_epoch: 331 pg[0.2( empty local-les=331 n=0 ec=1 les/c 331/331 329/329/329) [3] r=0 lpr=329 mlcod 0'0 active+degraded] cancel_recovery
-33> 2013-04-11 15:36:50.734395 9340700 10 osd.3 pg_epoch: 331 pg[0.2( empty local-les=331 n=0 ec=1 les/c 331/331 329/329/329) [3] r=0 lpr=329 mlcod 0'0 active+degraded] clear_recovery_state
-32> 2013-04-11 15:36:50.735022 9340700 20 osd.3 331 kicking pg 2.0
-31> 2013-04-11 15:36:50.735161 9340700 30 osd.3 pg_epoch: 331 pg[2.0( empty local-les=331 n=0 ec=1 les/c 331/331 329/329/329) [3] r=0 lpr=329 mlcod 0'0 active+degraded] lock
-30> 2013-04-11 15:36:50.735467 9340700 10 osd.3 pg_epoch: 331 pg[2.0( empty local-les=331 n=0 ec=1 les/c 331/331 329/329/329) [3] r=0 lpr=329 mlcod 0'0 active+degraded] on_shutdown
-29> 2013-04-11 15:36:50.735814 9340700 10 osd.3 pg_epoch: 331 pg[2.0( empty local-les=331 n=0 ec=1 les/c 331/331 329/329/329) [3] r=0 lpr=329 mlcod 0'0 active+degraded] clear_primary_state
-28> 2013-04-11 15:36:50.736141 9340700 10 osd.3 pg_epoch: 331 pg[2.0( empty local-les=331 n=0 ec=1 les/c 331/331 329/329/329) [3] r=0 lpr=329 mlcod 0'0 active+degraded] cancel_recovery
-27> 2013-04-11 15:36:50.736445 9340700 10 osd.3 pg_epoch: 331 pg[2.0( empty local-les=331 n=0 ec=1 les/c 331/331 329/329/329) [3] r=0 lpr=329 mlcod 0'0 active+degraded] clear_recovery_state
-26> 2013-04-11 15:36:50.736958 9340700 20 osd.3 331 kicking pg 2.3
-25> 2013-04-11 15:36:50.737087 9340700 30 osd.3 pg_epoch: 331 pg[2.3( empty local-les=331 n=0 ec=1 les/c 331/331 330/330/3) [3] r=0 lpr=330 mlcod 0'0 active+degraded] lock
-24> 2013-04-11 15:36:50.737390 9340700 10 osd.3 pg_epoch: 331 pg[2.3( empty local-les=331 n=0 ec=1 les/c 331/331 330/330/3) [3] r=0 lpr=330 mlcod 0'0 active+degraded] on_shutdown
-23> 2013-04-11 15:36:50.737734 9340700 10 osd.3 pg_epoch: 331 pg[2.3( empty local-les=331 n=0 ec=1 les/c 331/331 330/330/3) [3] r=0 lpr=330 mlcod 0'0 active+degraded] clear_primary_state
-22> 2013-04-11 15:36:50.738065 9340700 10 osd.3 pg_epoch: 331 pg[2.3( empty local-les=331 n=0 ec=1 les/c 331/331 330/330/3) [3] r=0 lpr=330 mlcod 0'0 active+degraded] cancel_recovery
-21> 2013-04-11 15:36:50.738368 9340700 10 osd.3 pg_epoch: 331 pg[2.3( empty local-les=331 n=0 ec=1 les/c 331/331 330/330/3) [3] r=0 lpr=330 mlcod 0'0 active+degraded] clear_recovery_state
-20> 2013-04-11 15:36:50.774850 f74c700 10 filestore(/var/lib/ceph/osd/ceph-3) _finish_op 0x1a36f9c0 seq 13056 osr(2.3 0x1a8754b0)/0x1a8754b0
-19> 2013-04-11 15:36:50.842047 9340700 -1 common/Mutex.cc: In function 'Mutex::~Mutex()' thread 9340700 time 2013-04-11 15:36:50.741266
common/Mutex.cc: 71: FAILED assert(nlock == 0)

ceph version 0.60-465-gd777b8e (d777b8e66b2e950266e52589c129b00f77b8afc0)
1: ceph-osd() [0x8084f6]
2: (FileStore::OpSequencer::~OpSequencer()+0x2e) [0x750f1e]
3: (std::tr1::_Sp_counted_base_impl<ObjectStore::Sequencer*, SharedPtrRegistry<pg_t, ObjectStore::Sequencer>::OnRemoval, (__gnu_cxx::_Lock_policy)2>::_M_dispose()+0xa0) [0x6744a0]
4: (std::tr1::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()+0x49) [0x5ddc79]
5: (PG::~PG()+0xad) [0x6cec2d]
6: (ReplicatedPG::~ReplicatedPG()+0x9) [0x5ee509]
7: (OSD::shutdown()+0x10cc) [0x61712c]
8: (OSD::handle_signal(int)+0x118) [0x617708]
9: (SignalHandler::entry()+0x1ac) [0x78f4cc]
10: (()+0x7e9a) [0x503be9a]
11: (clone()+0x6d) [0x6c71cbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
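This second crash is the related invariant in Ceph's Mutex: the destructor asserts the lock is not held (`nlock == 0`), so destroying a `FileStore::OpSequencer` whose mutex is still locked trips `common/Mutex.cc:71` during PG teardown. A hedged sketch of that invariant (illustrative names, not the Ceph implementation):

```cpp
#include <cassert>

// Sketch of the Mutex invariant from common/Mutex.cc: a held-count
// (nlock) that must be zero by the time the destructor runs.
class MutexSketch {
  int nlock = 0;  // hypothetical held-count, as in Ceph's Mutex
public:
  void lock()   { ++nlock; }
  void unlock() { --nlock; }
  bool held() const { return nlock != 0; }
  ~MutexSketch() { assert(nlock == 0); }  // the assert that fired here
};
```

In the backtrace above, `OSD::shutdown()` drops the last `shared_ptr` reference to the sequencer, so `OpSequencer::~OpSequencer()` runs while its mutex is still locked.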

-18> 2013-04-11 15:36:50.845403 1fe84700 20 -- 10.214.133.136:6802/5357 >> 10.214.133.136:6808/5360 pipe(0x716c270 sd=30 :46925 s=1 pgs=13 cs=8 l=0).connect read peer addr 10.214.133.136:6808/5360 on socket 30
-17> 2013-04-11 15:36:50.848422 1fe84700 20 -- 10.214.133.136:6802/5357 >> 10.214.133.136:6808/5360 pipe(0x716c270 sd=30 :46925 s=1 pgs=13 cs=8 l=0).connect peer addr for me is 10.214.133.136:46925/0
-16> 2013-04-11 15:36:50.851128 1fe84700 10 -- 10.214.133.136:6802/5357 >> 10.214.133.136:6808/5360 pipe(0x716c270 sd=30 :46925 s=1 pgs=13 cs=8 l=0).connect sent my addr 10.214.133.136:6802/5357
-15> 2013-04-11 15:36:50.853496 1fe84700 10 osd.3 331 OSD::ms_get_authorizer type=osd

Actions #3

Updated by Sage Weil about 11 years ago

ubuntu@teuthology:/a/teuthology-2013-04-17_01:00:51-rgw-master-testing-basic/14226

Actions #4

Updated by Samuel Just about 11 years ago

  • Status changed from New to Resolved

481c532ff361b21e044621ac13c8f00ebfb1b3dc
