Project

General

Profile

Actions

Bug #18351

closed

msg/DispatchQueue.h: 228: FAILED assert(mqueue.empty())

Added by Sage Weil over 7 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/sage-2016-12-24_01:53:28-rados:verify-wip-sage-testing2---basic-smithi/660358

   -58> 2016-12-24 02:33:47.162431 f853700 30 stack operator() calling event process
   -57> 2016-12-24 02:33:47.162454 f853700 10 Event(0xdbe3300 nevent=5000 time_id=7).process_events wait second 30 usec 0
   -56> 2016-12-24 02:33:47.162584 9501ac0 20 Event(0xdc106f0 nevent=5000 time_id=6).dispatch_event_external 0xffeffe7b0 pending 1
   -55> 2016-12-24 02:33:47.162625 9501ac0  2 Event(0xdc205e0 nevent=5000 time_id=5).wakeup
   -54> 2016-12-24 02:33:47.162705 9501ac0 20 Event(0xdc205e0 nevent=5000 time_id=5).dispatch_event_external 0xffeffe7b0 pending 1
   -53> 2016-12-24 02:33:47.162890 10855700 20 Event(0xdc205e0 nevent=5000 time_id=5).process_events event_wq process is 13 mask is 1
   -52> 2016-12-24 02:33:47.162960 10855700 20 Event(0xdc205e0 nevent=5000 time_id=5).process_events do 0xffeffe7b0
   -51> 2016-12-24 02:33:47.163174 9501ac0 10 stack drain end.
   -50> 2016-12-24 02:33:47.163202 9501ac0 10 -- 172.21.15.33:6810/11310 wait: done.
   -49> 2016-12-24 02:33:47.163252 9501ac0  1 -- 172.21.15.33:6810/11310 wait complete.
   -48> 2016-12-24 02:33:47.163372 9501ac0 20 -- 172.21.15.33:6809/11310 wait: stopping accepter thread
   -47> 2016-12-24 02:33:47.163427 9501ac0 10 accepter.stop accept listening on: 18
   -46> 2016-12-24 02:33:47.163498 21729700 10 accepter.entry poll returned oke: 1
   -45> 2016-12-24 02:33:47.163530 21729700 20 accepter.entry pfd.revents[0]=0
   -44> 2016-12-24 02:33:47.163557 21729700 20 accepter.entry pfd.revents[1]=1
   -43> 2016-12-24 02:33:47.163622 21729700 20 accepter.entry closing
   -42> 2016-12-24 02:33:47.163648 21729700 10 accepter.entry stopping
   -41> 2016-12-24 02:33:47.163769 9501ac0 15 accepter.stop signaled poll
   -40> 2016-12-24 02:33:47.163799 9501ac0  5 accepter.stop wait for thread to join.
   -39> 2016-12-24 02:33:47.163864 9501ac0 20 -- 172.21.15.33:6809/11310 wait: stopped accepter thread
   -38> 2016-12-24 02:33:47.164342 9501ac0 10 -- 172.21.15.33:6809/11310 wait: waiting for dispatch queue
   -37> 2016-12-24 02:33:47.164488 9501ac0 10 -- 172.21.15.33:6809/11310 wait: dispatch queue is stopped
   -36> 2016-12-24 02:33:47.164550 9501ac0 20 -- 172.21.15.33:6809/11310 wait: stopping reaper thread
   -35> 2016-12-24 02:33:47.164779 14792700 10 -- 172.21.15.33:6809/11310 reaper_entry done
   -34> 2016-12-24 02:33:47.165013 9501ac0 20 -- 172.21.15.33:6809/11310 wait: stopped reaper thread
   -33> 2016-12-24 02:33:47.165093 9501ac0 10 -- 172.21.15.33:6809/11310 wait: closing pipes
   -32> 2016-12-24 02:33:47.165176 9501ac0 10 -- 172.21.15.33:6809/11310 reaper
   -31> 2016-12-24 02:33:47.165255 9501ac0 10 -- 172.21.15.33:6809/11310 reaper done
   -30> 2016-12-24 02:33:47.165317 9501ac0 10 -- 172.21.15.33:6809/11310 wait: waiting for pipes  to close
   -29> 2016-12-24 02:33:47.165394 9501ac0 10 -- 172.21.15.33:6809/11310 wait: done.
   -28> 2016-12-24 02:33:47.165469 9501ac0  1 -- 172.21.15.33:6809/11310 shutdown complete.
   -27> 2016-12-24 02:33:47.166109 10855700 30 stack operator() calling event process
   -26> 2016-12-24 02:33:47.166150 10855700 10 Event(0xdc205e0 nevent=5000 time_id=5).process_events wait second 30 usec 0
   -25> 2016-12-24 02:33:47.166411 9501ac0 10 -- - wait: waiting for dispatch queue
   -24> 2016-12-24 02:33:47.166568 9501ac0 10 -- - wait: dispatch queue is stopped
   -23> 2016-12-24 02:33:47.166612 9501ac0  1 -- - shutdown_connections 
   -22> 2016-12-24 02:33:47.166654 9501ac0 10 stack drain started.
   -21> 2016-12-24 02:33:47.166700 9501ac0  2 Event(0xdbe3300 nevent=5000 time_id=7).wakeup
   -20> 2016-12-24 02:33:47.166896 f853700 20 Event(0xdbe3300 nevent=5000 time_id=7).process_events event_wq process is 7 mask is 1
   -19> 2016-12-24 02:33:47.166944 f853700 20 Event(0xdbe3300 nevent=5000 time_id=7).process_events do 0xffeffe7b0
   -18> 2016-12-24 02:33:47.166986 f853700 30 stack operator() calling event process
   -17> 2016-12-24 02:33:47.167009 f853700 10 Event(0xdbe3300 nevent=5000 time_id=7).process_events wait second 30 usec 0
   -16> 2016-12-24 02:33:47.167140 9501ac0 20 Event(0xdbe3300 nevent=5000 time_id=7).dispatch_event_external 0xffeffe7b0 pending 1
   -15> 2016-12-24 02:33:47.167210 9501ac0  2 Event(0xdc106f0 nevent=5000 time_id=6).wakeup
   -14> 2016-12-24 02:33:47.167385 10054700 20 Event(0xdc106f0 nevent=5000 time_id=6).process_events event_wq process is 10 mask is 1
   -13> 2016-12-24 02:33:47.167433 10054700 20 Event(0xdc106f0 nevent=5000 time_id=6).process_events do 0xffeffe7b0
   -12> 2016-12-24 02:33:47.167474 10054700 30 stack operator() calling event process
   -11> 2016-12-24 02:33:47.167500 10054700 10 Event(0xdc106f0 nevent=5000 time_id=6).process_events wait second 30 usec 0
   -10> 2016-12-24 02:33:47.167667 9501ac0 20 Event(0xdc106f0 nevent=5000 time_id=6).dispatch_event_external 0xffeffe7b0 pending 1
    -9> 2016-12-24 02:33:47.167735 9501ac0  2 Event(0xdc205e0 nevent=5000 time_id=5).wakeup
    -8> 2016-12-24 02:33:47.167880 9501ac0 20 Event(0xdc205e0 nevent=5000 time_id=5).dispatch_event_external 0xffeffe7b0 pending 1
    -7> 2016-12-24 02:33:47.168033 10855700 20 Event(0xdc205e0 nevent=5000 time_id=5).process_events event_wq process is 13 mask is 1
    -6> 2016-12-24 02:33:47.168103 10855700 20 Event(0xdc205e0 nevent=5000 time_id=5).process_events do 0xffeffe7b0
    -5> 2016-12-24 02:33:47.168344 9501ac0 10 stack drain end.
    -4> 2016-12-24 02:33:47.168386 9501ac0 10 -- - wait: done.
    -3> 2016-12-24 02:33:47.168423 9501ac0  1 -- - wait complete.
    -2> 2016-12-24 02:33:47.172255 10855700 30 stack operator() calling event process
    -1> 2016-12-24 02:33:47.172287 10855700 10 Event(0xdc205e0 nevent=5000 time_id=5).process_events wait second 30 usec 0
     0> 2016-12-24 02:33:47.261790 9501ac0 -1 /mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6234-g78cf6e5/rpm/el7/BUILD/ceph-11.1.0-6234-g78cf6e5/src/msg/DispatchQueue.h: In function 'DispatchQueue::~DispatchQueue()' 
thread 9501ac0 time 2016-12-24 02:33:47.223837
/mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6234-g78cf6e5/rpm/el7/BUILD/ceph-11.1.0-6234-g78cf6e5/src/msg/DispatchQueue.h: 228: FAILED assert(mqueue.empty())

 ceph version 11.1.0-6234-g78cf6e5 (78cf6e578116a05f1954a7fc0377b3dffe2072bf)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7e) [0xbc678e]
 2: (DispatchQueue::~DispatchQueue()+0x44) [0xc439a4]
 3: (SimpleMessenger::~SimpleMessenger()+0x140) [0xc412b0]
 4: (SimpleMessenger::~SimpleMessenger()+0x9) [0xc413c9]
 5: (main()+0x2de9) [0x49ffd9]
 6: (__libc_start_main()+0xf5) [0xd281b35]
 7: (()+0x413d39) [0x51bd39]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions #1

Updated by Haomai Wang over 7 years ago

hmm, it looks like I don't fix the problem totally.... see http://tracker.ceph.com/issues/16472 https://github.com/ceph/ceph/pull/9930

Actions #2

Updated by Sage Weil about 7 years ago

  • Status changed from New to 12

/a/sage-2017-02-14_14:45:43-rados-wip-pg-split-interval---basic-smithi/815357

Actions #3

Updated by Sage Weil almost 7 years ago

2017-05-23T21:43:19.047 INFO:tasks.ceph.osd.3.smithi022.stderr:     0> 2017-05-23 21:43:18.506585 961e080 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.2-1527-ga0c7be3/rpm/el7/BUILD/ceph-12.0.2-1527-ga0c
7be3/src/msg/DispatchQueue.h: In function 'DispatchQueue::~DispatchQueue()' thread 961e080 time 2017-05-23 21:43:18.464111
2017-05-23T21:43:19.047 INFO:tasks.ceph.osd.3.smithi022.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.2-1527-ga0c7be3/rpm/el7/BUILD/ceph-12.0.2-1527-ga0c7be3/src/msg/DispatchQueue.h: 228: FAILED asse
rt(mqueue.empty())
2017-05-23T21:43:19.047 INFO:tasks.ceph.osd.3.smithi022.stderr:
2017-05-23T21:43:19.047 INFO:tasks.ceph.osd.3.smithi022.stderr: ceph version 12.0.2-1527-ga0c7be3 (a0c7be3db4f37d255d6c65e06fc3d7c19fb38c1e)
2017-05-23T21:43:19.047 INFO:tasks.ceph.osd.3.smithi022.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0xaf2ff0]
2017-05-23T21:43:19.047 INFO:tasks.ceph.osd.3.smithi022.stderr: 2: (DispatchQueue::~DispatchQueue()+0x44) [0xb73064]
2017-05-23T21:43:19.047 INFO:tasks.ceph.osd.3.smithi022.stderr: 3: (SimpleMessenger::~SimpleMessenger()+0x140) [0xb70970]
2017-05-23T21:43:19.047 INFO:tasks.ceph.osd.3.smithi022.stderr: 4: (SimpleMessenger::~SimpleMessenger()+0x9) [0xb70a89]
2017-05-23T21:43:19.061 INFO:tasks.ceph.osd.3.smithi022.stderr: 5: (main()+0x2fd8) [0x4d96a8]
2017-05-23T21:43:19.061 INFO:tasks.ceph.osd.3.smithi022.stderr: 6: (__libc_start_main()+0xf5) [0xd7b3b35]

/a/sage-2017-05-23_19:52:01-rados-wip-sage-testing2---basic-smithi/1221515
Actions #4

Updated by Sage Weil over 6 years ago

  • Project changed from Ceph to RADOS

/a/sage-2017-09-01_21:29:08-rados-wip-sage-testing-2017-09-01-1033-distro-basic-smithi/1587842

Actions #5

Updated by Sage Weil over 6 years ago

  • Priority changed from High to Normal
Actions #6

Updated by Sage Weil about 6 years ago

  • Status changed from 12 to Fix Under Review
Actions #7

Updated by Sage Weil about 6 years ago

see /a/sage-2018-02-07_00:31:37-rados-wip-sage3-testing-2018-02-06-1427-distro-basic-smithi/2161112 for latest instance. the wip-config branch seems good at triggering this (MonClient::get_monmap_and_config_privately()).

Actions #8

Updated by Greg Farnum about 6 years ago

  • Status changed from Fix Under Review to Resolved
Actions #9

Updated by Greg Farnum about 5 years ago

  • Project changed from RADOS to Messengers
Actions

Also available in: Atom PDF