Project

General

Profile

Bug #15784

msg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared)

Added by Richard Hesse over 1 year ago. Updated 3 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
OSD
Target version:
-
Start date:
05/09/2016
Due date:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
kraken,jewel
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
infernalis
Needs Doc:
No

Description

     0> 2016-05-09 04:14:22.742318 7f2555bcf700 -1 msg/simple/SimpleMessenger.cc: In function 'void SimpleMessenger::reaper()' thread 7f2555bcf700 time 2016-05-09 04:14:
22.727680
msg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared)

 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f255d2e20b5]
 2: (SimpleMessenger::reaper()+0x883) [0x7f255d2bad93]
 3: (SimpleMessenger::reaper_entry()+0x150) [0x7f255d2baf40]
 4: (SimpleMessenger::ReaperThread::entry()+0xd) [0x7f255d2c2acd]
 5: (()+0x7dc5) [0x7f255b376dc5]
 6: (clone()+0x6d) [0x7f2559c1d28d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This appears to be a duplicate of #11381 which was marked as resolved a year ago. However, I'm running Infernalis (9.2.1), so it appears that either the original bug wasn't fixed or it was re-introduced with new changes.


Related issues

Duplicated by Ceph - Bug #17048: "SimpleMessenger.cc: 237: FAILED assert(!cleared)" in rbd-master-distro-basic-vps Duplicate 08/16/2016
Copied to Ceph - Backport #18378: kraken: msg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared) Resolved
Copied to Ceph - Backport #18379: jewel: msg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared) Resolved

History

#1 Updated by Richard Hesse over 1 year ago

Trying to upload the objdump output, but I'm getting a request entity too large error from Redmine. The maximum size is listed at 73.4MB, but the file is only 17MB. I can't see attachments so let me know if you are actually getting these or not.

#2 Updated by Haomai Wang over 1 year ago

yes, #11381 fix already in v9.2.1 commits. could you give more osd logs?

#3 Updated by Richard Hesse over 1 year ago

Do you need more (older) logs, or more detailed logs?

#4 Updated by Haomai Wang over 1 year ago

detail logs

#5 Updated by Greg Farnum over 1 year ago

For larger logs, you will have an easier time using ceph-post-file than trying to upload them to Redmine. :)

#6 Updated by Richard Hesse over 1 year ago

Thanks for the suggestion Greg. Here's the ID for the OSD log:

ceph-post-file: d4fb9bed-5754-410d-8402-960f2bc28c6e

Here's the ID for the objdump:

ceph-post-file: 136a4ae1-c4a9-4a9c-a5dc-1c6e9cfd3458

#7 Updated by Greg Farnum over 1 year ago

There is one clue in the sparse log, which is that immediately prior to the crash there's a Pipe receiving a message which gets "ABORTED". But there's not even enough detail to be sure if it's the Pipe getting reaped or not. :(

#8 Updated by Yuri Weinstein about 1 year ago

  • Related to Bug #17048: "SimpleMessenger.cc: 237: FAILED assert(!cleared)" in rbd-master-distro-basic-vps added

#9 Updated by Samuel Just about 1 year ago

  • Priority changed from Normal to Urgent

#10 Updated by Samuel Just about 1 year ago

  • Related to deleted (Bug #17048: "SimpleMessenger.cc: 237: FAILED assert(!cleared)" in rbd-master-distro-basic-vps)

#11 Updated by Samuel Just about 1 year ago

  • Duplicated by Bug #17048: "SimpleMessenger.cc: 237: FAILED assert(!cleared)" in rbd-master-distro-basic-vps added

#13 Updated by Samuel Just 11 months ago

  • Priority changed from Urgent to High
  • Needs Doc set to No

#14 Updated by Sage Weil 11 months ago

  • Subject changed from Assert in reaper to msg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared) on hammer

#15 Updated by Sage Weil 10 months ago

  • Subject changed from msg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared) on hammer to msg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared)
  • Status changed from New to Verified
  • Priority changed from High to Urgent

/a/sage-2016-12-22_16:06:01-rados-wip-task-peer---basic-smithi/657852

just saw this on master.

#16 Updated by Sage Weil 10 months ago

  -161> 2016-12-22 17:31:45.460613 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=0 pgs=0 cs=0 l=1 c=0x3e2a6f40).accept:  setting up session_security.
  -160> 2016-12-22 17:31:45.460733 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=0 pgs=0 cs=0 l=1 c=0x3e2a6f40).accept new session
  -159> 2016-12-22 17:31:45.460846 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).accept success, connect_seq = 1, sending READY
  -158> 2016-12-22 17:31:45.460959 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).accept features 1152921504336314367
  -157> 2016-12-22 17:31:45.461253 3d4f6700 10 -- 172.21.15.14:6804/20738 queue_reap 0x3e23cd10
  -156> 2016-12-22 17:31:45.461882 9506ac0 20 -- 172.21.15.14:6804/20738 wait: stopping accepter thread
  -155> 2016-12-22 17:31:45.462994 9506ac0 10 accepter.stop accept listening on: 15
  -154> 2016-12-22 17:31:45.463984 231a7700 10 accepter.entry poll returned oke: 1
  -153> 2016-12-22 17:31:45.464038 231a7700 20 accepter.entry pfd.revents[0]=0
  -152> 2016-12-22 17:31:45.464069 231a7700 20 accepter.entry pfd.revents[1]=1
  -151> 2016-12-22 17:31:45.464558 231a7700 20 accepter.entry closing
  -150> 2016-12-22 17:31:45.465498 231a7700 10 accepter.entry stopping
  -149> 2016-12-22 17:31:45.466198 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).register_pipe
  -148> 2016-12-22 17:31:45.466363 239a8700  1 osd.4 0 ms_handle_reset con 0x3c0b8570 session 0x3d52fb10
  -147> 2016-12-22 17:31:45.466876 9506ac0 15 accepter.stop signaled poll
  -146> 2016-12-22 17:31:45.467894 9506ac0  5 accepter.stop wait for thread to join.
  -145> 2016-12-22 17:31:45.468595 1be1c700 10 -- 172.21.15.14:6804/20738 reaper
  -144> 2016-12-22 17:31:45.468681 1be1c700 10 -- 172.21.15.14:6804/20738 reaper reaping pipe 0x3e23cd10 172.21.15.35:0/146098963
  -143> 2016-12-22 17:31:45.468789 1be1c700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e23cd10 sd=46 :6804 s=4 pgs=6 cs=1 l=1 c=0x3e23e030).discard_queue
  -142> 2016-12-22 17:31:45.468918 1be1c700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e23cd10 sd=46 :6804 s=4 pgs=6 cs=1 l=1 c=0x3e23e030).unregister_pipe - not registered
  -141> 2016-12-22 17:31:45.469901 9506ac0 20 -- 172.21.15.14:6804/20738 wait: stopped accepter thread
  -140> 2016-12-22 17:31:45.470629 1be1c700 20 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e23cd10 sd=46 :6804 s=4 pgs=6 cs=1 l=1 c=0x3e23e030).join
  -139> 2016-12-22 17:31:45.470788 37f54700  5 -- 172.21.15.14:6805/20738 mark_down_all 172.21.15.35:6805/23505 0x3bef85e0
  -138> 2016-12-22 17:31:45.470883 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6805/23505 pipe(0x3bef85e0 sd=65 :36658 s=3 pgs=5 cs=1 l=0 c=0x3bef9900).unregister_pipe - not registered
  -137> 2016-12-22 17:31:45.470989 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6805/23505 pipe(0x3bef85e0 sd=65 :36658 s=3 pgs=5 cs=1 l=0 c=0x3bef9900).stop
  -136> 2016-12-22 17:31:45.471742 3d4f6700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e23cd10 sd=46 :6804 s=4 pgs=6 cs=1 l=1 c=0x3e23e030).writer done
  -135> 2016-12-22 17:31:45.472691 9506ac0 10 -- 172.21.15.14:6804/20738 wait: waiting for dispatch queue
  -134> 2016-12-22 17:31:45.473506 3d0f2700 20 -- 172.21.15.14:6805/20738 >> 172.21.15.14:6801/20739 pipe(0x3c0b7250 sd=70 :54058 s=4 pgs=3 cs=1 l=0 c=0x3c0b8570).writer finishing
  -133> 2016-12-22 17:31:45.473890 1be1c700 10 -- 172.21.15.14:6804/20738 reaper reaped pipe 0x3e23cd10 172.21.15.35:0/146098963
  -132> 2016-12-22 17:31:45.474698 1be1c700 10 -- 172.21.15.14:6804/20738 reaper deleted pipe 0x3e23cd10
  -131> 2016-12-22 17:31:45.474768 1be1c700 10 -- 172.21.15.14:6804/20738 reaper done
  -130> 2016-12-22 17:31:45.474860 3dbf9700 10 -- 172.21.15.14:6805/20738 queue_reap 0x3c0b7250
  -129> 2016-12-22 17:31:45.474934 3d0f2700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.14:6801/20739 pipe(0x3c0b7250 sd=70 :54058 s=4 pgs=3 cs=1 l=0 c=0x3c0b8570).writer done
  -128> 2016-12-22 17:31:45.475796 239a8700  1 osd.4 0 ms_handle_reset con 0x3bef9900 session 0x3d555820
  -127> 2016-12-22 17:31:45.475926 3cbed700 20 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6805/23505 pipe(0x3bef85e0 sd=65 :36658 s=4 pgs=5 cs=1 l=0 c=0x3bef9900).writer finishing
  -126> 2016-12-22 17:31:45.476246 3defc700 10 -- 172.21.15.14:6805/20738 queue_reap 0x3bef85e0
  -125> 2016-12-22 17:31:45.477079 9506ac0 10 -- 172.21.15.14:6804/20738 wait: dispatch queue is stopped
  -124> 2016-12-22 17:31:45.477965 9506ac0 20 -- 172.21.15.14:6804/20738 wait: stopping reaper thread
  -123> 2016-12-22 17:31:45.478721 37f54700  5 -- 172.21.15.14:6805/20738 mark_down_all 172.21.15.35:6809/23500 0x3bef5590
  -122> 2016-12-22 17:31:45.478811 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6809/23500 pipe(0x3bef5590 sd=66 :45406 s=3 pgs=7 cs=1 l=0 c=0x3bef68b0).unregister_pipe - not registered
  -121> 2016-12-22 17:31:45.478916 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6809/23500 pipe(0x3bef5590 sd=66 :45406 s=3 pgs=7 cs=1 l=0 c=0x3bef68b0).stop
  -120> 2016-12-22 17:31:45.479846 1be1c700 10 -- 172.21.15.14:6804/20738 reaper_entry done
  -119> 2016-12-22 17:31:45.480513 3cbed700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6805/23505 pipe(0x3bef85e0 sd=65 :36658 s=4 pgs=5 cs=1 l=0 c=0x3bef9900).writer done
  -118> 2016-12-22 17:31:45.480763 239a8700  1 osd.4 0 ms_handle_reset con 0x3bef68b0 session 0x3d5aba40
  -117> 2016-12-22 17:31:45.481224 9506ac0 20 -- 172.21.15.14:6804/20738 wait: stopped reaper thread
  -116> 2016-12-22 17:31:45.482137 9506ac0 10 -- 172.21.15.14:6804/20738 wait: closing pipes
  -115> 2016-12-22 17:31:45.482850 9506ac0 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).unregister_pipe
  -114> 2016-12-22 17:31:45.483421 9506ac0 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).stop
  -113> 2016-12-22 17:31:45.484164 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=4 pgs=7 cs=1 l=1 c=0x3e2a6f40).accept fault after register
  -112> 2016-12-22 17:31:45.485636 9506ac0 10 -- 172.21.15.14:6804/20738 reaper
  -111> 2016-12-22 17:31:45.485700 9506ac0 10 -- 172.21.15.14:6804/20738 reaper done
  -110> 2016-12-22 17:31:45.486103 9506ac0 10 -- 172.21.15.14:6804/20738 wait: waiting for pipes 0x3e2a5c20 to close
  -109> 2016-12-22 17:31:45.487146 37353700 10 -- 172.21.15.14:6804/20738 queue_reap 0x3e2a5c20
  -108> 2016-12-22 17:31:45.487658 9506ac0 10 -- 172.21.15.14:6804/20738 reaper
  -107> 2016-12-22 17:31:45.487722 9506ac0 10 -- 172.21.15.14:6804/20738 reaper reaping pipe 0x3e2a5c20 172.21.15.35:0/146098963
  -106> 2016-12-22 17:31:45.487816 9506ac0 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=4 pgs=7 cs=1 l=1 c=0x3e2a6f40).discard_queue
  -105> 2016-12-22 17:31:45.494742 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=4 pgs=7 cs=1 l=1 c=0x3e2a6f40).reader done
  -104> 2016-12-22 17:31:45.500342 3caec700 20 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6809/23500 pipe(0x3bef5590 sd=66 :45406 s=4 pgs=7 cs=1 l=0 c=0x3bef68b0).writer finishing
  -103> 2016-12-22 17:31:45.500518 3caec700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6809/23500 pipe(0x3bef5590 sd=66 :45406 s=4 pgs=7 cs=1 l=0 c=0x3bef68b0).writer done
  -102> 2016-12-22 17:31:45.500770 37f54700  5 -- 172.21.15.14:6805/20738 mark_down_all 172.21.15.14:6809/20737 0x3d504e90
  -101> 2016-12-22 17:31:45.500962 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.14:6809/20737 pipe(0x3d504e90 sd=71 :6805 s=3 pgs=3 cs=1 l=0 c=0x3bd1c4f0).unregister_pipe - not registered
  -100> 2016-12-22 17:31:45.501083 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.14:6809/20737 pipe(0x3d504e90 sd=71 :6805 s=3 pgs=3 cs=1 l=0 c=0x3bd1c4f0).stop
   -99> 2016-12-22 17:31:45.501474 3dcfa700 10 -- 172.21.15.14:6805/20738 queue_reap 0x3bef5590
   -98> 2016-12-22 17:31:45.501601 3c3e5700 20 -- 172.21.15.14:6805/20738 >> 172.21.15.14:6809/20737 pipe(0x3d504e90 sd=71 :6805 s=4 pgs=3 cs=1 l=0 c=0x3bd1c4f0).writer finishing
   -97> 2016-12-22 17:31:45.501760 3c3e5700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.14:6809/20737 pipe(0x3d504e90 sd=71 :6805 s=4 pgs=3 cs=1 l=0 c=0x3bd1c4f0).writer done
   -96> 2016-12-22 17:31:45.501914 239a8700  1 osd.4 0 ms_handle_reset con 0x3bd1c4f0 session 0x3d534410
   -95> 2016-12-22 17:31:45.506337 37f54700  5 -- 172.21.15.14:6805/20738 mark_down_all 172.21.15.35:6801/23511 0x3befb630
   -94> 2016-12-22 17:31:45.506436 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6801/23511 pipe(0x3befb630 sd=67 :47804 s=3 pgs=3 cs=1 l=0 c=0x3befc950).unregister_pipe - not registered
   -93> 2016-12-22 17:31:45.506551 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6801/23511 pipe(0x3befb630 sd=67 :47804 s=3 pgs=3 cs=1 l=0 c=0x3befc950).stop
   -92> 2016-12-22 17:31:45.527589 9506ac0 -1 /mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6151-ge1781dd/rpm/el7/BUILD/ceph-11.1.0-6151-ge1781dd/src/msg/simple/SimpleMessenger.cc: In function 'void SimpleMessenger::reaper()' thread 9506ac0 time 2016-12-22 17:31:45.488264

relevant bit of log. the wait shutdown sequence failed to clear_pipe() the way that mark_down, mark_down_all, and fault() normally do.

#17 Updated by Sage Weil 10 months ago

  • Status changed from Verified to Testing

#18 Updated by Sage Weil 10 months ago

  • Status changed from Testing to Pending Backport
  • Backport set to kraken,jewel

#20 Updated by Nathan Cutler 10 months ago

  • Copied to Backport #18378: kraken: msg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared) added

#21 Updated by Nathan Cutler 10 months ago

  • Copied to Backport #18379: jewel: msg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared) added

#22 Updated by Nathan Cutler 3 months ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF