Bug #15784
closedmsg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared)
0%
Description
0> 2016-05-09 04:14:22.742318 7f2555bcf700 -1 msg/simple/SimpleMessenger.cc: In function 'void SimpleMessenger::reaper()' thread 7f2555bcf700 time 2016-05-09 04:14: 22.727680 msg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared) ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f255d2e20b5] 2: (SimpleMessenger::reaper()+0x883) [0x7f255d2bad93] 3: (SimpleMessenger::reaper_entry()+0x150) [0x7f255d2baf40] 4: (SimpleMessenger::ReaperThread::entry()+0xd) [0x7f255d2c2acd] 5: (()+0x7dc5) [0x7f255b376dc5] 6: (clone()+0x6d) [0x7f2559c1d28d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
This appears to be a duplicate of #11381 which was marked as resolved a year ago. However, I'm running Infernalis (9.2.1), so it appears that either the original bug wasn't fixed or it was re-introduced with new changes.
Updated by Richard Hesse almost 8 years ago
Trying to upload the objdump output, but I'm getting a request entity too large error from Redmine. The maximum size is listed at 73.4MB, but the file is only 17MB. I can't see attachments so let me know if you are actually getting these or not.
Updated by Haomai Wang almost 8 years ago
yes, #11381 fix already in v9.2.1 commits. could you give more osd logs?
Updated by Richard Hesse almost 8 years ago
Do you need more (older) logs, or more detailed logs?
Updated by Greg Farnum almost 8 years ago
For larger logs, you will have an easier time using ceph-post-file than trying to upload them to Redmine. :)
Updated by Richard Hesse almost 8 years ago
Thanks for the suggestion Greg. Here's the ID for the OSD log:
ceph-post-file: d4fb9bed-5754-410d-8402-960f2bc28c6e
Here's the ID for the objdump:
ceph-post-file: 136a4ae1-c4a9-4a9c-a5dc-1c6e9cfd3458
Updated by Greg Farnum almost 8 years ago
There is one clue in the sparse log, which is that immediately prior to the crash there's a Pipe receiving a message which gets "ABORTED". But there's not even enough detail to be sure if it's the Pipe getting reaped or not. :(
Updated by Yuri Weinstein over 7 years ago
- Related to Bug #17048: "SimpleMessenger.cc: 237: FAILED assert(!cleared)" in rbd-master-distro-basic-vps added
Updated by Samuel Just over 7 years ago
- Related to deleted (Bug #17048: "SimpleMessenger.cc: 237: FAILED assert(!cleared)" in rbd-master-distro-basic-vps)
Updated by Samuel Just over 7 years ago
- Has duplicate Bug #17048: "SimpleMessenger.cc: 237: FAILED assert(!cleared)" in rbd-master-distro-basic-vps added
Updated by Samuel Just over 7 years ago
See also This is QE validation oh hammer 0.94.8
sha1 08277b7bc7c0e533c3fd56a0040dc0ddc74637d6
Run: http://pulpito.ceph.com/yuriw-2016-08-16_16:29:58-rbd-master-distro-basic-vps/
Job: 368552
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2016-08-16_16:29:58-rbd-master-distro-basic-vps/368552/teuthology.log
Updated by Sage Weil over 7 years ago
- Subject changed from Assert in reaper to msg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared) on hammer
Updated by Sage Weil over 7 years ago
- Subject changed from msg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared) on hammer to msg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared)
- Status changed from New to 12
- Priority changed from High to Urgent
/a/sage-2016-12-22_16:06:01-rados-wip-task-peer---basic-smithi/657852
just saw this on master.
Updated by Sage Weil over 7 years ago
-161> 2016-12-22 17:31:45.460613 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=0 pgs=0 cs=0 l=1 c=0x3e2a6f40).accept: setting up session_security. -160> 2016-12-22 17:31:45.460733 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=0 pgs=0 cs=0 l=1 c=0x3e2a6f40).accept new session -159> 2016-12-22 17:31:45.460846 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).accept success, connect_seq = 1, sending READY -158> 2016-12-22 17:31:45.460959 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).accept features 1152921504336314367 -157> 2016-12-22 17:31:45.461253 3d4f6700 10 -- 172.21.15.14:6804/20738 queue_reap 0x3e23cd10 -156> 2016-12-22 17:31:45.461882 9506ac0 20 -- 172.21.15.14:6804/20738 wait: stopping accepter thread -155> 2016-12-22 17:31:45.462994 9506ac0 10 accepter.stop accept listening on: 15 -154> 2016-12-22 17:31:45.463984 231a7700 10 accepter.entry poll returned oke: 1 -153> 2016-12-22 17:31:45.464038 231a7700 20 accepter.entry pfd.revents[0]=0 -152> 2016-12-22 17:31:45.464069 231a7700 20 accepter.entry pfd.revents[1]=1 -151> 2016-12-22 17:31:45.464558 231a7700 20 accepter.entry closing -150> 2016-12-22 17:31:45.465498 231a7700 10 accepter.entry stopping -149> 2016-12-22 17:31:45.466198 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).register_pipe -148> 2016-12-22 17:31:45.466363 239a8700 1 osd.4 0 ms_handle_reset con 0x3c0b8570 session 0x3d52fb10 -147> 2016-12-22 17:31:45.466876 9506ac0 15 accepter.stop signaled poll -146> 2016-12-22 17:31:45.467894 9506ac0 5 accepter.stop wait for thread to join. -145> 2016-12-22 17:31:45.468595 1be1c700 10 -- 172.21.15.14:6804/20738 reaper -144> 2016-12-22 17:31:45.468681 1be1c700 10 -- 172.21.15.14:6804/20738 reaper reaping pipe 0x3e23cd10 172.21.15.35:0/146098963 -143> 2016-12-22 17:31:45.468789 1be1c700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e23cd10 sd=46 :6804 s=4 pgs=6 cs=1 l=1 c=0x3e23e030).discard_queue -142> 2016-12-22 17:31:45.468918 1be1c700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e23cd10 sd=46 :6804 s=4 pgs=6 cs=1 l=1 c=0x3e23e030).unregister_pipe - not registered -141> 2016-12-22 17:31:45.469901 9506ac0 20 -- 172.21.15.14:6804/20738 wait: stopped accepter thread -140> 2016-12-22 17:31:45.470629 1be1c700 20 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e23cd10 sd=46 :6804 s=4 pgs=6 cs=1 l=1 c=0x3e23e030).join -139> 2016-12-22 17:31:45.470788 37f54700 5 -- 172.21.15.14:6805/20738 mark_down_all 172.21.15.35:6805/23505 0x3bef85e0 -138> 2016-12-22 17:31:45.470883 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6805/23505 pipe(0x3bef85e0 sd=65 :36658 s=3 pgs=5 cs=1 l=0 c=0x3bef9900).unregister_pipe - not registered -137> 2016-12-22 17:31:45.470989 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6805/23505 pipe(0x3bef85e0 sd=65 :36658 s=3 pgs=5 cs=1 l=0 c=0x3bef9900).stop -136> 2016-12-22 17:31:45.471742 3d4f6700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e23cd10 sd=46 :6804 s=4 pgs=6 cs=1 l=1 c=0x3e23e030).writer done -135> 2016-12-22 17:31:45.472691 9506ac0 10 -- 172.21.15.14:6804/20738 wait: waiting for dispatch queue -134> 2016-12-22 17:31:45.473506 3d0f2700 20 -- 172.21.15.14:6805/20738 >> 172.21.15.14:6801/20739 pipe(0x3c0b7250 sd=70 :54058 s=4 pgs=3 cs=1 l=0 c=0x3c0b8570).writer finishing -133> 2016-12-22 17:31:45.473890 1be1c700 10 -- 172.21.15.14:6804/20738 reaper reaped pipe 0x3e23cd10 172.21.15.35:0/146098963 -132> 2016-12-22 17:31:45.474698 1be1c700 10 -- 172.21.15.14:6804/20738 reaper deleted pipe 0x3e23cd10 -131> 2016-12-22 17:31:45.474768 1be1c700 10 -- 172.21.15.14:6804/20738 reaper done -130> 2016-12-22 17:31:45.474860 3dbf9700 10 -- 172.21.15.14:6805/20738 queue_reap 0x3c0b7250 -129> 2016-12-22 17:31:45.474934 3d0f2700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.14:6801/20739 pipe(0x3c0b7250 sd=70 :54058 s=4 pgs=3 cs=1 l=0 c=0x3c0b8570).writer done -128> 2016-12-22 17:31:45.475796 239a8700 1 osd.4 0 ms_handle_reset con 0x3bef9900 session 0x3d555820 -127> 2016-12-22 17:31:45.475926 3cbed700 20 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6805/23505 pipe(0x3bef85e0 sd=65 :36658 s=4 pgs=5 cs=1 l=0 c=0x3bef9900).writer finishing -126> 2016-12-22 17:31:45.476246 3defc700 10 -- 172.21.15.14:6805/20738 queue_reap 0x3bef85e0 -125> 2016-12-22 17:31:45.477079 9506ac0 10 -- 172.21.15.14:6804/20738 wait: dispatch queue is stopped -124> 2016-12-22 17:31:45.477965 9506ac0 20 -- 172.21.15.14:6804/20738 wait: stopping reaper thread -123> 2016-12-22 17:31:45.478721 37f54700 5 -- 172.21.15.14:6805/20738 mark_down_all 172.21.15.35:6809/23500 0x3bef5590 -122> 2016-12-22 17:31:45.478811 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6809/23500 pipe(0x3bef5590 sd=66 :45406 s=3 pgs=7 cs=1 l=0 c=0x3bef68b0).unregister_pipe - not registered -121> 2016-12-22 17:31:45.478916 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6809/23500 pipe(0x3bef5590 sd=66 :45406 s=3 pgs=7 cs=1 l=0 c=0x3bef68b0).stop -120> 2016-12-22 17:31:45.479846 1be1c700 10 -- 172.21.15.14:6804/20738 reaper_entry done -119> 2016-12-22 17:31:45.480513 3cbed700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6805/23505 pipe(0x3bef85e0 sd=65 :36658 s=4 pgs=5 cs=1 l=0 c=0x3bef9900).writer done -118> 2016-12-22 17:31:45.480763 239a8700 1 osd.4 0 ms_handle_reset con 0x3bef68b0 session 0x3d5aba40 -117> 2016-12-22 17:31:45.481224 9506ac0 20 -- 172.21.15.14:6804/20738 wait: stopped reaper thread -116> 2016-12-22 17:31:45.482137 9506ac0 10 -- 172.21.15.14:6804/20738 wait: closing pipes -115> 2016-12-22 17:31:45.482850 9506ac0 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).unregister_pipe -114> 2016-12-22 17:31:45.483421 9506ac0 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).stop -113> 2016-12-22 17:31:45.484164 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=4 pgs=7 cs=1 l=1 c=0x3e2a6f40).accept fault after register -112> 2016-12-22 17:31:45.485636 9506ac0 10 -- 172.21.15.14:6804/20738 reaper -111> 2016-12-22 17:31:45.485700 9506ac0 10 -- 172.21.15.14:6804/20738 reaper done -110> 2016-12-22 17:31:45.486103 9506ac0 10 -- 172.21.15.14:6804/20738 wait: waiting for pipes 0x3e2a5c20 to close -109> 2016-12-22 17:31:45.487146 37353700 10 -- 172.21.15.14:6804/20738 queue_reap 0x3e2a5c20 -108> 2016-12-22 17:31:45.487658 9506ac0 10 -- 172.21.15.14:6804/20738 reaper -107> 2016-12-22 17:31:45.487722 9506ac0 10 -- 172.21.15.14:6804/20738 reaper reaping pipe 0x3e2a5c20 172.21.15.35:0/146098963 -106> 2016-12-22 17:31:45.487816 9506ac0 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=4 pgs=7 cs=1 l=1 c=0x3e2a6f40).discard_queue -105> 2016-12-22 17:31:45.494742 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=4 pgs=7 cs=1 l=1 c=0x3e2a6f40).reader done -104> 2016-12-22 17:31:45.500342 3caec700 20 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6809/23500 pipe(0x3bef5590 sd=66 :45406 s=4 pgs=7 cs=1 l=0 c=0x3bef68b0).writer finishing -103> 2016-12-22 17:31:45.500518 3caec700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6809/23500 pipe(0x3bef5590 sd=66 :45406 s=4 pgs=7 cs=1 l=0 c=0x3bef68b0).writer done -102> 2016-12-22 17:31:45.500770 37f54700 5 -- 172.21.15.14:6805/20738 mark_down_all 172.21.15.14:6809/20737 0x3d504e90 -101> 2016-12-22 17:31:45.500962 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.14:6809/20737 pipe(0x3d504e90 sd=71 :6805 s=3 pgs=3 cs=1 l=0 c=0x3bd1c4f0).unregister_pipe - not registered -100> 2016-12-22 17:31:45.501083 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.14:6809/20737 pipe(0x3d504e90 sd=71 :6805 s=3 pgs=3 cs=1 l=0 c=0x3bd1c4f0).stop -99> 2016-12-22 17:31:45.501474 3dcfa700 10 -- 172.21.15.14:6805/20738 queue_reap 0x3bef5590 -98> 2016-12-22 17:31:45.501601 3c3e5700 20 -- 172.21.15.14:6805/20738 >> 172.21.15.14:6809/20737 pipe(0x3d504e90 sd=71 :6805 s=4 pgs=3 cs=1 l=0 c=0x3bd1c4f0).writer finishing -97> 2016-12-22 17:31:45.501760 3c3e5700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.14:6809/20737 pipe(0x3d504e90 sd=71 :6805 s=4 pgs=3 cs=1 l=0 c=0x3bd1c4f0).writer done -96> 2016-12-22 17:31:45.501914 239a8700 1 osd.4 0 ms_handle_reset con 0x3bd1c4f0 session 0x3d534410 -95> 2016-12-22 17:31:45.506337 37f54700 5 -- 172.21.15.14:6805/20738 mark_down_all 172.21.15.35:6801/23511 0x3befb630 -94> 2016-12-22 17:31:45.506436 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6801/23511 pipe(0x3befb630 sd=67 :47804 s=3 pgs=3 cs=1 l=0 c=0x3befc950).unregister_pipe - not registered -93> 2016-12-22 17:31:45.506551 37f54700 10 -- 172.21.15.14:6805/20738 >> 172.21.15.35:6801/23511 pipe(0x3befb630 sd=67 :47804 s=3 pgs=3 cs=1 l=0 c=0x3befc950).stop -92> 2016-12-22 17:31:45.527589 9506ac0 -1 /mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6151-ge1781dd/rpm/el7/BUILD/ceph-11.1.0-6151-ge1781dd/src/msg/simple/SimpleMessenger.cc: In function 'void SimpleMessenger::reaper()' thread 9506ac0 time 2016-12-22 17:31:45.488264
relevant bit of log. the wait shutdown sequence failed to clear_pipe() the way that mark_down, mark_down_all, and fault() normally do.
Updated by Sage Weil over 7 years ago
- Status changed from 7 to Pending Backport
- Backport set to kraken,jewel
Updated by Nathan Cutler over 7 years ago
Updated by Nathan Cutler over 7 years ago
- Copied to Backport #18378: kraken: msg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared) added
Updated by Nathan Cutler over 7 years ago
- Copied to Backport #18379: jewel: msg/simple/SimpleMessenger.cc: 239: FAILED assert(!cleared) added
Updated by Nathan Cutler over 6 years ago
- Status changed from Pending Backport to Resolved