Project

General

Profile

Bug #9295

osd/OSD.cc: 5501: FAILED assert(session) in ms_fast_dispatch

Added by Sage Weil about 5 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
OSD
Target version:
-
Start date:
08/29/2014
Due date:
% Done:

0%

Source:
Support
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

  -118> 2014-08-28 23:40:51.230851 7f9a5fe99700 -1 osd/OSD.cc: In function 'virtual void OSD::ms_fast_dispatch(Message*)' thread 7f9a5fe99700 time 2014-08-28 23:40:51.229339
osd/OSD.cc: 5501: FAILED assert(session)

 ceph version 0.84-823-gf807a7e (f807a7ee2cab3d3eb0877e4846580fdbf3a6b3d0)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xb7a60b]
 2: (OSD::ms_fast_dispatch(Message*)+0x3db) [0x6988ab]
 3: (DispatchQueue::fast_dispatch(Message*)+0x56) [0xc191e6]
 4: (Pipe::DelayedDelivery::entry()+0x180) [0xc32640]
 5: (()+0x8182) [0x7f9a851c4182]
 6: (clone()+0x6d) [0x7f9a8373038d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

ubuntu@teuthology:/a/sage-2014-08-28_16:08:59-rados-master-testing-basic-multi/458959

Related issues

Related to Ceph - Bug #9462: msgr deadlock: osd reply vs mark_down vs fault Resolved 09/13/2014

Associated revisions

Revision 681a7548 (diff)
Added by Greg Farnum about 5 years ago

Pipe: stop delayed delivery fast_dispatch in stop_and_wait()

If we don't, we can keep fast_dispatching messages after a Pipe has been
mark_down()ed. That breaks things right now.

Fixes: #9295

Signed-off-by: Greg Farnum <>

History

#1 Updated by Sage Weil about 5 years ago

we receive a message...

  -132> 2014-08-28 23:40:51.229117 7f9a5fe99700  1 -- 10.214.133.120:6801/5754 <== osd.3 10.214.132.120:6801/7957 3048 ==== MOSDPGPush(1.16 70 [PushOp(495d5256/benchmark_data_mira046_31610_object1472/head//1, version: 32'71, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(495d5256/benchmark_data_mira046_31610_object1472/head//1@32'71, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(63f27b16/benchmark_data_mira046_31610_object1559/head//1, version: 32'81, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(63f27b16/benchmark_data_mira046_31610_object1559/head//1@32'81, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(cd59cdd6/benchmark_data_mira046_31610_object1586/head//1, version: 32'84, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(cd59cdd6/benchmark_data_mira046_31610_object1586/head//1@32'84, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 ==== 3148347+0+0 (3820707418 0 0) 0x45eee00 con 0x3cd5180
  -131> 2014-08-28 23:40:51.229179 7f9a5fe99700  5 -- op tracker -- , seq: 9458, time: 2014-08-28 23:40:51.200164, event: header_read, op: MOSDPGPush(1.16 70 [PushOp(495d5256/benchmark_data_mira046_31610_object1472/head//1, version: 32'71, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(495d5256/benchmark_data_mira046_31610_object1472/head//1@32'71, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(63f27b16/benchmark_data_mira046_31610_object1559/head//1, version: 32'81, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(63f27b16/benchmark_data_mira046_31610_object1559/head//1@32'81, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(cd59cdd6/benchmark_data_mira046_31610_object1586/head//1, version: 32'84, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(cd59cdd6/benchmark_data_mira046_31610_object1586/head//1@32'84, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])
  -130> 2014-08-28 23:40:51.229199 7f9a5fe99700  5 -- op tracker -- , seq: 9458, time: 2014-08-28 23:40:51.200166, event: throttled, op: MOSDPGPush(1.16 70 [PushOp(495d5256/benchmark_data_mira046_31610_object1472/head//1, version: 32'71, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(495d5256/benchmark_data_mira046_31610_object1472/head//1@32'71, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(63f27b16/benchmark_data_mira046_31610_object1559/head//1, version: 32'81, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(63f27b16/benchmark_data_mira046_31610_object1559/head//1@32'81, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(cd59cdd6/benchmark_data_mira046_31610_object1586/head//1, version: 32'84, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(cd59cdd6/benchmark_data_mira046_31610_object1586/head//1@32'84, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])
  -129> 2014-08-28 23:40:51.229214 7f9a5fe99700  5 -- op tracker -- , seq: 9458, time: 2014-08-28 23:40:51.229044, event: all_read, op: MOSDPGPush(1.16 70 [PushOp(495d5256/benchmark_data_mira046_31610_object1472/head//1, version: 32'71, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(495d5256/benchmark_data_mira046_31610_object1472/head//1@32'71, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(63f27b16/benchmark_data_mira046_31610_object1559/head//1, version: 32'81, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(63f27b16/benchmark_data_mira046_31610_object1559/head//1@32'81, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(cd59cdd6/benchmark_data_mira046_31610_object1586/head//1, version: 32'84, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(cd59cdd6/benchmark_data_mira046_31610_object1586/head//1@32'84, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])
  -128> 2014-08-28 23:40:51.229229 7f9a5fe99700  5 -- op tracker -- , seq: 9458, time: 0.000000, event: dispatched, op: MOSDPGPush(1.16 70 [PushOp(495d5256/benchmark_data_mira046_31610_object1472/head//1, version: 32'71, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(495d5256/benchmark_data_mira046_31610_object1472/head//1@32'71, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(63f27b16/benchmark_data_mira046_31610_object1559/head//1, version: 32'81, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(63f27b16/benchmark_data_mira046_31610_object1559/head//1@32'81, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(cd59cdd6/benchmark_data_mira046_31610_object1586/head//1, version: 32'84, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(cd59cdd6/benchmark_data_mira046_31610_object1586/head//1@32'84, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])
  -127> 2014-08-28 23:40:51.229235 7f9a5fe99700 10 osd.0 72 handle_replica_op MOSDPGPush(1.16 70 [PushOp(495d5256/benchmark_data_mira046_31610_object1472/head//1, version: 32'71, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(495d5256/benchmark_data_mira046_31610_object1472/head//1@32'71, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(63f27b16/benchmark_data_mira046_31610_object1559/head//1, version: 32'81, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(63f27b16/benchmark_data_mira046_31610_object1559/head//1@32'81, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(cd59cdd6/benchmark_data_mira046_31610_object1586/head//1, version: 32'84, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(cd59cdd6/benchmark_data_mira046_31610_object1586/head//1@32'84, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 epoch 70

but it is from an old osd, so we mark it down
  -126> 2014-08-28 23:40:51.229249 7f9a5fe99700  5 osd.0 72 from dead osd.3, marking down,  msg was 10.214.132.120:6801/7957 expected :/0
  -125> 2014-08-28 23:40:51.229254 7f9a5fe99700  1 -- 10.214.133.120:6801/5754 mark_down 0x3cd5180 -- 0x6cb0840

then we get another one. but this one was sitting in the delayed queue, so we deliver it even tho the con has been marked down.
  -124> 2014-08-28 23:40:51.229289 7f9a5fe99700  5 -- op tracker -- , seq: 9458, time: 2014-08-28 23:40:51.229275, event: done, op: MOSDPGPush(1.16 70 [PushOp(495d5256/benchmark_data_mira046_31610_object1472/head//1, version: 32'71, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(495d5256/benchmark_data_mira046_31610_object1472/head//1@32'71, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(63f27b16/benchmark_data_mira046_31610_object1559/head//1, version: 32'81, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(63f27b16/benchmark_data_mira046_31610_object1559/head//1@32'81, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(cd59cdd6/benchmark_data_mira046_31610_object1586/head//1, version: 32'84, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(cd59cdd6/benchmark_data_mira046_31610_object1586/head//1@32'84, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])
  -123> 2014-08-28 23:40:51.229306 7f9a5fe99700  1 -- 10.214.133.120:6801/5754 <== osd.3 10.214.132.120:6801/7957 3049 ==== MOSDPGPushReply(1.11 70 [PushReplyOp(b7b81b91/benchmark_data_mira046_31610_object1296/head//1)]) v2 ==== 129+0+0 (1468918910 0 0) 0x3d59e00 con 0x3cd5180
  -122> 2014-08-28 23:40:51.229316 7f9a5fe99700  5 -- op tracker -- , seq: 9459, time: 2014-08-28 23:40:51.229066, event: header_read, op: MOSDPGPushReply(1.11 70 [PushReplyOp(b7b81b91/benchmark_data_mira046_31610_object1296/head//1)])
  -121> 2014-08-28 23:40:51.229321 7f9a5fe99700  5 -- op tracker -- , seq: 9459, time: 2014-08-28 23:40:51.229067, event: throttled, op: MOSDPGPushReply(1.11 70 [PushReplyOp(b7b81b91/benchmark_data_mira046_31610_object1296/head//1)])
  -120> 2014-08-28 23:40:51.229325 7f9a5fe99700  5 -- op tracker -- , seq: 9459, time: 2014-08-28 23:40:51.229095, event: all_read, op: MOSDPGPushReply(1.11 70 [PushReplyOp(b7b81b91/benchmark_data_mira046_31610_object1296/head//1)])
  -119> 2014-08-28 23:40:51.229329 7f9a5fe99700  5 -- op tracker -- , seq: 9459, time: 0.000000, event: dispatched, op: MOSDPGPushReply(1.11 70 [PushReplyOp(b7b81b91/benchmark_data_mira046_31610_object1296/head//1)])
  -118> 2014-08-28 23:40:51.230851 7f9a5fe99700 -1 osd/OSD.cc: In function 'virtual void OSD::ms_fast_dispatch(Message*)' thread 7f9a5fe99700 time 2014-08-28 23:40:51.229339
osd/OSD.cc: 5501: FAILED assert(session)

 ceph version 0.84-823-gf807a7e (f807a7ee2cab3d3eb0877e4846580fdbf3a6b3d0)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xb7a60b]
 2: (OSD::ms_fast_dispatch(Message*)+0x3db) [0x6988ab]
 3: (DispatchQueue::fast_dispatch(Message*)+0x56) [0xc191e6]
 4: (Pipe::DelayedDelivery::entry()+0x180) [0xc32640]
 5: (()+0x8182) [0x7f9a851c4182]
 6: (clone()+0x6d) [0x7f9a8373038d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#2 Updated by Sage Weil about 5 years ago

  • Status changed from New to Need Review

#3 Updated by Sage Weil about 5 years ago

teuthology-2014-09-07_02:30:03-rados-next-testing-basic-multi/470990

#4 Updated by Sage Weil almost 5 years ago

dup of #9462

#5 Updated by Sage Weil almost 5 years ago

  • Status changed from Need Review to Duplicate

#6 Updated by Sage Weil almost 5 years ago

  • Status changed from Duplicate to Resolved

Also available in: Atom PDF