Project

General

Profile

Bug #15151

osd: failed assert(object_contexts.empty())

Added by Josh Durgin over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
OSD
Target version:
-
Start date:
03/15/2016
Due date:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

From http://teuthology.ovh.sepia.ceph.com/teuthology/joshd-2016-03-15_20:25:10-rbd-master---basic-openstack/16319/teuthology.log

2016-03-15T21:28:28.669 INFO:tasks.ceph.osd.0.target072060.stderr:2016-03-15 21:28:28.632540 7fc034442700 -1 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 peering] on_flushed: object 1:5cf97bca:::journal.100854265328:head obc still alive
2016-03-15T21:28:28.675 INFO:tasks.ceph.osd.0.target072060.stderr:osd/ReplicatedPG.cc: In function 'virtual void ReplicatedPG::on_flushed()' thread 7fc034442700 time 2016-03-15 21:28:28.632566
2016-03-15T21:28:28.675 INFO:tasks.ceph.osd.0.target072060.stderr:osd/ReplicatedPG.cc: 9899: FAILED assert(object_contexts.empty())
2016-03-15T21:28:28.676 INFO:tasks.ceph.osd.0.target072060.stderr: ceph version 10.0.5-2446-g7fb0fe5 (7fb0fe568d0c6a17c9a8ec54f9ba2fcea6dce57d)
2016-03-15T21:28:28.676 INFO:tasks.ceph.osd.0.target072060.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7fc05266d1bb]
2016-03-15T21:28:28.677 INFO:tasks.ceph.osd.0.target072060.stderr: 2: (ReplicatedPG::get_watchers(std::list<obj_watch_item_t, std::allocator<obj_watch_item_t> >&)+0) [0x7fc052133a30]
2016-03-15T21:28:28.677 INFO:tasks.ceph.osd.0.target072060.stderr: 3: (PG::RecoveryState::Started::react(PG::FlushedEvt const&)+0x3f) [0x7fc0520ac2cf]
2016-03-15T21:28:28.677 INFO:tasks.ceph.osd.0.target072060.stderr: 4: (boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x1f0) [0x7fc0521282a0]
2016-03-15T21:28:28.677 INFO:tasks.ceph.osd.0.target072060.stderr: 5: (boost::statechart::simple_state<PG::RecoveryState::Primary, PG::RecoveryState::Started, PG::RecoveryState::Peering, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xc0) [0x7fc052129c50]
2016-03-15T21:28:28.677 INFO:tasks.ceph.osd.0.target072060.stderr: 6: (boost::statechart::simple_state<PG::RecoveryState::Peering, PG::RecoveryState::Primary, PG::RecoveryState::GetInfo, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xc0) [0x7fc052127c10]
2016-03-15T21:28:28.678 INFO:tasks.ceph.osd.0.target072060.stderr: 7: (boost::statechart::simple_state<PG::RecoveryState::WaitUpThru, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xbc) [0x7fc05212545c]
2016-03-15T21:28:28.678 INFO:tasks.ceph.osd.0.target072060.stderr: 8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x7fc05210f78b]
2016-03-15T21:28:28.678 INFO:tasks.ceph.osd.0.target072060.stderr: 9: (PG::handle_peering_event(std::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x1d5) [0x7fc0520d7db5]
2016-03-15T21:28:28.678 INFO:tasks.ceph.osd.0.target072060.stderr: 10: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x249) [0x7fc052037669]
2016-03-15T21:28:28.678 INFO:tasks.ceph.osd.0.target072060.stderr: 11: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x7fc05207fea2]
2016-03-15T21:28:28.678 INFO:tasks.ceph.osd.0.target072060.stderr: 12: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0x7fc05265e71e]
2016-03-15T21:28:28.679 INFO:tasks.ceph.osd.0.target072060.stderr: 13: (ThreadPool::WorkThread::entry()+0x10) [0x7fc05265f600]
2016-03-15T21:28:28.679 INFO:tasks.ceph.osd.0.target072060.stderr: 14: (()+0x8182) [0x7fc050934182]
2016-03-15T21:28:28.679 INFO:tasks.ceph.osd.0.target072060.stderr: 15: (clone()+0x6d) [0x7fc04ea6247d]

Related issues

Related to Ceph - Bug #13835: on_flushed: object 1/f9b8d3f7/burnupi0951139-337/head obc still alive Can't reproduce 11/19/2015

Associated revisions

Revision 0abff12a (diff)
Added by Sage Weil over 3 years ago

osd/ReplicatedPG: clear watches on change after applying repops

apply_and_flush_repops does any in-flight request side-effects,
including setting up new watches. Clear out watches after that
so we don't leave any behind.

Fixes: #15151
Signed-off-by: Sage Weil <>

History

#1 Updated by Josh Durgin over 3 years ago

  • Related to Bug #13835: on_flushed: object 1/f9b8d3f7/burnupi0951139-337/head obc still alive added

#3 Updated by Sage Weil over 3 years ago

the new on applied/commit stuff reordered the op side-effect and hte reset:

  -323> 2016-03-15 21:28:28.616777 7fc034442700 10 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 luod=0'0 crt=449'5732 lcod 449'5732 mlcod 0'0 active]  up [0,4] -> [0,4], acting [0,1] -> [0,4], acting_primary 0 -> 0, up_primary 0 -> 0, role 0 -> 0, features acting 576460752303423487 upacting 576460752303423487
  -322> 2016-03-15 21:28:28.616788 7fc034442700 10 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive] clear_primary_state
  -321> 2016-03-15 21:28:28.616796 7fc034442700 20 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive] agent_stop
  -320> 2016-03-15 21:28:28.616804 7fc034442700 10 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive] on_change
  -319> 2016-03-15 21:28:28.616812 7fc034442700 15 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive]  requeue_ops 
  -318> 2016-03-15 21:28:28.616821 7fc034442700 15 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive] publish_stats_to_osd 450:23955
  -317> 2016-03-15 21:28:28.616830 7fc034442700 15 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive]  requeue_ops 
  -316> 2016-03-15 21:28:28.616856 7fc034442700 10 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive]  -- Watch(139946811755120,client.4104) discard
  -315> 2016-03-15 21:28:28.616866 7fc034442700 15 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive]  -- Watch(139946811755120,client.4104) unregister_cb
  -314> 2016-03-15 21:28:28.616875 7fc034442700 15 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive]  -- Watch(139946811755120,client.4104) actually registered, cancelling
  -313> 2016-03-15 21:28:28.616886 7fc034442700 10 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive]  -- Watch(139946811755120,client.4104) ~Watch
  -312> 2016-03-15 21:28:28.616944 7fc034442700 10 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive] cancel_copy_ops
  -311> 2016-03-15 21:28:28.616951 7fc034442700 10 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive] cancel_flush_ops
  -310> 2016-03-15 21:28:28.616960 7fc034442700 10 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive] cancel_proxy_ops
  -309> 2016-03-15 21:28:28.616970 7fc034442700 15 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive]  requeue_ops 
  -308> 2016-03-15 21:28:28.616977 7fc034442700 15 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive]  requeue_ops 
  -307> 2016-03-15 21:28:28.616985 7fc034442700 10 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive]  canceling repop tid 631
  -306> 2016-03-15 21:28:28.616993 7fc034442700 10 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive]  requeuing osd_op(client.4104.0:273888 1.53de9f3a journal.100854265328 [watch watch cookie 139946811733072] snapc 0=[] ondisk+write+known_if_redirected e449) v7
  -305> 2016-03-15 21:28:28.617005 7fc034442700 20 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive] remove_repop repgather(0x7fc060e0a520 449'5734 rep_tid=631 committed?=0 applied?=0)
  -304> 2016-03-15 21:28:28.617014 7fc034442700 15 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive] do_osd_op_effects client.4104 con 0x7fc061e57780
  -303> 2016-03-15 21:28:28.617023 7fc034442700 15 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive] do_osd_op_effects applying watch connect on session 0x7fc062eee000 watcher 139946811733072,client.4104
  -302> 2016-03-15 21:28:28.617031 7fc034442700 15 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive] do_osd_op_effects new watcher 139946811733072,client.4104
  -301> 2016-03-15 21:28:28.617039 7fc034442700 10 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive]  -- Watch(139946811733072,client.4104) Watch()
  -300> 2016-03-15 21:28:28.617049 7fc034442700 10 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive]  -- Watch(139946811733072,client.4104) connecting
  -299> 2016-03-15 21:28:28.617059 7fc034442700 15 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive]  -- Watch(139946811733072,client.4104) registering callback, timeout: 30
  -298> 2016-03-15 21:28:28.617084 7fc034442700 15 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive]  requeue_ops 
  -297> 2016-03-15 21:28:28.617108 7fc034442700 15 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive]  requeue_ops 0x7fc06199a800
  -296> 2016-03-15 21:28:28.617124 7fc034442700 10 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive] on_change_cleanup
  -295> 2016-03-15 21:28:28.617133 7fc034442700 10 osd.0 pg_epoch: 450 pg[1.2( v 449'5734 (205'2712,449'5734] local-les=428 n=186 ec=5 les/c/f 428/449/0 427/450/375) [0,4] r=0 lpr=450 pi=427-449/1 crt=449'5732 lcod 449'5732 mlcod 0'0 inactive] on_change

#4 Updated by Sage Weil over 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Sage Weil
  • Source changed from other to Q/A

#5 Updated by Sage Weil over 3 years ago

I was able to reproduce this with

MON=1 OSD=4 MDS=0 ./vstart.sh -d -n -x -l -o 'ms inject delay probability = .1' -o 'ms inject delay type = osd'
./rados -p rbd put foo /dev/null ; ./rados -p rbd put bar /dev/null
while true ; do echo | ./rados -p rbd watch foo ; done &
while true ; do echo | ./rados -p rbd watch bar ; done &
while true ; do ./ceph osd in 0 ; sleep 5 ; ./ceph osd out 0 ; sleep 5 ; done

#6 Updated by Sage Weil over 3 years ago

  • Status changed from In Progress to Need Review

#7 Updated by Sage Weil over 3 years ago

  • Status changed from Need Review to Resolved

Also available in: Atom PDF