Bug #23614
local_reserver double-reservation of backfilled pg
Description
- pg gets reservations (incl local_reserver)
- pg backfills, finishes
- ...apparently never releases the reservation...
- eio injection, goes back into recovery
- tries to reserve again, asserts out with
0> 2018-04-09 16:34:28.458 7efbea99a700 -1 /build/ceph-13.0.2-821-g69354a7/src/common/AsyncReserver.h: In function 'void AsyncReserver<T>::request_reservation(T, Context*, unsigned int, Context*) [with T = spg_t]' thread 7efbea99a700 time 2018-04-09 16:34:28.463507
/build/ceph-13.0.2-821-g69354a7/src/common/AsyncReserver.h: 190: FAILED assert(!queue_pointers.count(item) && !in_progress.count(item))
ceph version 13.0.2-821-g69354a7 (69354a7c01f17f2d50b965ccef0b24337323fba5) mimic (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7efc0d31dea2]
2: (()+0x2de077) [0x7efc0d31e077]
3: (AsyncReserver<spg_t>::request_reservation(spg_t, Context*, unsigned int, Context*)+0x193) [0x5580aceee9e3]
4: (PG::RecoveryState::WaitLocalRecoveryReserved::WaitLocalRecoveryReserved(boost::statechart::state<PG::RecoveryState::WaitLocalRecoveryReserved, PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0x285) [0x5580acebd155]
5: (boost::statechart::simple_state<PG::RecoveryState::Clean, PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x156) [0x5580acef3fb6]
6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6b) [0x5580aced179b]
7: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x25e) [0x5580aceb823e]
8: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x19c) [0x5580acdfb80c]
9: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x5580ad05aa40]
10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x964) [0x5580ace06cb4]
11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x452) [0x7efc0d322df2]
12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7efc0d324df0]
13: (()+0x76ba) [0x7efc0bdcd6ba]
14: (clone()+0x6d) [0x7efc0b5f641d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
/a/sage-2018-04-09_15:35:18-rados-wip-sage-testing-2018-04-09-0826-distro-basic-smithi/2375009
Related issues
History
#1 Updated by Sage Weil almost 6 years ago
Looking through the code I don't see where the reservation is supposed to be released. I see releases for
- the peering > clean path
- log recovery > clean (no backfill)
- backfill revert, full, unfound, or other cancellation
...but not for the normal, successful backfill completion. I must be missing something?
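The failure mode described above can be illustrated with a toy model. This is only a sketch, not Ceph's AsyncReserver: the class ToyReserver, its methods, and the helper rereserve_after_backfill are hypothetical names invented for this example, and the real assert is replaced by a false return value so the scenario can be checked without aborting. The only thing taken from the report is the invariant in the failed assert, namely that an item may not be requested while it is already queued or in progress.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Toy stand-in for a reservation tracker. NOT Ceph's AsyncReserver;
// all names here are hypothetical and exist only for illustration.
template <typename T>
class ToyReserver {
  std::set<T> queued;       // items waiting for a slot
  std::set<T> in_progress;  // items currently holding a slot
  std::size_t max_allowed;  // number of slots

public:
  explicit ToyReserver(std::size_t max) : max_allowed(max) {}

  // Returns false in the situation where the assert from the report
  // would fire: the item is already queued or already in progress.
  bool request_reservation(const T& item) {
    if (queued.count(item) || in_progress.count(item))
      return false;
    if (in_progress.size() < max_allowed)
      in_progress.insert(item);
    else
      queued.insert(item);
    return true;
  }

  // Drops the item from both sets and promotes waiters into free slots.
  void cancel_reservation(const T& item) {
    queued.erase(item);
    in_progress.erase(item);
    while (in_progress.size() < max_allowed && !queued.empty()) {
      in_progress.insert(*queued.begin());
      queued.erase(queued.begin());
    }
  }
};

// Replays the sequence from the description for a single pg:
// reserve, finish backfill, then re-enter recovery and reserve again.
// release_on_completion models whether the successful-backfill path
// gives the reservation back (an assumption about the intended fix).
inline bool rereserve_after_backfill(bool release_on_completion) {
  ToyReserver<int> local_reserver(1);
  const int pg = 42;  // arbitrary id standing in for an spg_t

  bool first = local_reserver.request_reservation(pg);
  // ... backfill runs and finishes here ...
  if (release_on_completion)
    local_reserver.cancel_reservation(pg);
  // ... EIO injection sends the pg back into recovery ...
  bool second = local_reserver.request_reservation(pg);
  return first && second;
}
```

With release_on_completion set to true the second request succeeds; with false it is rejected, which corresponds to the assertion failure in the backtrace.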
#2 Updated by Josh Durgin almost 6 years ago
- Related to Bug #23490: luminous: osd: double recovery reservation for PG when EIO injected (while already recovering other objects?) added
#3 Updated by Josh Durgin almost 6 years ago
This may be the same root cause as http://tracker.ceph.com/issues/23490
#4 Updated by Neha Ojha almost 6 years ago
- Assignee set to Neha Ojha
#5 Updated by Neha Ojha almost 6 years ago
- Status changed from 12 to Fix Under Review
Explanation of the problem and resolution included in the pull request.
#6 Updated by Josh Durgin almost 6 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to mimic, luminous
#7 Updated by Nathan Cutler almost 6 years ago
- Copied to Backport #24332: mimic: local_reserver double-reservation of backfilled pg added
#8 Updated by Nathan Cutler almost 6 years ago
- Copied to Backport #24333: luminous: local_reserver double-reservation of backfilled pg added
#9 Updated by Nathan Cutler over 5 years ago
- Status changed from Pending Backport to Resolved