Bug #4162
Status: Closed
mon: Single-Paxos: on sync, corrupted paxos store
Description
We've been thrashing the monitors pretty hard, and in this case the assert was triggered as follows:
- mon.3 sent a 'sync_start' to mon.17
- mon.17 forwarded 'sync_start' to mon.1 (leader)
- mon.1 replied to mon.3 with 'sync_start_reply'
- mon.3 sent a 'sync_start_chunks' to mon.17
- mon.17 sent chunks to mon.3
The problem here is that mon.17 was also synchronizing, thus didn't have a valid store state.
The solution can be one of two:

Option 1: the leader specifies to whom the requester should connect in order to sync.
- Upside: the leader can specify quorum members from which the monitors can sync, and may even try to balance the load across the quorum
- Downside: the leader might get overloaded if everybody picks him

Option 2: the selected sync provider, if he himself is also mid-sync, forwards the request to his own sync provider.
- Upside: likelier balance of workload, distributed across the various sync providers
- Downside: some monitors may get overloaded, while others don't
- Downside: seems like a crude approach (the first approach looks better, so we're going with it)
2013-02-15 15:29:57.167126 7ffcba6dc700 10 mon.f@3(synchronizing sync( requester state stop )) e1 handle_sync mon_sync( finish_reply ) v1
2013-02-15 15:29:57.167136 7ffcba6dc700 10 mon.f@3(synchronizing sync( requester state stop )) e1 handle_sync_finish_reply mon_sync( finish_reply ) v1
2013-02-15 15:29:57.167206 7ffcba6dc700 10 mon.f@3(synchronizing).paxos(paxos recovering c 0..0) reapply_all_versions first 0 last 1724
2013-02-15 15:29:57.173908 7ffcba6dc700 -1 mon/Paxos.cc: In function 'void Paxos::apply_version(MonitorDBStore::Transaction&, version_t)' thread 7ffcba6dc700 time 2013-02-15 15:29:57.167260
mon/Paxos.cc: 58: FAILED assert(bl.length())
 ceph version 0.56-786-gbf8d1ed (bf8d1ed419738a9519ee413a6a81e9ca8f99da46)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x915879]
 2: (Paxos::apply_version(MonitorDBStore::Transaction&, unsigned long)+0xb4) [0x75c3d2]
 3: (Paxos::reapply_all_versions()+0x432) [0x75c85c]
 4: (Monitor::handle_sync_finish_reply(MMonSync*)+0x401) [0x6ff2b1]
 5: (Monitor::handle_sync(MMonSync*)+0x236) [0x6ffa96]
 6: (Monitor::_ms_dispatch(Message*)+0xf6d) [0x70c663]
 7: (Monitor::ms_dispatch(Message*)+0x38) [0x72433a]
 8: (Messenger::ms_deliver_dispatch(Message*)+0x9b) [0x97ae6d]
 9: (DispatchQueue::entry()+0x549) [0x97a619]
 10: (DispatchQueue::DispatchThread::entry()+0x1c) [0x900fee]
 11: (Thread::_entry_func(void*)+0x23) [0x908f2d]
 12: (()+0x7e9a) [0x7ffcbfa36e9a]
 13: (clone()+0x6d) [0x7ffcbe1ef4bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Sage Weil about 11 years ago
I haven't been following this code at all, but: having that sort of circular message path tends to introduce all kinds of difficult corner cases and potential bugs. If mon.17 doesn't have the info we need, it should tell mon.3 to talk to mon.1 instead, and then mon.3 can contact mon.1 directly...?
Updated by Joao Eduardo Luis about 11 years ago
The current approach is:
- we (say, mon.3) contact the first monitor to reply to a probe with a higher paxos version than the one we have (considering the paxos drift, ofc), say mon.5
- if mon.5 is in the quorum:
  - if mon.5 is the leader, it will handle the request and point out whom mon.3 should sync with
  - if mon.5 is not the leader, it will forward the request to the leader
- if mon.5 is not in the quorum:
  - if mon.5 is synchronizing, it will forward the request to the one he's synchronizing with
  - else, it will allow mon.3 to synchronize from him (this handles the case in which we have no quorum yet because a majority of monitors have fallen behind on paxos; e.g., when adding a 2nd monitor to a 1-mon cluster)
Does this make sense to you?
Updated by Joao Eduardo Luis about 11 years ago
- Status changed from In Progress to 4
This hasn't been triggered since the patch.
Any objections on the approach?
Updated by Joao Eduardo Luis about 11 years ago
- Status changed from 4 to Resolved
I forgot to kill teuthology, so the same job has been hammering the branch for two days now, and everything is still good.
Marking as resolved.