Bug #4162

closed

mon: Single-Paxos: on sync, corrupted paxos store

Added by Joao Eduardo Luis about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Normal
Category: Monitor
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We've been thrashing the monitors pretty hard, and in this case the assert was triggered as follows:

- mon.3 sent a 'sync_start' to mon.17
- mon.17 forwarded 'sync_start' to mon.1 (leader)
- mon.1 replied to mon.3 with 'sync_start_reply'
- mon.3 sent a 'sync_start_chunks' to mon.17
- mon.17 sent chunks to mon.3

The problem here is that mon.17 was itself still synchronizing, and thus didn't have a valid store state to serve chunks from.

There are two possible solutions:
  • the leader specifies which monitor the requester should connect to in order to sync
    - Upside: the leader can specify quorum members for the monitors to sync from, and may even try to balance the load across the quorum
    - Downside: the leader might get overloaded if every requester contacts it
  • the selected sync provider, if it is itself mid-sync, forwards the request to its own sync provider
    - Upside: workload is more likely to be balanced, distributed across the various sync providers
    - Downside: some monitors may get overloaded, while others don't
    - Downside: seems like a crude approach (the first approach looks better, so we're going with it)

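The first option (leader chooses the provider and may balance load across the quorum) could look roughly like the sketch below. This is an illustration only: `pick_sync_provider`, the rank list, and the round-robin cursor are hypothetical and not taken from the Ceph monitor code, and round-robin is just one possible balancing policy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Ceph: the leader hands out sync providers
// round-robin from the current quorum, never returning the requester itself.
// `next` is a cursor the leader keeps between calls.
// Returns -1 if no quorum member other than the requester exists.
int pick_sync_provider(const std::vector<int>& quorum, int requester,
                       std::size_t& next) {
  assert(!quorum.empty());  // a leader always has a quorum (at least itself)
  for (std::size_t i = 0; i < quorum.size(); ++i) {
    const std::size_t idx = (next + i) % quorum.size();
    if (quorum[idx] != requester) {
      next = (idx + 1) % quorum.size();  // resume after the chosen member
      return quorum[idx];
    }
  }
  return -1;
}
```

Because only quorum members are ever returned, a monitor that is itself mid-sync (and therefore outside the quorum) cannot be selected as a provider.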
2013-02-15 15:29:57.167126 7ffcba6dc700 10 mon.f@3(synchronizing sync( requester state stop )) e1 handle_sync mon_sync( finish_reply ) v1
2013-02-15 15:29:57.167136 7ffcba6dc700 10 mon.f@3(synchronizing sync( requester state stop )) e1 handle_sync_finish_reply mon_sync( finish_reply ) v1
2013-02-15 15:29:57.167206 7ffcba6dc700 10 mon.f@3(synchronizing).paxos(paxos recovering c 0..0) reapply_all_versions first 0 last 1724
2013-02-15 15:29:57.173908 7ffcba6dc700 -1 mon/Paxos.cc: In function 'void Paxos::apply_version(MonitorDBStore::Transaction&, version_t)' thread 7ffcba6dc700 time 2013-02-15 15:29:57.167260
mon/Paxos.cc: 58: FAILED assert(bl.length())

 ceph version 0.56-786-gbf8d1ed (bf8d1ed419738a9519ee413a6a81e9ca8f99da46)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x915879]
 2: (Paxos::apply_version(MonitorDBStore::Transaction&, unsigned long)+0xb4) [0x75c3d2]
 3: (Paxos::reapply_all_versions()+0x432) [0x75c85c]
 4: (Monitor::handle_sync_finish_reply(MMonSync*)+0x401) [0x6ff2b1]
 5: (Monitor::handle_sync(MMonSync*)+0x236) [0x6ffa96]
 6: (Monitor::_ms_dispatch(Message*)+0xf6d) [0x70c663]
 7: (Monitor::ms_dispatch(Message*)+0x38) [0x72433a]
 8: (Messenger::ms_deliver_dispatch(Message*)+0x9b) [0x97ae6d]
 9: (DispatchQueue::entry()+0x549) [0x97a619]
 10: (DispatchQueue::DispatchThread::entry()+0x1c) [0x900fee]
 11: (Thread::_entry_func(void*)+0x23) [0x908f2d]
 12: (()+0x7e9a) [0x7ffcbfa36e9a]
 13: (clone()+0x6d) [0x7ffcbe1ef4bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Related issues 2 (0 open, 2 closed)

Related to Ceph - Feature #2611: mon: Single-Paxos | Resolved | Joao Eduardo Luis | 06/20/2012 | 07/09/2012
Has duplicate Ceph - Bug #4103: mon: Single-Paxos: on MonitorDBStore, segfault during sync | Duplicate | Joao Eduardo Luis | 02/12/2013

Actions #1

Updated by Joao Eduardo Luis about 11 years ago

  • Description updated (diff)
Actions #2

Updated by Joao Eduardo Luis about 11 years ago

  • Description updated (diff)
Actions #3

Updated by Sage Weil about 11 years ago

I haven't been following this code at all, but: having that sort of circular message path tends to introduce all kinds of difficult corner cases and potential bugs. If mon.17 doesn't have the info we need, it should tell mon.3 to talk to mon.1 instead, and then mon.3 can contact mon.1 directly...?

Actions #4

Updated by Joao Eduardo Luis about 11 years ago

The current approach is:

  • we (say, mon.3) contact the first monitor to reply to a probe with a higher paxos version than the one we have (considering the paxos drift, of course), say mon.5
  • if mon.5 is in the quorum:
    - if mon.5 is the leader, it will handle the request and tell mon.3 which monitor to sync with
    - if mon.5 is not the leader, it will forward the request to the leader
  • if mon.5 is not in the quorum:
    - if mon.5 is synchronizing, it will forward the request to the monitor it is synchronizing with
    - else, it will allow mon.3 to synchronize from it (this handles the case in which we have no quorum yet because a majority of monitors have fallen behind on paxos; e.g., when adding a 2nd monitor to a 1-mon cluster)
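
As a rough sketch, the decision made by the contacted monitor (mon.5 above) can be written as a small pure function. All names here (`MonState`, `SyncAction`, `route_sync_start`) are made up for illustration and are not the actual Monitor code; the two asserts encode the assumptions that a leader is always in the quorum and that a synchronizing monitor never is.

```cpp
#include <cassert>

// Hypothetical model of the monitor receiving a 'sync_start'.
struct MonState {
  bool in_quorum;
  bool is_leader;
  bool synchronizing;
};

// What the contacted monitor does with the request (illustrative names).
enum class SyncAction {
  HandleAsLeader,        // leader picks whom the requester should sync with
  ForwardToLeader,       // quorum member that is not the leader
  ForwardToOwnProvider,  // out of quorum and itself mid-sync
  ServeDirectly          // out of quorum, not syncing (e.g. no quorum yet)
};

SyncAction route_sync_start(const MonState& m) {
  assert(!m.is_leader || m.in_quorum);         // assumed: leader is in quorum
  assert(!(m.in_quorum && m.synchronizing));   // assumed: syncing mons are not
  if (m.in_quorum) {
    return m.is_leader ? SyncAction::HandleAsLeader
                       : SyncAction::ForwardToLeader;
  }
  if (m.synchronizing) {
    return SyncAction::ForwardToOwnProvider;
  }
  return SyncAction::ServeDirectly;
}
```

In this model the failing case from the description (a provider that is still synchronizing) maps to `ForwardToOwnProvider`, so such a monitor never serves chunks from its own incomplete store.
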

Does this make sense to you?

Actions #5

Updated by Joao Eduardo Luis about 11 years ago

  • Status changed from In Progress to 4

This hasn't been triggered since the patch.

Any objections to the approach?

Actions #6

Updated by Sage Weil about 11 years ago

Sounds ok!

Actions #7

Updated by Joao Eduardo Luis about 11 years ago

  • Status changed from 4 to Resolved

Forgot to kill teuthology, and the same job has been hammering the branch for two days now, and everything is still good.

Marking as resolved.
