Bug #4162
Status: Closed
mon: Single-Paxos: on sync, corrupted paxos store
Description
We've been thrashing the monitors pretty hard, and in this case the assert was triggered as follows:
- mon.3 sent a 'sync_start' to mon.17
- mon.17 forwarded 'sync_start' to mon.1 (leader)
- mon.1 replied to mon.3 with 'sync_start_reply'
- mon.3 sent a 'sync_start_chunks' to mon.17
- mon.17 sent chunks to mon.3
The problem here is that mon.17 was also synchronizing, thus didn't have a valid store state.
The solution can be one of two:

Option 1: the leader specifies to whom the requester should connect in order to sync.
- Upside: the leader can specify quorum members from which the monitors can sync, and may even try to balance the load across the quorum
- Downside: the leader might get overloaded if everybody picks him

Option 2: the selected sync provider, if he himself is also mid-sync, forwards the request to his own sync provider.
- Upside: likelier balance of workload, distributed across the various sync providers
- Downside: some monitors may get overloaded, while others don't
- Downside: seems like a crude approach (the first approach looks better, so we're going with it)
2013-02-15 15:29:57.167126 7ffcba6dc700 10 mon.f@3(synchronizing sync( requester state stop )) e1 handle_sync mon_sync( finish_reply ) v1
2013-02-15 15:29:57.167136 7ffcba6dc700 10 mon.f@3(synchronizing sync( requester state stop )) e1 handle_sync_finish_reply mon_sync( finish_reply ) v1
2013-02-15 15:29:57.167206 7ffcba6dc700 10 mon.f@3(synchronizing).paxos(paxos recovering c 0..0) reapply_all_versions first 0 last 1724
2013-02-15 15:29:57.173908 7ffcba6dc700 -1 mon/Paxos.cc: In function 'void Paxos::apply_version(MonitorDBStore::Transaction&, version_t)' thread 7ffcba6dc700 time 2013-02-15 15:29:57.167260
mon/Paxos.cc: 58: FAILED assert(bl.length())
 ceph version 0.56-786-gbf8d1ed (bf8d1ed419738a9519ee413a6a81e9ca8f99da46)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x915879]
 2: (Paxos::apply_version(MonitorDBStore::Transaction&, unsigned long)+0xb4) [0x75c3d2]
 3: (Paxos::reapply_all_versions()+0x432) [0x75c85c]
 4: (Monitor::handle_sync_finish_reply(MMonSync*)+0x401) [0x6ff2b1]
 5: (Monitor::handle_sync(MMonSync*)+0x236) [0x6ffa96]
 6: (Monitor::_ms_dispatch(Message*)+0xf6d) [0x70c663]
 7: (Monitor::ms_dispatch(Message*)+0x38) [0x72433a]
 8: (Messenger::ms_deliver_dispatch(Message*)+0x9b) [0x97ae6d]
 9: (DispatchQueue::entry()+0x549) [0x97a619]
 10: (DispatchQueue::DispatchThread::entry()+0x1c) [0x900fee]
 11: (Thread::_entry_func(void*)+0x23) [0x908f2d]
 12: (()+0x7e9a) [0x7ffcbfa36e9a]
 13: (clone()+0x6d) [0x7ffcbe1ef4bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Sage Weil about 11 years ago
I haven't been following this code at all, but: having that sort of circular message path tends to introduce all kinds of difficult corner cases and potential bugs. If mon.17 doesn't have the info we need, it should tell mon.3 to talk to mon.1 instead, and then mon.3 can contact mon.1 directly...?
Updated by Joao Eduardo Luis about 11 years ago
The current approach is:
- we (say, mon.3) contact the first monitor to reply to a probe with a higher paxos version than the one we have (considering the paxos drift, ofc), say mon.5
- if mon.5 is in the quorum:
  - if mon.5 is the leader, it will handle the request and point out whom mon.3 should sync with
  - if mon.5 is not the leader, it will forward the request to the leader
- if mon.5 is not in the quorum:
  - if mon.5 is synchronizing, it will forward the request to the one he's synchronizing with
  - else, it will allow mon.3 to synchronize from him (this handles the case in which we have no quorum yet because a majority of monitors have fallen behind on paxos; e.g., when adding a 2nd monitor to a 1-mon cluster)
Does this make sense to you?
Updated by Joao Eduardo Luis about 11 years ago
- Status changed from In Progress to 4
This hasn't been triggered since the patch.
Any objections on the approach?
Updated by Joao Eduardo Luis about 11 years ago
- Status changed from 4 to Resolved
I forgot to kill teuthology, so the same job has been hammering the branch for two days now, and everything is still good.
Marking as resolved.