
Bug #4026

mon: Single-Paxos: abort on LogMonitor::update_from_paxos

Added by Joao Eduardo Luis about 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Joao Eduardo Luis
Category:
Monitor
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While running teuthology with 20+ monitors, the monitor workloadgen with 10 osds, and the mon thrasher, we triggered the following behavior on a peon:

2013-02-05 09:53:14.196821 7f9d0e14a700 10 mon.r@15(peon).pg v655 send_pg_creates to 0 pgs
2013-02-05 09:53:14.196831 7f9d0e14a700 10 mon.r@15(peon).pg v655 update_logger
2013-02-05 09:53:14.196902 7f9d0e14a700 10 mon.r@15(peon).pg v655 update_logger
2013-02-05 09:53:14.197039 7f9d0e14a700 10 mon.r@15(peon).mds e25 e25: 1/1/1 up {0=a=up:active}
2013-02-05 09:53:14.197072 7f9d0e14a700 10 mon.r@15(peon).mds e25 update_logger
2013-02-05 09:53:14.197167 7f9d0e14a700 10 mon.r@15(peon).osd e37 update_logger
2013-02-05 09:53:14.197180 7f9d0e14a700 10 mon.r@15(peon).osd e37 kick_all_failures on 0 osds
2013-02-05 09:53:14.209074 7f9d0e14a700 -1 *** Caught signal (Aborted) **
 in thread 7f9d0e14a700

 ceph version 0.56-488-gda7502a (da7502a0f7326183a02bc45f1f36c9d6b19a6450)
 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x84075f]
 2: /tmp/cephtest/binary/usr/local/bin/ceph-mon() [0x83fec6]
 3: (()+0xfcb0) [0x7f9d12cabcb0]
 4: (gsignal()+0x35) [0x7f9d113a0445]
 5: (abort()+0x17b) [0x7f9d113a3bab]
 6: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f9d11cee69d]
 7: (()+0xb5846) [0x7f9d11cec846]
 8: (()+0xb5873) [0x7f9d11cec873]
 9: (()+0xb596e) [0x7f9d11cec96e]
 10: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0xc8) [0x90e6c6]
 11: (void decode_raw<unsigned char>(unsigned char&, ceph::buffer::list::iterator&)+0x25) [0x7253e1]
 12: (decode(unsigned char&, ceph::buffer::list::iterator&)+0x23) [0x714f58]
 13: (LogMonitor::update_from_paxos()+0x44c) [0x7f23e6]
 14: (PaxosService::_active()+0x2b1) [0x768dbf]
 15: (PaxosService::C_Active::finish(int)+0x25) [0x76a4b9]
 16: (Context::complete(int)+0x2b) [0x7166fb]
 17: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x259) [0x71698f]
 18: (Paxos::handle_lease(MMonPaxos*)+0x7d3) [0x7604d9]
 19: (Paxos::dispatch(PaxosServiceMessage*)+0x337) [0x762f4b]
 20: (Monitor::_ms_dispatch(Message*)+0x138e) [0x708fec]
 21: (Monitor::ms_dispatch(Message*)+0x38) [0x71f598]
 22: (Messenger::ms_deliver_dispatch(Message*)+0x9b) [0x971a05]
 23: (DispatchQueue::entry()+0x549) [0x9711b1]
 24: (DispatchQueue::DispatchThread::entry()+0x1c) [0x8f86a4]
 25: (Thread::_entry_func(void*)+0x23) [0x9005a1]
 26: (()+0x7e9a) [0x7f9d12ca3e9a]
 27: (clone()+0x6d) [0x7f9d1145c4bd]

4026.tar.bz2 - leader's and affected peon's logs (~80MB decompressed) (4.64 MB) Joao Eduardo Luis, 02/05/2013 05:24 PM


Related issues

Related to Ceph - Feature #2611: mon: Single-Paxos Resolved 06/20/2012 07/09/2012
Related to Ceph - Bug #4040: mon: Single-Paxos: on PGMonitor, FAILED assert(0 == "update_from_paxos: error parsing incremental update") Resolved 02/07/2013
Related to Ceph - Bug #4037: mon: Single-Paxos: on Paxos, FAILED assert(begin->last_committed == last_committed) Resolved 02/06/2013

Associated revisions

Revision cab3411b (diff)
Added by Joao Eduardo Luis about 11 years ago

mon: Monitor: Add monitor store synchronization support

Synchronize two monitor stores when one of the monitors has diverged
significantly from the remaining monitor cluster.

This process roughly consists of the following steps:

0. mon.X tries to join the cluster;
1. mon.X verifies that it has diverged from the remaining cluster;
2. mon.X asks the leader to sync;
3. the leader allows mon.X to sync, pointing out a mon.Y from
which mon.X should sync;
4. mon.X asks mon.Y to sync;
5. mon.Y sends its own store in one or more chunks;
6. mon.X acks each received chunk; go to 5;
7. mon.X receives the last chunk from mon.Y;
8. mon.X informs the leader that it has finished synchronizing;
9. the leader acks mon.X's finished sync;
10. mon.X bootstraps and retries joining the cluster (goto 0.)

This is the simplest, most straightforward process we can hope for.
However, things may go sideways at any time (monitors failing, for
instance), which could potentially lead to a corrupted monitor store.
There are, however, mechanisms at work to avoid such a scenario at any
step of the process.

Some of these mechanisms include:

- aborting the sync if the leader fails or leadership changes;
- state barriers on synchronization functions to avoid stray/outdated
messages from interfering on the normal monitor behavior or on-going
synchronization;
- store clean-up before any synchronization process starts;
- store clean-up if a sync process fails;
- resuming sync from a different monitor mon.Z if mon.Y fails mid-sync;
- several timeouts to guarantee that all the involved parties are still
alive and participating in the sync effort;
- request forwarding when mon.X contacts a monitor outside the quorum
that might know who the leader is (or might know someone who does)
[4].

Changes:
- Adapt the MMonProbe message for the single-paxos approach, dropping
the version map and using a lower and upper bound version instead.
- Remove old slurp code.
- Add 'sync force' command; 'sync_force' through the admin socket.

Notes:

[1] It's important to keep track of the paxos version at the time at
which a store sync starts. Given that after the sync we end up with
the same state as the monitor we are synchronizing from, there is a
chance that we might end up with an uncommitted paxos version if we
are synchronizing with the leader (there's some paxos stashing done
prior to commit on the leader). By keeping track at which version
the sync started, we can then tell the requester to which version it
should cap its paxos store.

[2] Furthermore, the enforced paxos cap described in [1] is even more
important if we consider the need to reapply the paxos versions that
were received during the sync, to make sure the paxos store is
consistent. If we happened to have some yet-uncommitted version in
the store, we could end up applying it.

[3] What is described in [1] and [2]:

Fixes: #4026
Fixes: #4037
Fixes: #4040

[4] Whenever a given monitor mon.X is on the probing phase and notices
that there is a mon.Y with a paxos version considerably higher than
the one mon.X has, then mon.X will attempt to synchronize from
mon.Y. This is the basis for the store sync. While this holds true,
there is a chance that, by the time mon.Y handles the sync request
from mon.X, mon.Y may itself already be syncing with some other
mon.Z. In this case, the appropriate thing for mon.Y to do is to
forward mon.X's request to mon.Z, as mon.Z should be part of the
quorum, know who the leader is, or be the leader itself -- if not, it
is at least guaranteed that mon.Z has a higher version than both
mon.X and mon.Y, so it should be safe to sync from it.

Fixes: #4162

Signed-off-by: Joao Eduardo Luis <>

History

#1 Updated by Joao Eduardo Luis about 11 years ago

#2 Updated by Joao Eduardo Luis about 11 years ago

I haven't been able to reproduce this, nor to find an obvious cause for it.

After inspecting the store and comparing the versions within with those of the leader's store, nothing appeared to be wrong.

This appears to have happened when decoding the first byte of the bufferlist (corresponding to the log version? not sure), but we didn't have much debug info in this function to pinpoint exactly where it happened; gdb wasn't much help either, as it complained about being unable to resolve the overloaded instance (maybe a lack of debug symbols in the gitbuilder build?).

Anyway, I've pushed a patch to make this function a bit more verbose and will be re-running this test on the off chance of reproducing this bug.

#3 Updated by Joao Eduardo Luis about 11 years ago

  • Status changed from New to In Progress

#4 Updated by Joao Eduardo Luis about 11 years ago

  • Status changed from In Progress to Resolved
