Bug #4026

closed

mon: Single-Paxos: abort on LogMonitor::update_from_paxos

Added by Joao Eduardo Luis about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Normal
Category: Monitor
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While running teuthology with 20+ monitors, the monitor workloadgen with 10 OSDs, and the mon thrasher, we triggered the following abort on a peon:

2013-02-05 09:53:14.196821 7f9d0e14a700 10 mon.r@15(peon).pg v655 send_pg_creates to 0 pgs
2013-02-05 09:53:14.196831 7f9d0e14a700 10 mon.r@15(peon).pg v655 update_logger
2013-02-05 09:53:14.196902 7f9d0e14a700 10 mon.r@15(peon).pg v655 update_logger
2013-02-05 09:53:14.197039 7f9d0e14a700 10 mon.r@15(peon).mds e25 e25: 1/1/1 up {0=a=up:active}
2013-02-05 09:53:14.197072 7f9d0e14a700 10 mon.r@15(peon).mds e25 update_logger
2013-02-05 09:53:14.197167 7f9d0e14a700 10 mon.r@15(peon).osd e37 update_logger
2013-02-05 09:53:14.197180 7f9d0e14a700 10 mon.r@15(peon).osd e37 kick_all_failures on 0 osds
2013-02-05 09:53:14.209074 7f9d0e14a700 -1 *** Caught signal (Aborted) **
 in thread 7f9d0e14a700

 ceph version 0.56-488-gda7502a (da7502a0f7326183a02bc45f1f36c9d6b19a6450)
 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x84075f]
 2: /tmp/cephtest/binary/usr/local/bin/ceph-mon() [0x83fec6]
 3: (()+0xfcb0) [0x7f9d12cabcb0]
 4: (gsignal()+0x35) [0x7f9d113a0445]
 5: (abort()+0x17b) [0x7f9d113a3bab]
 6: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f9d11cee69d]
 7: (()+0xb5846) [0x7f9d11cec846]
 8: (()+0xb5873) [0x7f9d11cec873]
 9: (()+0xb596e) [0x7f9d11cec96e]
 10: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0xc8) [0x90e6c6]
 11: (void decode_raw<unsigned char>(unsigned char&, ceph::buffer::list::iterator&)+0x25) [0x7253e1]
 12: (decode(unsigned char&, ceph::buffer::list::iterator&)+0x23) [0x714f58]
 13: (LogMonitor::update_from_paxos()+0x44c) [0x7f23e6]
 14: (PaxosService::_active()+0x2b1) [0x768dbf]
 15: (PaxosService::C_Active::finish(int)+0x25) [0x76a4b9]
 16: (Context::complete(int)+0x2b) [0x7166fb]
 17: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x259) [0x71698f]
 18: (Paxos::handle_lease(MMonPaxos*)+0x7d3) [0x7604d9]
 19: (Paxos::dispatch(PaxosServiceMessage*)+0x337) [0x762f4b]
 20: (Monitor::_ms_dispatch(Message*)+0x138e) [0x708fec]
 21: (Monitor::ms_dispatch(Message*)+0x38) [0x71f598]
 22: (Messenger::ms_deliver_dispatch(Message*)+0x9b) [0x971a05]
 23: (DispatchQueue::entry()+0x549) [0x9711b1]
 24: (DispatchQueue::DispatchThread::entry()+0x1c) [0x8f86a4]
 25: (Thread::_entry_func(void*)+0x23) [0x9005a1]
 26: (()+0x7e9a) [0x7f9d12ca3e9a]
 27: (clone()+0x6d) [0x7f9d1145c4bd]

Files

4026.tar.bz2 (4.64 MB): leader's and affected peon's logs (~80 MB decompressed). Joao Eduardo Luis, 02/05/2013 05:24 PM

Related issues 3 (0 open, 3 closed)

Related to Ceph - Feature #2611: mon: Single-Paxos (Resolved, Joao Eduardo Luis, 06/20/2012, 07/09/2012)

Related to Ceph - Bug #4040: mon: Single-Paxos: on PGMonitor, FAILED assert(0 == "update_from_paxos: error parsing incremental update") (Resolved, Joao Eduardo Luis, 02/07/2013)

Related to Ceph - Bug #4037: mon: Single-Paxos: on Paxos, FAILED assert(begin->last_committed == last_committed) (Resolved, Joao Eduardo Luis, 02/06/2013)

#2

Updated by Joao Eduardo Luis about 11 years ago

I haven't been able to reproduce this, nor have I found an obvious cause for it.

I inspected the store and compared its versions with those in the leader's store; nothing appeared to be wrong.

This appears to have happened when decoding the first byte of the bufferlist (corresponding to the log version? not sure), but we didn't have much debug output in this function to pinpoint exactly where it happened. gdb wasn't much help either, as it complained about being unable to resolve the overloaded instance (maybe a lack of debug symbols in the gitbuilder build?).

Anyway, I've pushed a patch to make this function a bit more verbose and will re-run this test on the off chance of reproducing this bug.
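
For illustration only: frames 6 to 10 of the backtrace (the terminate handler reached from buffer::list::iterator::copy) are consistent with an exception thrown while copying a single byte out of a buffer and never caught, which ends in abort(). The sketch below reproduces that pattern with hypothetical stand-ins. ByteCursor, EndOfBuffer and try_decode_version are invented for this example and are not Ceph's bufferlist API, and the guarded variant is only one possible way to surface more detail, not the patch mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception type, standing in for an "end of buffer" error.
struct EndOfBuffer : std::runtime_error {
  EndOfBuffer() : std::runtime_error("end of buffer") {}
};

// Hypothetical read cursor over a byte blob (NOT ceph::buffer::list::iterator).
class ByteCursor {
 public:
  explicit ByteCursor(const std::vector<std::uint8_t>& data) : data_(data) {}

  // Copy len bytes into dst; throw if fewer than len bytes remain.
  void copy(std::size_t len, char* dst) {
    assert(dst != nullptr);
    if (len > data_.size() - pos_) throw EndOfBuffer();
    if (len == 0) return;
    std::memcpy(dst, data_.data() + pos_, len);
    pos_ += len;
  }

 private:
  const std::vector<std::uint8_t>& data_;
  std::size_t pos_ = 0;
};

// Unguarded decode of one byte: on a short buffer the exception propagates,
// and if no caller catches it the process terminates via std::terminate.
void decode(std::uint8_t& v, ByteCursor& p) {
  p.copy(sizeof(v), reinterpret_cast<char*>(&v));
}

// Guarded variant: report the failure and the blob size instead of aborting.
bool try_decode_version(const std::vector<std::uint8_t>& blob,
                        std::uint8_t& version, std::string& err) {
  ByteCursor p(blob);
  try {
    decode(version, p);
    return true;
  } catch (const EndOfBuffer& e) {
    err = std::string("decode failed: ") + e.what() +
          " (blob size " + std::to_string(blob.size()) + ")";
    return false;
  }
}
```

Calling try_decode_version on an empty blob returns false and fills err with the blob size, whereas calling decode directly on the same blob throws. Whether the blob read by the peon was actually empty or truncated is not established by this report.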

#3

Updated by Joao Eduardo Luis about 11 years ago

  • Status changed from New to In Progress

#4

Updated by Joao Eduardo Luis about 11 years ago

  • Status changed from In Progress to Resolved