Project

General

Profile

Actions

Bug #574

closed

timer: event cancellation apparently broken

Added by Sage Weil over 13 years ago. Updated over 13 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
% Done:

0%

Spent time:
Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Just saw this on latest unstable, f8b3271f45cc4a87e3f3f212d22e3d34ff13da44

The monitor schedules a propose timer event, and then in Paxos::election_starting() calls cancel_events(). A few moments later the event goes off and (rightly) fails it's sanity checks...

2010-11-11 15:30:58.401605 7f23445ce710 mon.a@0(leader).paxosservice(pgmap)  setting propose timer with delay of 0.05
2010-11-11 15:30:58.408371 7f23445ce710 -- 10.0.1.252:6789/0 <== mon1 10.0.1.252:6790/0 2874 ==== election(propose 5) v1 ==== 481+0+0 (3459256886 0 0) 0x7f2338001cc0
2010-11-11 15:30:58.408406 7f23445ce710 mon.a@0(leader) e1 have connection
2010-11-11 15:30:58.408417 7f23445ce710 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon1 10.0.1.252:6790/0 is openallow * for mon1 10.0.1.252:6790/0
2010-11-11 15:30:58.408427 7f23445ce710 mon.a@0(leader) e1  caps allow *
2010-11-11 15:30:58.408472 7f23445ce710 mon.a@0(leader).elector(4) handle_propose from mon1
2010-11-11 15:30:58.408481 7f23445ce710 mon.a@0(leader).elector(4) bump_epoch 4 to 5
2010-11-11 15:30:58.408490 7f23445ce710 store(dev/mon.a) set_int election_epoch = 5
2010-11-11 15:30:58.559494 7f23445ce710 mon.a@0(leader) e1 call_election
2010-11-11 15:30:58.559515 7f23445ce710 mon.a@0(leader).elector(5) start -- can i be leader?
2010-11-11 15:30:58.559529 7f23445ce710 mon.a@0(leader) e1 starting_election 5
2010-11-11 15:30:58.559536 7f23445ce710 mon.a@0(starting).paxos(pgmap active lc 80) election_starting -- canceling timeouts
2010-11-11 15:30:58.559559 7f23445ce710 mon.a@0(starting).pg v80 _updated_stats for osd0 10.0.1.252:6802/28980
2010-11-11 15:30:58.559577 7f23445ce710 -- 10.0.1.252:6789/0 --> 10.0.1.252:6802/28980 -- pg_stats_ack(1 pgs) v1 -- ?+0 0x7f2338004920
2010-11-11 15:30:58.559621 7f23445ce710 mon.a@0(starting).paxos(mdsmap active lc 5) election_starting -- canceling timeouts
2010-11-11 15:30:58.559646 7f23445ce710 mon.a@0(starting).paxos(osdmap active lc 3) election_starting -- canceling timeouts
2010-11-11 15:30:58.559657 7f23445ce710 mon.a@0(starting).paxos(logm active lc 9) election_starting -- canceling timeouts
2010-11-11 15:30:58.559673 7f23445ce710 mon.a@0(starting).paxos(class active lc 2) election_starting -- canceling timeouts
2010-11-11 15:30:58.559685 7f23445ce710 mon.a@0(starting).paxos(monmap active lc 1) election_starting -- canceling timeouts
2010-11-11 15:30:58.559695 7f23445ce710 mon.a@0(starting).paxos(auth active lc 5) election_starting -- canceling timeouts
2010-11-11 15:30:58.559705 7f23445ce710 mon.a@0(starting).paxosservice(pgmap) election_starting
2010-11-11 15:30:58.559713 7f23445ce710 mon.a@0(starting).paxosservice(mdsmap) election_starting
2010-11-11 15:30:58.559721 7f23445ce710 mon.a@0(starting).paxosservice(osdmap) election_starting
2010-11-11 15:30:58.559727 7f23445ce710 mon.a@0(starting).paxosservice(logm) election_starting
2010-11-11 15:30:58.559732 7f23445ce710 mon.a@0(starting).paxosservice(class) election_starting
2010-11-11 15:30:58.559738 7f23445ce710 mon.a@0(starting).paxosservice(monmap) election_starting
2010-11-11 15:30:58.559743 7f23445ce710 mon.a@0(starting).paxosservice(auth) election_starting
2010-11-11 15:30:58.559771 7f23445ce710 -- 10.0.1.252:6789/0 --> mon1 10.0.1.252:6790/0 -- election(propose 5) v1 -- ?+0 0x7f2338003660
2010-11-11 15:30:58.559811 7f23445ce710 -- 10.0.1.252:6789/0 --> mon2 10.0.1.252:6791/0 -- election(propose 5) v1 -- ?+0 0x7f2338003860
2010-11-11 15:30:58.559886 7f2343dcd710 mon.a@0(starting).paxosservice(pgmap) propose_pending
2010-11-11 15:30:58.560511 7f23445ce710 -- 10.0.1.252:6789/0 <== mon1 10.0.1.252:6790/0 2875 ==== election(ack 5) v1 ==== 481+0+0 (581609027 0 0) 0x7f2338001210
mon/PaxosService.cc: In function 'void PaxosService::propose_pending()':
mon/PaxosService.cc:128: FAILED assert(mon->is_leader() && paxos->is_active())
 ceph version 0.24~rc (commit:f8b3271f45cc4a87e3f3f212d22e3d34ff13da44)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39) [0x71ec7e]
 2: (PaxosService::propose_pending()+0xcb) [0x621017]
 3: (PaxosService::C_Propose::finish(int)+0x2f) [0x621635]
 4: (TimerThread::entry()+0x397) [0x71cd71]
 5: (Thread::_entry_func(void*)+0x23) [0x5dffbb]
 6: (()+0x68ba) [0x7f2346af48ba]
 7: (clone()+0x6d) [0x7f2345abb01d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*** Caught signal (ABRT) ***
 ceph version 0.24~rc (commit:f8b3271f45cc4a87e3f3f212d22e3d34ff13da44)
 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x71f085]
 2: (sigabrt_handler(int)+0x49) [0x72fe05]
 3: (()+0x321f0) [0x7f2345a1e1f0]
 4: (gsignal()+0x35) [0x7f2345a1e175]
 5: (abort()+0x180) [0x7f2345a20f80]
 6: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f23462b1dc5]
 7: (()+0xcb166) [0x7f23462b0166]
 8: (()+0xcb193) [0x7f23462b0193]
 9: (()+0xcb28e) [0x7f23462b028e]
 10: (ceph::__ceph_assert_warn(char const*, char const*, int, char const*)+0) [0x71eeb2]
 11: (PaxosService::propose_pending()+0xcb) [0x621017]
 12: (PaxosService::C_Propose::finish(int)+0x2f) [0x621635]
 13: (TimerThread::entry()+0x397) [0x71cd71]
 14: (Thread::_entry_func(void*)+0x23) [0x5dffbb]
 15: (()+0x68ba) [0x7f2346af48ba]
 16: (clone()+0x6d) [0x7f2345abb01d]
Actions

Also available in: Atom PDF