Bug #4040

closed

mon: Single-Paxos: on PGMonitor, FAILED assert(0 == "update_from_paxos: error parsing incremental update")

Added by Joao Eduardo Luis about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Normal
Assignee: Joao Eduardo Luis
Category: Monitor
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

INFO:teuthology.task.ceph.mon.f.err:mon/PGMonitor.cc: 182: FAILED assert(0 == "update_from_paxos: error parsing incremental update")
INFO:teuthology.task.ceph.mon.f.err: ceph version 0.56-489-g7bcacc4 (7bcacc45e0fca3460c87ac0800edf44382835685)
INFO:teuthology.task.ceph.mon.f.err: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x90d2b9]
INFO:teuthology.task.ceph.mon.f.err: 2: (PGMonitor::update_from_paxos()+0x978) [0x7ccde8]
INFO:teuthology.task.ceph.mon.f.err: 3: (Monitor::_ms_dispatch(Message*)+0x13eb) [0x709049]
INFO:teuthology.task.ceph.mon.f.err: 4: (Monitor::ms_dispatch(Message*)+0x38) [0x71f598]
INFO:teuthology.task.ceph.mon.f.err: 5: (Messenger::ms_deliver_dispatch(Message*)+0x9b) [0x971dd5]
INFO:teuthology.task.ceph.mon.f.err: 6: (DispatchQueue::entry()+0x549) [0x971581]
INFO:teuthology.task.ceph.mon.f.err: 7: (DispatchQueue::DispatchThread::entry()+0x1c) [0x8f8a74]
INFO:teuthology.task.ceph.mon.f.err: 8: (Thread::_entry_func(void*)+0x23) [0x900971]
INFO:teuthology.task.ceph.mon.f.err: 9: (()+0x7e9a) [0x7f1788e7ae9a]
INFO:teuthology.task.ceph.mon.f.err: 10: (clone()+0x6d) [0x7f17876334bd]

The attached yaml is awesome for flushing out issues.


Files

sp.thrash-mons.lots.wrkldgen.yaml (2.14 KB): awesome yaml to flush out issues on the monitor. Joao Eduardo Luis, 02/07/2013 04:01 AM

Related issues 2 (0 open, 2 closed)

Related to Ceph - Feature #2611: mon: Single-Paxos (Resolved, Joao Eduardo Luis, 06/20/2012, 07/09/2012)
Related to Ceph - Bug #4026: mon: Single-Paxos: abort on LogMonitor::update_from_paxos (Resolved, Joao Eduardo Luis, 02/05/2013)

Actions #1

Updated by Joao Eduardo Luis about 11 years ago

Something went wrong when updating 'last_committed' on mon.f, which, by the way, has fallen several versions behind the leader (mon.c): its stored last_committed is 489 versus 497 on the leader, and its store has no pgmap version 488:

mon.f@3(peon).pg v487 update_from_paxos  applying incremental 488
mon.f@3(peon).pg v487 update_from_paxos: error parsing incremental update: buffer::end_of_buffer
mon/PGMonitor.cc: In function 'virtual void PGMonitor::update_from_paxos()' thread 7f1783b20700 time 2013-02-07 03:48:42.048063
mon/PGMonitor.cc: 182: FAILED assert(0 == "update_from_paxos: error parsing incremental update")
$ tst mon.f/store.db get pgmap 488
(pgmap, 488) does not exist
$ tst mon.c/store.db get pgmap 488
(pgmap, 488)
0000 : 05 05 8c 00 00 00 e8 01 00 00 00 00 00 00 00 00 : ................
0010 : 00 00 02 00 00 00 07 00 00 00 02 02 28 00 00 00 : ............(...
0020 : 00 e8 e1 92 a3 0c 14 00 00 64 c0 d2 7e e9 2d 00 : .........d..~.-.
0030 : 00 84 21 c0 24 23 26 00 00 00 00 00 00 00 00 00 : ..!.$#&.........
0040 : 00 00 00 00 00 00 00 00 0c 00 00 00 02 02 28 00 : ..............(.
0050 : 00 00 00 e8 e1 92 a3 0c 14 00 00 64 c0 d2 7e e9 : ...........d..~.
0060 : 2d 00 00 84 21 c0 24 23 26 00 00 00 00 00 00 00 : -...!.$#&.......
0070 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
0080 : 00 00 00 00 00 00 33 33 73 3f 9a 99 59 3f 00 00 : ......33s?..Y?..
0090 : 00 00                                           : ..

$ tst mon.c/store.db get pgmap last_committed
(pgmap, last_committed)
0000 : f1 01 00 00 00 00 00 00                         : ........

$ tst mon.f/store.db get pgmap last_committed
(pgmap, last_committed)
0000 : e9 01 00 00 00 00 00 00                         : ........

$ python -c 'print "{c} {f}".format(c=0x01f1,f=0x01e9)'
497 489

$ tst mon.f/store.db get pgmap 487
(pgmap, 487)
0000 : 05 05 f0 00 00 00 e7 01 00 00 00 00 00 00 00 00 : ................
0010 : 00 00 04 00 00 00 03 00 00 00 02 02 28 00 00 00 : ............(...
0020 : 00 68 e6 d8 d1 e8 3f 00 00 5c 88 96 6d 72 1d 00 : .h....?..\..mr..
0030 : 00 0c 5e 42 64 76 22 00 00 00 00 00 00 00 00 00 : ..^Bdv".........
0040 : 00 00 00 00 00 00 00 00 05 00 00 00 02 02 28 00 : ..............(.
0050 : 00 00 00 68 e6 d8 d1 e8 3f 00 00 5c 88 96 6d 72 : ...h....?..\..mr
0060 : 1d 00 00 0c 5e 42 64 76 22 00 00 00 00 00 00 00 : ....^Bdv".......
0070 : 00 00 00 00 00 00 00 00 00 00 06 00 00 00 02 02 : ................
0080 : 28 00 00 00 00 68 e6 d8 d1 e8 3f 00 00 5c 88 96 : (....h....?..\..
0090 : 6d 72 1d 00 00 0c 5e 42 64 76 22 00 00 00 00 00 : mr....^Bdv".....
00a0 : 00 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00 : ................
00b0 : 02 02 28 00 00 00 00 68 e6 d8 d1 e8 3f 00 00 5c : ..(....h....?..\
00c0 : 88 96 6d 72 1d 00 00 0c 5e 42 64 76 22 00 00 00 : ..mr....^Bdv"...
00d0 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
00e0 : 00 00 00 00 00 00 00 00 00 00 33 33 73 3f 9a 99 : ..........33s?..
00f0 : 59 3f 00 00 00 00                               : Y?....

$ tst mon.c/store.db get pgmap 487
(pgmap, 487)
0000 : 05 05 f0 00 00 00 e7 01 00 00 00 00 00 00 00 00 : ................
0010 : 00 00 04 00 00 00 03 00 00 00 02 02 28 00 00 00 : ............(...
0020 : 00 68 e6 d8 d1 e8 3f 00 00 5c 88 96 6d 72 1d 00 : .h....?..\..mr..
0030 : 00 0c 5e 42 64 76 22 00 00 00 00 00 00 00 00 00 : ..^Bdv".........
0040 : 00 00 00 00 00 00 00 00 05 00 00 00 02 02 28 00 : ..............(.
0050 : 00 00 00 68 e6 d8 d1 e8 3f 00 00 5c 88 96 6d 72 : ...h....?..\..mr
0060 : 1d 00 00 0c 5e 42 64 76 22 00 00 00 00 00 00 00 : ....^Bdv".......
0070 : 00 00 00 00 00 00 00 00 00 00 06 00 00 00 02 02 : ................
0080 : 28 00 00 00 00 68 e6 d8 d1 e8 3f 00 00 5c 88 96 : (....h....?..\..
0090 : 6d 72 1d 00 00 0c 5e 42 64 76 22 00 00 00 00 00 : mr....^Bdv".....
00a0 : 00 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00 : ................
00b0 : 02 02 28 00 00 00 00 68 e6 d8 d1 e8 3f 00 00 5c : ..(....h....?..\
00c0 : 88 96 6d 72 1d 00 00 0c 5e 42 64 76 22 00 00 00 : ..mr....^Bdv"...
00d0 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
00e0 : 00 00 00 00 00 00 00 00 00 00 33 33 73 3f 9a 99 : ..........33s?..
00f0 : 59 3f 00 00 00 00                               : Y?....
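
The dumps above can be cross-checked offline. The following is a minimal Python 3 sketch, not part of the original report. It decodes a `last_committed` value as a little-endian unsigned 64-bit integer, and it reads the first 14 bytes of a pgmap blob under the assumption that they hold a one-byte struct version, a one-byte compat version, a 32-bit payload length, and a 64-bit map version. That layout is inferred from the dumps themselves rather than taken from the Ceph source, and both helper names are hypothetical.

```python
import struct


def decode_u64_le(hex_bytes: str) -> int:
    """Decode 8 space-separated hex bytes as a little-endian unsigned 64-bit int."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    return struct.unpack("<Q", raw)[0]


def decode_blob_header(hex_bytes: str) -> dict:
    """Read the ASSUMED 14-byte prefix of a pgmap blob.

    Assumed layout (inferred from the hex dumps, not confirmed):
    u8 struct version, u8 compat version, u32 payload length, u64 map version.
    """
    raw = bytes.fromhex(hex_bytes)
    if len(raw) < 14:
        # An empty or truncated value cannot be parsed; the monitor log shows a
        # comparable failure reported as buffer::end_of_buffer.
        raise ValueError(f"buffer too short: {len(raw)} bytes")
    struct_v, compat_v, payload_len, version = struct.unpack("<BBIQ", raw[:14])
    return {
        "struct_v": struct_v,
        "compat_v": compat_v,
        "payload_len": payload_len,
        "version": version,
    }


# Values copied from the dumps above.
print(decode_u64_le("f1 01 00 00 00 00 00 00"))  # mon.c last_committed
print(decode_u64_le("e9 01 00 00 00 00 00 00"))  # mon.f last_committed
print(decode_blob_header("05 05 8c 00 00 00 e8 01 00 00 00 00 00 00"))  # mon.c pgmap 488
```

Under that assumed layout, the mon.c blob for version 488 reports a payload length of 140 bytes, which together with the 6-byte prefix matches the 146 bytes shown in the dump, and the embedded version field reads 488.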
Actions #2

Updated by Joao Eduardo Luis about 11 years ago

Also, I suspect this might be causing the same problem described in #4026.

Actions #3

Updated by Joao Eduardo Luis about 11 years ago

Triggered again with the same symptoms. The issue appears to be a skipped version in the store.

From the original crash, on mon.f's store, note that there is no version 488:

pgmap:480
pgmap:481
pgmap:482
pgmap:483
pgmap:484
pgmap:485
pgmap:486
pgmap:487
pgmap:489

On the most recent crash, note that version 608 is missing (the latest version is 609):

pgmap:600
pgmap:601
pgmap:602
pgmap:603
pgmap:604
pgmap:605
pgmap:606
pgmap:607
pgmap:609
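
A gap like the two listed above can be detected mechanically. This is a minimal Python 3 sketch, assuming the store keys have been dumped as `prefix:version` strings as in the listings. The `find_missing_versions` helper is hypothetical and is not a Ceph tool.

```python
def find_missing_versions(keys, prefix="pgmap"):
    """Return the version numbers missing between the lowest and highest key.

    Keys are expected to look like "pgmap:487". Keys with another prefix or a
    non-numeric suffix (for example "pgmap:last_committed") are ignored.
    """
    versions = set()
    for key in keys:
        name, _, suffix = key.partition(":")
        if name == prefix and suffix.isdigit():
            versions.add(int(suffix))
    if not versions:
        return []
    return [v for v in range(min(versions), max(versions) + 1) if v not in versions]


# Key listings copied from the two crashes described above.
first_crash = [f"pgmap:{v}" for v in (480, 481, 482, 483, 484, 485, 486, 487, 489)]
second_crash = [f"pgmap:{v}" for v in (600, 601, 602, 603, 604, 605, 606, 607, 609)]

print(find_missing_versions(first_crash))   # [488]
print(find_missing_versions(second_crash))  # [608]
```

Such a check only reports holes inside the range of keys actually present. It cannot tell whether versions before the first key were trimmed on purpose.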

Actions #4

Updated by Joao Eduardo Luis about 11 years ago

  • Status changed from New to Resolved