Bug #4040
closed
mon: Single-Paxos: on PGMonitor, FAILED assert(0 == "update_from_paxos: error parsing incremental update")
Added by Joao Eduardo Luis about 11 years ago.
Updated about 11 years ago.
Description
INFO:teuthology.task.ceph.mon.f.err:mon/PGMonitor.cc: 182: FAILED assert(0 == "update_from_paxos: error parsing incremental update")
INFO:teuthology.task.ceph.mon.f.err: ceph version 0.56-489-g7bcacc4 (7bcacc45e0fca3460c87ac0800edf44382835685)
INFO:teuthology.task.ceph.mon.f.err: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x90d2b9]
INFO:teuthology.task.ceph.mon.f.err: 2: (PGMonitor::update_from_paxos()+0x978) [0x7ccde8]
INFO:teuthology.task.ceph.mon.f.err: 3: (Monitor::_ms_dispatch(Message*)+0x13eb) [0x709049]
INFO:teuthology.task.ceph.mon.f.err: 4: (Monitor::ms_dispatch(Message*)+0x38) [0x71f598]
INFO:teuthology.task.ceph.mon.f.err: 5: (Messenger::ms_deliver_dispatch(Message*)+0x9b) [0x971dd5]
INFO:teuthology.task.ceph.mon.f.err: 6: (DispatchQueue::entry()+0x549) [0x971581]
INFO:teuthology.task.ceph.mon.f.err: 7: (DispatchQueue::DispatchThread::entry()+0x1c) [0x8f8a74]
INFO:teuthology.task.ceph.mon.f.err: 8: (Thread::_entry_func(void*)+0x23) [0x900971]
INFO:teuthology.task.ceph.mon.f.err: 9: (()+0x7e9a) [0x7f1788e7ae9a]
INFO:teuthology.task.ceph.mon.f.err: 10: (clone()+0x6d) [0x7f17876334bd]
This yaml is awesome to flush out issues.
Something went wrong when updating the 'last_committed' version on mon.f, which incidentally had fallen some 10 versions behind the leader (mon.c):
mon.f@3(peon).pg v487 update_from_paxos applying incremental 488
mon.f@3(peon).pg v487 update_from_paxos: error parsing incremental update: buffer::end_of_buffer
mon/PGMonitor.cc: In function 'virtual void PGMonitor::update_from_paxos()' thread 7f1783b20700 time 2013-02-07 03:48:42.048063
mon/PGMonitor.cc: 182: FAILED assert(0 == "update_from_paxos: error parsing incremental update")
$ tst mon.f/store.db get pgmap 488
(pgmap, 488) does not exist
$ tst mon.c/store.db get pgmap 488
(pgmap, 488)
0000 : 05 05 8c 00 00 00 e8 01 00 00 00 00 00 00 00 00 : ................
0010 : 00 00 02 00 00 00 07 00 00 00 02 02 28 00 00 00 : ............(...
0020 : 00 e8 e1 92 a3 0c 14 00 00 64 c0 d2 7e e9 2d 00 : .........d..~.-.
0030 : 00 84 21 c0 24 23 26 00 00 00 00 00 00 00 00 00 : ..!.$#&.........
0040 : 00 00 00 00 00 00 00 00 0c 00 00 00 02 02 28 00 : ..............(.
0050 : 00 00 00 e8 e1 92 a3 0c 14 00 00 64 c0 d2 7e e9 : ...........d..~.
0060 : 2d 00 00 84 21 c0 24 23 26 00 00 00 00 00 00 00 : -...!.$#&.......
0070 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
0080 : 00 00 00 00 00 00 33 33 73 3f 9a 99 59 3f 00 00 : ......33s?..Y?..
0090 : 00 00 : ..
$ tst mon.c/store.db get pgmap last_committed
(pgmap, last_committed)
0000 : f1 01 00 00 00 00 00 00 : ........
$ tst mon.f/store.db get pgmap last_committed
(pgmap, last_committed)
0000 : e9 01 00 00 00 00 00 00 : ........
$ python -c 'print "{c} {f}".format(c=0x01f1,f=0x01e9)'
497 489
$ tst mon.f/store.db get pgmap 487
(pgmap, 487)
0000 : 05 05 f0 00 00 00 e7 01 00 00 00 00 00 00 00 00 : ................
0010 : 00 00 04 00 00 00 03 00 00 00 02 02 28 00 00 00 : ............(...
0020 : 00 68 e6 d8 d1 e8 3f 00 00 5c 88 96 6d 72 1d 00 : .h....?..\..mr..
0030 : 00 0c 5e 42 64 76 22 00 00 00 00 00 00 00 00 00 : ..^Bdv".........
0040 : 00 00 00 00 00 00 00 00 05 00 00 00 02 02 28 00 : ..............(.
0050 : 00 00 00 68 e6 d8 d1 e8 3f 00 00 5c 88 96 6d 72 : ...h....?..\..mr
0060 : 1d 00 00 0c 5e 42 64 76 22 00 00 00 00 00 00 00 : ....^Bdv".......
0070 : 00 00 00 00 00 00 00 00 00 00 06 00 00 00 02 02 : ................
0080 : 28 00 00 00 00 68 e6 d8 d1 e8 3f 00 00 5c 88 96 : (....h....?..\..
0090 : 6d 72 1d 00 00 0c 5e 42 64 76 22 00 00 00 00 00 : mr....^Bdv".....
00a0 : 00 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00 : ................
00b0 : 02 02 28 00 00 00 00 68 e6 d8 d1 e8 3f 00 00 5c : ..(....h....?..\
00c0 : 88 96 6d 72 1d 00 00 0c 5e 42 64 76 22 00 00 00 : ..mr....^Bdv"...
00d0 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
00e0 : 00 00 00 00 00 00 00 00 00 00 33 33 73 3f 9a 99 : ..........33s?..
00f0 : 59 3f 00 00 00 00 : Y?....
$ tst mon.c/store.db get pgmap 487
(pgmap, 487)
0000 : 05 05 f0 00 00 00 e7 01 00 00 00 00 00 00 00 00 : ................
0010 : 00 00 04 00 00 00 03 00 00 00 02 02 28 00 00 00 : ............(...
0020 : 00 68 e6 d8 d1 e8 3f 00 00 5c 88 96 6d 72 1d 00 : .h....?..\..mr..
0030 : 00 0c 5e 42 64 76 22 00 00 00 00 00 00 00 00 00 : ..^Bdv".........
0040 : 00 00 00 00 00 00 00 00 05 00 00 00 02 02 28 00 : ..............(.
0050 : 00 00 00 68 e6 d8 d1 e8 3f 00 00 5c 88 96 6d 72 : ...h....?..\..mr
0060 : 1d 00 00 0c 5e 42 64 76 22 00 00 00 00 00 00 00 : ....^Bdv".......
0070 : 00 00 00 00 00 00 00 00 00 00 06 00 00 00 02 02 : ................
0080 : 28 00 00 00 00 68 e6 d8 d1 e8 3f 00 00 5c 88 96 : (....h....?..\..
0090 : 6d 72 1d 00 00 0c 5e 42 64 76 22 00 00 00 00 00 : mr....^Bdv".....
00a0 : 00 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00 : ................
00b0 : 02 02 28 00 00 00 00 68 e6 d8 d1 e8 3f 00 00 5c : ..(....h....?..\
00c0 : 88 96 6d 72 1d 00 00 0c 5e 42 64 76 22 00 00 00 : ..mr....^Bdv"...
00d0 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
00e0 : 00 00 00 00 00 00 00 00 00 00 33 33 73 3f 9a 99 : ..........33s?..
00f0 : 59 3f 00 00 00 00 : Y?....
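The last_committed values dumped above appear to be stored as 8-byte little-endian integers, which is why the python one-liner reads them as 0x01f1 and 0x01e9. A minimal sketch of decoding the dumped bytes directly, assuming the "offset : hex bytes : ascii" line layout shown in the dumps (the helper name decode_le_u64 is hypothetical and not part of any Ceph tool):

```python
import struct


def decode_le_u64(hex_line):
    """Decode one 'OFFSET : xx xx ... : ascii' dump line as a little-endian uint64."""
    # The middle field holds the hex bytes; offset and ASCII columns are ignored.
    hex_field = hex_line.split(" : ")[1]
    raw = bytes.fromhex(hex_field.replace(" ", ""))
    # "<Q" means little-endian unsigned 64-bit; it requires exactly 8 bytes.
    (value,) = struct.unpack("<Q", raw)
    return value


# Lines copied from the last_committed dumps of mon.c (leader) and mon.f (peon).
leader = decode_le_u64("0000 : f1 01 00 00 00 00 00 00 : ........")
peon = decode_le_u64("0000 : e9 01 00 00 00 00 00 00 : ........")
print(leader, peon)
```

This reproduces the 497 and 489 shown above: mon.f's store records last_committed 489 even though (pgmap, 488) does not exist on it.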
Also, I suspect this might be causing the same problem described in #4026.
Triggered again with the same symptoms; the issue appears to be a skipped version in the store.
From the original crash, on mon.f's store, note that there is no version 488:
pgmap:480
pgmap:481
pgmap:482
pgmap:483
pgmap:484
pgmap:485
pgmap:486
pgmap:487
pgmap:489
On the most recent crash, note that version 608 is missing (latest version is 609):
pgmap:600
pgmap:601
pgmap:602
pgmap:603
pgmap:604
pgmap:605
pgmap:606
pgmap:607
pgmap:609
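The skipped versions in the two listings above can be found mechanically. A minimal sketch, assuming the store keys have already been dumped as "prefix:version" strings like the ones listed (the function name missing_versions is hypothetical, and this does not read a monitor store itself):

```python
def missing_versions(keys, prefix="pgmap"):
    """Return versions absent from the contiguous range spanned by the keys."""
    # Keep only numeric versions for the given prefix; keys such as
    # "pgmap:last_committed" are skipped.
    versions = sorted(
        int(k.split(":", 1)[1])
        for k in keys
        if k.startswith(prefix + ":") and k.split(":", 1)[1].isdigit()
    )
    if not versions:
        return []
    present = set(versions)
    return [v for v in range(versions[0], versions[-1] + 1) if v not in present]


# Key listings copied from the two crashes described above.
first_crash = ["pgmap:%d" % v for v in (480, 481, 482, 483, 484, 485, 486, 487, 489)]
latest_crash = ["pgmap:%d" % v for v in (600, 601, 602, 603, 604, 605, 606, 607, 609)]

print(missing_versions(first_crash))
print(missing_versions(latest_crash))
```

For the two listings this reports 488 and 608 respectively, in each case the version immediately before the latest one present in the store.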
- Status changed from New to Resolved