Bug #4103


mon: Single-Paxos: on MonitorDBStore, segfault during sync

Added by Joao Eduardo Luis about 11 years ago. Updated about 11 years ago.

Status: Duplicate
Priority: Normal
Assignee: Joao Eduardo Luis
Category: Monitor
% Done: 0%
Source: Development
Severity: 3 - minor

Description

mon.n@13(probing) e1 sync_start entity( mon.16 10.214.133.21:6794/0 )
-- 10.214.133.21:6793/0 --> mon.16 10.214.133.21:6794/0 -- mon_sync( start ) v1 -- ?+0 0x2da3580
[snip]
-- 10.214.133.21:6793/0 <== mon.16 10.214.133.21:6794/0 4 ==== mon_sync( start_reply ) v1 ==== 174+0+0 (2623747790 0 0) 0x2da3580 con 0x29739a0
mon.n@13(synchronizing sync( requester state start )) e1 handle_sync mon_sync( start_reply ) v1
mon.n@13(synchronizing sync( requester state start )) e1 handle_sync_start_reply mon_sync( start_reply ) v1
mon.n@13(synchronizing sync( requester state start )) e1 sync_send_heartbeat mon.16 10.214.133.21:6794/0 reply(0)
-- 10.214.133.21:6793/0 --> mon.16 10.214.133.21:6794/0 -- mon_sync( heartbeat ) v1 -- ?+0 0x2f19600
mon.n@13(synchronizing sync( requester state start )) e1 sync_start_chunks provider(mon.16 10.214.133.21:6794/0)
-- 10.214.133.21:6793/0 --> mon.16 10.214.133.21:6794/0 -- mon_sync( start_chunks ) v1 -- ?+0 0x2f19340
-- 10.214.133.21:6793/0 <== mon.17 10.214.133.28:6794/0 3 ==== mon_probe(reply 14509864-b912-4ab5-9086-401cabaa3375 name p paxos( fc 308 lc 681 )) v4 ==== 3093+0+0 (1218857009 0 0) 0x2edd000 con 0x2c12840
mon.n@13(synchronizing sync( requester state chunks )) e1 handle_probe mon_probe(reply 14509864-b912-4ab5-9086-401cabaa3375 name p paxos( fc 308 lc 681 )) v4
mon.n@13(synchronizing sync( requester state chunks )) e1 handle_probe_reply mon.17 10.214.133.28:6794/0mon_probe(reply 14509864-b912-4ab5-9086-401cabaa3375 name p paxos( fc 308 lc 681 )) v4
mon.n@13(synchronizing sync( requester state chunks )) e1  monmap is e1: 21 mons at {a=10.214.133.28:6789/0,b=10.214.133.21:6789/0,c=10.214.132.37:6789/0,d=10.214.133.28:6790/0,e=10.214.133.21:6790/0,f=10.214.132.37:6790/0,g=10.214.133.28:6791/0,h=10.214.133.21:6791/0,i=10.214.132.37:6791/0,j=10.214.133.28:6792/0,k=10.214.133.21:6792/0,l=10.214.132.37:6792/0,m=10.214.133.28:6793/0,n=10.214.133.21:6793/0,o=10.214.132.37:6793/0,p=10.214.133.28:6794/0,q=10.214.133.21:6794/0,r=10.214.132.37:6794/0,s=10.214.133.28:6795/0,t=10.214.133.21:6795/0,u=10.214.132.37:6795/0}
-- 10.214.133.21:6793/0 <== mon.16 10.214.133.21:6794/0 5 ==== mon_sync( heartbeat_reply ) v1 ==== 174+0+0 (1026169709 0 0) 0x2f19340 con 0x29739a0
mon.n@13(synchronizing sync( requester state chunks )) e1 handle_sync mon_sync( heartbeat_reply ) v1
mon.n@13(synchronizing sync( requester state chunks )) e1 handle_sync_heartbeat_reply mon_sync( heartbeat_reply ) v1
-- 10.214.133.21:6793/0 <== mon.16 10.214.133.21:6794/0 6 ==== mon_sync( chunk v 681 flags( last ) ) v1 ==== 174+0+0 (8143836 0 0) 0x2f19600 con 0x29739a0
mon.n@13(synchronizing sync( requester state chunks )) e1 handle_sync mon_sync( chunk v 681 flags( last ) ) v1
mon.n@13(synchronizing sync( requester state chunks )) e1 handle_sync_chunk mon_sync( chunk v 681 flags( last ) ) v1
2013-02-12 08:53:37.859829 7f49d6723700 -1 *** Caught signal (Aborted) **
 in thread 7f49d6723700

 ceph version 0.56-784-g667289f (667289f812c39df87175cf4c6e9431ac678b1082)
 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x8481bf]
 2: /tmp/cephtest/joao@tardis-2013-02-12_16-32-19/binary/usr/local/bin/ceph-mon() [0x847926]
 3: (()+0xfcb0) [0x7f49db284cb0]
 4: (gsignal()+0x35) [0x7f49d9979445]
 5: (abort()+0x17b) [0x7f49d997cbab]
 6: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f49da2c769d]
 7: (()+0xb5846) [0x7f49da2c5846]
 8: (()+0xb5873) [0x7f49da2c5873]
 9: (()+0xb596e) [0x7f49da2c596e]
 10: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0xc8) [0x916916]
 11: (void decode_raw<unsigned char>(unsigned char&, ceph::buffer::list::iterator&)+0x25) [0x729b59]
 12: (decode(unsigned char&, ceph::buffer::list::iterator&)+0x23) [0x7193b6]
 13: (MonitorDBStore::Transaction::decode(ceph::buffer::list::iterator&)+0x24) [0x71fdf6]
 14: (MonitorDBStore::Transaction::append_from_encoded(ceph::buffer::list&)+0x43) [0x71ffd5]
 15: (Monitor::handle_sync_chunk(MMonSync*)+0x44c) [0x6fd722]
 16: (Monitor::handle_sync(MMonSync*)+0x1fa) [0x6ff3c8]
 17: (Monitor::_ms_dispatch(Message*)+0xf6d) [0x70bf45]
 18: (Monitor::ms_dispatch(Message*)+0x38) [0x723c1c]
 19: (Messenger::ms_deliver_dispatch(Message*)+0x9b) [0x97a72d]
 20: (DispatchQueue::entry()+0x549) [0x979ed9]
 21: (DispatchQueue::DispatchThread::entry()+0x1c) [0x9008ae]
 22: (Thread::_entry_func(void*)+0x23) [0x9087ed]
 23: (()+0x7e9a) [0x7f49db27ce9a]
 24: (clone()+0x6d) [0x7f49d9a354bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Related issues: 2 (0 open, 2 closed)

Is duplicate of Ceph - Bug #4162: mon: Single-Paxos: on sync, corrupted paxos store (Resolved, Joao Eduardo Luis, 02/16/2013)
Is duplicate of Ceph - Feature #2611: mon: Single-Paxos (Resolved, Joao Eduardo Luis, 06/20/2012 - 07/09/2012)
#1

Updated by Joao Eduardo Luis about 11 years ago

  • Status changed from In Progress to Duplicate

This bug was just a symptom of the same cause that popped up in #4162, where we were able to pinpoint the real issue with much more accuracy.
