Bug #4103
mon: Single-Paxos: on MonitorDBStore, segfault during sync
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
mon.n@13(probing) e1 sync_start entity( mon.16 10.214.133.21:6794/0 )
-- 10.214.133.21:6793/0 --> mon.16 10.214.133.21:6794/0 -- mon_sync( start ) v1 -- ?+0 0x2da3580
[snip]
-- 10.214.133.21:6793/0 <== mon.16 10.214.133.21:6794/0 4 ==== mon_sync( start_reply ) v1 ==== 174+0+0 (2623747790 0 0) 0x2da3580 con 0x29739a0
mon.n@13(synchronizing sync( requester state start )) e1 handle_sync mon_sync( start_reply ) v1
mon.n@13(synchronizing sync( requester state start )) e1 handle_sync_start_reply mon_sync( start_reply ) v1
mon.n@13(synchronizing sync( requester state start )) e1 sync_send_heartbeat mon.16 10.214.133.21:6794/0 reply(0)
-- 10.214.133.21:6793/0 --> mon.16 10.214.133.21:6794/0 -- mon_sync( heartbeat ) v1 -- ?+0 0x2f19600
mon.n@13(synchronizing sync( requester state start )) e1 sync_start_chunks provider(mon.16 10.214.133.21:6794/0)
-- 10.214.133.21:6793/0 --> mon.16 10.214.133.21:6794/0 -- mon_sync( start_chunks ) v1 -- ?+0 0x2f19340
-- 10.214.133.21:6793/0 <== mon.17 10.214.133.28:6794/0 3 ==== mon_probe(reply 14509864-b912-4ab5-9086-401cabaa3375 name p paxos( fc 308 lc 681 )) v4 ==== 3093+0+0 (1218857009 0 0) 0x2edd000 con 0x2c12840
mon.n@13(synchronizing sync( requester state chunks )) e1 handle_probe mon_probe(reply 14509864-b912-4ab5-9086-401cabaa3375 name p paxos( fc 308 lc 681 )) v4
mon.n@13(synchronizing sync( requester state chunks )) e1 handle_probe_reply mon.17 10.214.133.28:6794/0mon_probe(reply 14509864-b912-4ab5-9086-401cabaa3375 name p paxos( fc 308 lc 681 )) v4
mon.n@13(synchronizing sync( requester state chunks )) e1 monmap is e1: 21 mons at {a=10.214.133.28:6789/0,b=10.214.133.21:6789/0,c=10.214.132.37:6789/0,d=10.214.133.28:6790/0,e=10.214.133.21:6790/0,f=10.214.132.37:6790/0,g=10.214.133.28:6791/0,h=10.214.133.21:6791/0,i=10.214.132.37:6791/0,j=10.214.133.28:6792/0,k=10.214.133.21:6792/0,l=10.214.132.37:6792/0,m=10.214.133.28:6793/0,n=10.214.133.21:6793/0,o=10.214.132.37:6793/0,p=10.214.133.28:6794/0,q=10.214.133.21:6794/0,r=10.214.132.37:6794/0,s=10.214.133.28:6795/0,t=10.214.133.21:6795/0,u=10.214.132.37:6795/0}
-- 10.214.133.21:6793/0 <== mon.16 10.214.133.21:6794/0 5 ==== mon_sync( heartbeat_reply ) v1 ==== 174+0+0 (1026169709 0 0) 0x2f19340 con 0x29739a0
mon.n@13(synchronizing sync( requester state chunks )) e1 handle_sync mon_sync( heartbeat_reply ) v1
mon.n@13(synchronizing sync( requester state chunks )) e1 handle_sync_heartbeat_reply mon_sync( heartbeat_reply ) v1
-- 10.214.133.21:6793/0 <== mon.16 10.214.133.21:6794/0 6 ==== mon_sync( chunk v 681 flags( last ) ) v1 ==== 174+0+0 (8143836 0 0) 0x2f19600 con 0x29739a0
mon.n@13(synchronizing sync( requester state chunks )) e1 handle_sync mon_sync( chunk v 681 flags( last ) ) v1
mon.n@13(synchronizing sync( requester state chunks )) e1 handle_sync_chunk mon_sync( chunk v 681 flags( last ) ) v1
2013-02-12 08:53:37.859829 7f49d6723700 -1 *** Caught signal (Aborted) **
 in thread 7f49d6723700
 ceph version 0.56-784-g667289f (667289f812c39df87175cf4c6e9431ac678b1082)
 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x8481bf]
 2: /tmp/cephtest/joao@tardis-2013-02-12_16-32-19/binary/usr/local/bin/ceph-mon() [0x847926]
 3: (()+0xfcb0) [0x7f49db284cb0]
 4: (gsignal()+0x35) [0x7f49d9979445]
 5: (abort()+0x17b) [0x7f49d997cbab]
 6: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f49da2c769d]
 7: (()+0xb5846) [0x7f49da2c5846]
 8: (()+0xb5873) [0x7f49da2c5873]
 9: (()+0xb596e) [0x7f49da2c596e]
 10: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0xc8) [0x916916]
 11: (void decode_raw<unsigned char>(unsigned char&, ceph::buffer::list::iterator&)+0x25) [0x729b59]
 12: (decode(unsigned char&, ceph::buffer::list::iterator&)+0x23) [0x7193b6]
 13: (MonitorDBStore::Transaction::decode(ceph::buffer::list::iterator&)+0x24) [0x71fdf6]
 14: (MonitorDBStore::Transaction::append_from_encoded(ceph::buffer::list&)+0x43) [0x71ffd5]
 15: (Monitor::handle_sync_chunk(MMonSync*)+0x44c) [0x6fd722]
 16: (Monitor::handle_sync(MMonSync*)+0x1fa) [0x6ff3c8]
 17: (Monitor::_ms_dispatch(Message*)+0xf6d) [0x70bf45]
 18: (Monitor::ms_dispatch(Message*)+0x38) [0x723c1c]
 19: (Messenger::ms_deliver_dispatch(Message*)+0x9b) [0x97a72d]
 20: (DispatchQueue::entry()+0x549) [0x979ed9]
 21: (DispatchQueue::DispatchThread::entry()+0x1c) [0x9008ae]
 22: (Thread::_entry_func(void*)+0x23) [0x9087ed]
 23: (()+0x7e9a) [0x7f49db27ce9a]
 24: (clone()+0x6d) [0x7f49d9a354bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
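For readers unfamiliar with this failure shape: frames 5 through 10 show abort() reached via __gnu_cxx::__verbose_terminate_handler directly beneath buffer::list::iterator::copy, which is what an uncaught C++ exception thrown while decoding a one-byte field looks like. The sketch below only illustrates that general pattern (a bounds-checked copy that throws when the buffer is too short, plus a guarded caller). It is not Ceph code: Cursor, decode_version and try_decode_version are hypothetical names, std::out_of_range stands in for whatever exception the real buffer code throws, and it says nothing about why the chunk could not be decoded here. It assumes C++17 for std::optional.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a buffer iterator; NOT the ceph::buffer API.
class Cursor {
public:
  explicit Cursor(const std::vector<std::uint8_t>& buf) : buf_(buf) {}

  std::size_t remaining() const {
    assert(off_ <= buf_.size());  // invariant: never advance past the end
    return buf_.size() - off_;
  }

  // Copy len bytes into dest, throwing if the buffer is too short.
  void copy(std::size_t len, void* dest) {
    if (len > remaining())
      throw std::out_of_range("end of buffer");
    std::memcpy(dest, buf_.data() + off_, len);
    off_ += len;
  }

private:
  const std::vector<std::uint8_t>& buf_;
  std::size_t off_ = 0;
};

// Hypothetical decoder: reads a single leading byte, like the
// decode(unsigned char&, ...) call in frame 12 of the backtrace.
std::uint8_t decode_version(Cursor& c) {
  std::uint8_t v = 0;
  c.copy(sizeof v, &v);  // throws on an empty or exhausted buffer
  return v;
}

// Guarded caller: reports failure instead of letting the exception
// propagate out of the thread, which would end in std::terminate/abort.
std::optional<std::uint8_t>
try_decode_version(const std::vector<std::uint8_t>& payload) {
  Cursor c(payload);
  try {
    return decode_version(c);
  } catch (const std::out_of_range&) {
    return std::nullopt;
  }
}
```

Calling decode_version on an empty payload without the try block, from a thread with no handler further up the stack, would terminate the process in the same way as the trace above; try_decode_version returns an empty optional instead.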
Related issues
History
#1 Updated by Joao Eduardo Luis about 10 years ago
- Status changed from In Progress to Duplicate
This bug was a symptom of the same underlying cause that surfaced in #4162, where we were able to pinpoint the real issue much more accurately.