Bug #6041


Failing to add 3rd monitor

Added by Bram Pieters over 10 years ago. Updated over 10 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Monitor
Target version:
-
% Done:
0%

Source:
other
Severity:
3 - minor

Description

After adding an additional (3rd) monitor, the new monitor crashes during its first sync.

Ceph version: 0.64
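For context, the sequence that triggers the crash is the standard manual procedure for adding a monitor to an existing quorum. The sketch below is illustrative only; the monitor ID, address, and paths are assumptions and are not taken from this report:

```shell
# Hypothetical reproduction sketch: adding a third monitor to an existing
# two-monitor cluster. The ID "2", the address, and the /tmp paths are
# assumptions for illustration.

# 1. Fetch the cluster's mon keyring and current monmap so the new
#    monitor can build its initial store.
ceph auth get mon. -o /tmp/mon.keyring
ceph mon getmap -o /tmp/monmap
ceph-mon -i 2 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring

# 2. Start the new daemon. It probes the cluster, joins the monmap, and
#    begins its first full sync -- the phase where the assert fires here.
ceph-mon -i 2 --public-addr 192.168.135.202:6789
```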

2013-08-16 13:03:49.113593 7ff246803780 0 ceph version 0.64 (42e06c12db63bae292acc074548c06478fa92ea2), process ceph-mon, pid 23753
2013-08-16 13:03:53.699706 7ff246803780 0 mon.1 does not exist in monmap, will attempt to join an existing cluster
2013-08-16 13:03:53.700401 7ff246803780 1 mon.1@-1(probing) e0 preinit fsid 527ae0c2-4d1d-4262-8a70-2ef36b41f63d
2013-08-16 13:03:58.595346 7ff1b653d700 0 mon.1@-1(probing) e7 my rank is now 0 (was 1)
2013-08-16 13:03:58.596415 7ff2233e6700 0 -- 192.168.135.200:6789/0 >> 192.168.135.201:6789/0 pipe(0x28d0000 sd=24 :6789 s=0 pgs=0 cs=0 l=0).accept connect_seq 2 vs existing 0 state connecting
2013-08-16 13:03:58.596509 7ff2233e6700 0 -- 192.168.135.200:6789/0 >> 192.168.135.201:6789/0 pipe(0x28d0000 sd=24 :6789 s=0 pgs=0 cs=0 l=0).accept we reset (peer sent cseq 2, 0x2b0e780.cseq = 0), sending RESETSESSION
2013-08-16 13:03:58.596927 7ff2233e6700 0 -- 192.168.135.200:6789/0 >> 192.168.135.201:6789/0 pipe(0x28d0000 sd=24 :6789 s=0 pgs=0 cs=0 l=0).accept connect_seq 0 vs existing 0 state connecting
2013-08-16 13:04:58.605749 7ff1b6d3e700 0 mon.1@0(synchronizing sync( requester state chunks )).data_health(0) update_stats avail 72% total 19518392 used 5412656 avail 14105736
2013-08-16 13:05:58.620107 7ff1b6d3e700 0 mon.1@0(synchronizing sync( requester state chunks )).data_health(0) update_stats avail 70% total 19518392 used 5818504 avail 13699888
2013-08-16 13:06:58.699185 7ff1b6d3e700 0 mon.1@0(synchronizing sync( requester state chunks )).data_health(0) update_stats avail 67% total 19518392 used 6314288 avail 13204104
2013-08-16 13:07:41.480078 7ff1b6d3e700 1 mon.1@0(synchronizing sync( requester state chunks )) e7 discarding message auth(proto 0 30 bytes epoch 0) v1 and sending client elsewhere
2013-08-16 13:07:58.740044 7ff1b6d3e700 0 mon.1@0(synchronizing sync( requester state chunks )).data_health(0) update_stats avail 65% total 19518392 used 6774000 avail 12744392
2013-08-16 13:08:58.836225 7ff1b6d3e700 0 mon.1@0(synchronizing sync( requester state chunks )).data_health(0) update_stats avail 63% total 19518392 used 7209132 avail 12309260
2013-08-16 13:09:07.121687 7ff1b6d3e700 1 mon.1@0(synchronizing sync( requester state chunks )) e7 discarding message auth(proto 0 30 bytes epoch 0) v1 and sending client elsewhere
2013-08-16 13:09:58.836459 7ff1b6d3e700 0 mon.1@0(synchronizing sync( requester state chunks )).data_health(0) update_stats avail 60% total 19518392 used 7619188 avail 11899204
2013-08-16 13:10:58.912055 7ff1b6d3e700 0 mon.1@0(synchronizing sync( requester state chunks )).data_health(0) update_stats avail 58% total 19518392 used 8024784 avail 11493608
2013-08-16 13:11:58.947890 7ff1b6d3e700 0 mon.1@0(synchronizing sync( requester state chunks )).data_health(0) update_stats avail 57% total 19518392 used 8239480 avail 11278912
2013-08-16 13:13:11.413905 7ff1b6d3e700 1 mon.1@0(synchronizing sync( requester state chunks )) e7 sync_timeout mon.2 192.168.135.202:6789/0
2013-08-16 13:13:11.413967 7ff1b6d3e700 0 mon.1@0(synchronizing sync( requester state chunks )).data_health(0) update_stats avail 61% total 19518392 used 7480576 avail 12037816
2013-08-16 13:13:11.414004 7ff1b653d700 1 mon.1@0(synchronizing sync( requester state chunks )) e7 handle_sync_chunk stray message -- drop it.
2013-08-16 13:16:08.445530 7ff1b6d3e700 1 mon.1@0(synchronizing sync( requester state chunks )) e7 sync_requester_abort no longer a sync requester
2013-08-16 13:16:08.445760 7ff1b653d700 1 mon.1@0(probing) e7 handle_sync_chunk stray message -- drop it.
2013-08-16 13:18:00.180372 7ff1b6d3e700 -1 mon/Monitor.cc: In function 'void Monitor::sync_timeout(entity_inst_t&)' thread 7ff1b6d3e700 time 2013-08-16 13:18:00.138301
mon/Monitor.cc: 1171: FAILED assert(0 == "We should never reach this")

ceph version 0.64 (42e06c12db63bae292acc074548c06478fa92ea2)
1: (Monitor::sync_timeout(entity_inst_t&)+0xa67) [0x4b5797]
2: (Context::complete(int)+0xa) [0x4c384a]
3: (SafeTimer::timer_thread()+0x453) [0x5a79d3]
4: (SafeTimerThread::entry()+0xd) [0x5a9b9d]
5: (()+0x68ca) [0x7ff2463e88ca]
6: (clone()+0x6d) [0x7ff244a1fb6d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Files

ceph-mon.1.log (1.93 MB) - Mon log file - Bram Pieters, 08/17/2013 12:00 AM
#1

Updated by Sage Weil over 10 years ago

  • Status changed from New to Resolved

Please upgrade to 0.67(.1) dumpling; 0.64 is an interim development release that doesn't get backported fixes (unlike 0.61 cuttlefish, which does). Note that all of this code was rewritten in 0.67 and is much more reliable. Thanks!
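The suggested fix is a package upgrade. A minimal sketch, assuming a Debian/Ubuntu host with the dumpling release repository already configured; the package and service names are assumptions, so adjust for your distribution and init system:

```shell
# Assumed upgrade path to 0.67.x dumpling on Debian/Ubuntu -- repository
# setup and service names are assumptions, not from this ticket.
sudo apt-get update
sudo apt-get install ceph       # pulls in the 0.67.x binaries
sudo service ceph restart mon   # restart monitors one at a time
ceph --version                  # confirm the running version is 0.67.x
```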
