Bug #10115

closed

mon not running. osd is dead

Added by ? ?? over 9 years ago. Updated about 9 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
Joao Eduardo Luis
Category:
Monitor
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

My Ceph cluster is not configured to use cephx. I previously solved one problem as described in this issue: http://tracker.ceph.com/issues/8851.

But now the Ceph cluster cannot start up.
My cluster has three servers. On each server there is one mon, one osd, and one mds. After I rebooted the mons, only one mon and the three mds daemons are running; the others are dead.

When I start Ceph with "service ceph start", it shows the output below. Can anyone give me some suggestions?
root@compute1:~# service ceph restart
=== mon.b ===
Stopping Ceph mon.b on compute1...done
=== mon.b ===
Starting Ceph mon.b on compute1...
mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7f89132347c0 time 2014-11-15 14:08:53.303106
mon/OSDMonitor.cc: 198: FAILED assert(err == 0)
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (OSDMonitor::update_from_paxos(bool*)+0x2b8b) [0x7f891342d1bb]
2: (PaxosService::refresh(bool*)+0x445) [0x7f89134174e5]
3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x7f89133b1267]
4: (Monitor::init_paxos()+0xf5) [0x7f89133b1435]
5: (Monitor::preinit()+0x69f) [0x7f89133c983f]
6: (main()+0x272d) [0x7f891339b90d]
7: (__libc_start_main()+0xed) [0x7f891122e78d]
8: (()+0x153719) [0x7f891339e719]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-11-15 14:08:53.303944 7f89132347c0 -1 mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7f89132347c0 time 2014-11-15 14:08:53.303106
mon/OSDMonitor.cc: 198: FAILED assert(err == 0)

ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (OSDMonitor::update_from_paxos(bool*)+0x2b8b) [0x7f891342d1bb]
2: (PaxosService::refresh(bool*)+0x445) [0x7f89134174e5]
3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x7f89133b1267]
4: (Monitor::init_paxos()+0xf5) [0x7f89133b1435]
5: (Monitor::preinit()+0x69f) [0x7f89133c983f]
6: (main()+0x272d) [0x7f891339b90d]
7: (__libc_start_main()+0xed) [0x7f891122e78d]
8: (()+0x153719) [0x7f891339e719]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
0> 2014-11-15 14:08:53.303944 7f89132347c0 -1 mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7f89132347c0 time 2014-11-15 14:08:53.303106
mon/OSDMonitor.cc: 198: FAILED assert(err == 0)
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (OSDMonitor::update_from_paxos(bool*)+0x2b8b) [0x7f891342d1bb]
2: (PaxosService::refresh(bool*)+0x445) [0x7f89134174e5]
3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x7f89133b1267]
4: (Monitor::init_paxos()+0xf5) [0x7f89133b1435]
5: (Monitor::preinit()+0x69f) [0x7f89133c983f]
6: (main()+0x272d) [0x7f891339b90d]
7: (__libc_start_main()+0xed) [0x7f891122e78d]
8: (()+0x153719) [0x7f891339e719]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) **
in thread 7f89132347c0
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (()+0x48de6a) [0x7f89136d8e6a]
2: (()+0xfcb0) [0x7f8912ba6cb0]
3: (gsignal()+0x35) [0x7f89112434f5]
4: (abort()+0x17b) [0x7f8911246c5b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f8911b9669d]
6: (()+0xb5846) [0x7f8911b94846]
7: (()+0xb5873) [0x7f8911b94873]
8: (()+0xb596e) [0x7f8911b9496e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x7f89135d8d5f]
10: (OSDMonitor::update_from_paxos(bool*)+0x2b8b) [0x7f891342d1bb]
11: (PaxosService::refresh(bool*)+0x445) [0x7f89134174e5]
12: (Monitor::refresh_from_paxos(bool*)+0x57) [0x7f89133b1267]
13: (Monitor::init_paxos()+0xf5) [0x7f89133b1435]
14: (Monitor::preinit()+0x69f) [0x7f89133c983f]
15: (main()+0x272d) [0x7f891339b90d]
16: (__libc_start_main()+0xed) [0x7f891122e78d]
17: (()+0x153719) [0x7f891339e719]
2014-11-15 14:08:53.311763 7f89132347c0 -1 *** Caught signal (Aborted) **
in thread 7f89132347c0
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (()+0x48de6a) [0x7f89136d8e6a]
2: (()+0xfcb0) [0x7f8912ba6cb0]
3: (gsignal()+0x35) [0x7f89112434f5]
4: (abort()+0x17b) [0x7f8911246c5b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f8911b9669d]
6: (()+0xb5846) [0x7f8911b94846]
7: (()+0xb5873) [0x7f8911b94873]
8: (()+0xb596e) [0x7f8911b9496e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x7f89135d8d5f]
10: (OSDMonitor::update_from_paxos(bool*)+0x2b8b) [0x7f891342d1bb]
11: (PaxosService::refresh(bool*)+0x445) [0x7f89134174e5]
12: (Monitor::refresh_from_paxos(bool*)+0x57) [0x7f89133b1267]
13: (Monitor::init_paxos()+0xf5) [0x7f89133b1435]
14: (Monitor::preinit()+0x69f) [0x7f89133c983f]
15: (main()+0x272d) [0x7f891339b90d]
16: (__libc_start_main()+0xed) [0x7f891122e78d]
17: (()+0x153719) [0x7f891339e719]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
0> 2014-11-15 14:08:53.311763 7f89132347c0 -1 *** Caught signal (Aborted) **
in thread 7f89132347c0
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (()+0x48de6a) [0x7f89136d8e6a]
2: (()+0xfcb0) [0x7f8912ba6cb0]
3: (gsignal()+0x35) [0x7f89112434f5]
4: (abort()+0x17b) [0x7f8911246c5b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f8911b9669d]
6: (()+0xb5846) [0x7f8911b94846]
7: (()+0xb5873) [0x7f8911b94873]
8: (()+0xb596e) [0x7f8911b9496e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x7f89135d8d5f]
10: (OSDMonitor::update_from_paxos(bool*)+0x2b8b) [0x7f891342d1bb]
11: (PaxosService::refresh(bool*)+0x445) [0x7f89134174e5]
12: (Monitor::refresh_from_paxos(bool*)+0x57) [0x7f89133b1267]
13: (Monitor::init_paxos()+0xf5) [0x7f89133b1435]
14: (Monitor::preinit()+0x69f) [0x7f89133c983f]
15: (main()+0x272d) [0x7f891339b90d]
16: (__libc_start_main()+0xed) [0x7f891122e78d]
17: (()+0x153719) [0x7f891339e719]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

[29271]: (33) Numerical argument out of domain
failed: 'ulimit -n 131072; /usr/bin/ceph-mon -i b --pid-file /var/run/ceph/mon.b.pid -c /etc/ceph/ceph.conf --cluster ceph '
Starting ceph-create-keys on compute1...
=== mds.b ===
Stopping Ceph mds.b on compute1...kill 20412...done
=== mds.b ===
Starting Ceph mds.b on compute1...
starting mds.b at :/0
=== osd.1 ===
Stopping Ceph osd.1 on compute1...done
=== osd.1 ===
Mounting xfs on compute1:/var/lib/ceph/osd/ceph-1
2014-11-15 14:08:55.495896 7f3c9cb9f700 10 monclient(hunting): build_initial_monmap
2014-11-15 14:08:55.496169 7f3c9cb9f700 1 -- :/0 messenger.start
2014-11-15 14:08:55.496213 7f3c9cb9f700 10 monclient(hunting): init
2014-11-15 14:08:55.496252 7f3c9cb9f700 10 monclient(hunting): auth_supported 1 method none
2014-11-15 14:08:55.496364 7f3c9cb9f700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:08:55.496437 7f3c9cb9f700 10 monclient(hunting): picked mon.a con 0x7f3c980228d0 addr 10.110.13.2:6789/0
2014-11-15 14:08:55.496504 7f3c9cb9f700 10 monclient(hunting): _send_mon_message to mon.a at 10.110.13.2:6789/0
2014-11-15 14:08:55.496514 7f3c9cb9f700 1 -- :/1029895 --> 10.110.13.2:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c98022d50 con 0x7f3c980228d0
2014-11-15 14:08:55.496535 7f3c9cb9f700 10 monclient(hunting): renew_subs
2014-11-15 14:08:55.496543 7f3c9cb9f700 10 monclient(hunting): authenticate will time out at 2014-11-15 14:13:55.496542
2014-11-15 14:08:55.496731 7f3c9c164700 0 -- :/1029895 >> 10.110.13.2:6789/0 pipe(0x7f3c98022660 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c980228d0).fault
2014-11-15 14:08:58.496474 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:08:58.496539 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:08:58.496545 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:08:58.496562 7f3c95ae7700 1 -- :/1029895 mark_down 0x7f3c980228d0 -- 0x7f3c98022660
2014-11-15 14:08:58.496704 7f3c95ae7700 10 monclient(hunting): picked mon.c con 0x7f3c8c000e70 addr 10.110.13.4:6789/0
2014-11-15 14:08:58.496775 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.c at 10.110.13.4:6789/0
2014-11-15 14:08:58.496787 7f3c95ae7700 1 -- :/1029895 --> 10.110.13.4:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c0012f0 con 0x7f3c8c000e70
2014-11-15 14:08:58.496854 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:08:58.497257 7f3c94ae5700 1 -- 10.110.13.3:0/1029895 learned my addr 10.110.13.3:0/1029895
2014-11-15 14:09:01.496983 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:01.497009 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:01.497013 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:01.497024 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c000e70 -- 0x7f3c8c000c00
2014-11-15 14:09:01.497228 7f3c95ae7700 10 monclient(hunting): picked mon.b con 0x7f3c8c001f10 addr 10.110.13.3:6789/0
2014-11-15 14:09:01.497259 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.b at 10.110.13.3:6789/0
2014-11-15 14:09:01.497266 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.3:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c002280 con 0x7f3c8c001f10
2014-11-15 14:09:01.497324 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:01.497361 7f3c949e4700 0 -- 10.110.13.3:0/1029895 >> 10.110.13.3:6789/0 pipe(0x7f3c8c001ca0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c8c001f10).fault
2014-11-15 14:09:04.497427 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:04.497465 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:04.497468 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:04.497484 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c001f10 -- 0x7f3c8c001ca0
2014-11-15 14:09:04.497589 7f3c95ae7700 10 monclient(hunting): picked mon.a con 0x7f3c8c002b10 addr 10.110.13.2:6789/0
2014-11-15 14:09:04.497631 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.a at 10.110.13.2:6789/0
2014-11-15 14:09:04.497638 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.2:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c002d40 con 0x7f3c8c002b10
2014-11-15 14:09:04.497736 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:04.497986 7f3c9c164700 0 -- 10.110.13.3:0/1029895 >> 10.110.13.2:6789/0 pipe(0x7f3c8c0028a0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c8c002b10).fault
2014-11-15 14:09:07.497867 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:07.497905 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:07.497909 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:07.497924 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c002b10 -- 0x7f3c8c0028a0
2014-11-15 14:09:07.498022 7f3c95ae7700 10 monclient(hunting): picked mon.b con 0x7f3c8c0031f0 addr 10.110.13.3:6789/0
2014-11-15 14:09:07.498065 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.b at 10.110.13.3:6789/0
2014-11-15 14:09:07.498072 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.3:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c003420 con 0x7f3c8c0031f0
2014-11-15 14:09:07.498145 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:07.498170 7f3c949e4700 0 -- 10.110.13.3:0/1029895 >> 10.110.13.3:6789/0 pipe(0x7f3c8c002f80 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c8c0031f0).fault
2014-11-15 14:09:10.498264 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:10.498298 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:10.498302 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:10.498317 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c0031f0 -- 0x7f3c8c002f80
2014-11-15 14:09:10.498451 7f3c95ae7700 10 monclient(hunting): picked mon.c con 0x7f3c8c003af0 addr 10.110.13.4:6789/0
2014-11-15 14:09:10.498488 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.c at 10.110.13.4:6789/0
2014-11-15 14:09:10.498494 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.4:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c003d20 con 0x7f3c8c003af0
2014-11-15 14:09:10.498568 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:13.498688 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:13.498726 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:13.498730 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:13.498745 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c003af0 -- 0x7f3c8c003880
2014-11-15 14:09:13.498905 7f3c95ae7700 10 monclient(hunting): picked mon.a con 0x7f3c8c004000 addr 10.110.13.2:6789/0
2014-11-15 14:09:13.498936 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.a at 10.110.13.2:6789/0
2014-11-15 14:09:13.498943 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.2:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c004230 con 0x7f3c8c004000
2014-11-15 14:09:13.499087 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:13.499215 7f3c94ae5700 0 -- 10.110.13.3:0/1029895 >> 10.110.13.2:6789/0 pipe(0x7f3c8c003080 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c8c004000).fault
2014-11-15 14:09:16.499236 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:16.499264 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:16.499266 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:16.499277 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c004000 -- 0x7f3c8c003080
2014-11-15 14:09:16.499340 7f3c95ae7700 10 monclient(hunting): picked mon.c con 0x7f3c8c0049f0 addr 10.110.13.4:6789/0
2014-11-15 14:09:16.499359 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.c at 10.110.13.4:6789/0
2014-11-15 14:09:16.499365 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.4:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c004c20 con 0x7f3c8c0049f0
2014-11-15 14:09:16.499449 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:19.499552 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:19.499578 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:19.499580 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:19.499590 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c0049f0 -- 0x7f3c8c004780
2014-11-15 14:09:19.499680 7f3c95ae7700 10 monclient(hunting): picked mon.a con 0x7f3c8c004fb0 addr 10.110.13.2:6789/0
2014-11-15 14:09:19.499698 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.a at 10.110.13.2:6789/0
2014-11-15 14:09:19.499701 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.2:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c0051e0 con 0x7f3c8c004fb0
2014-11-15 14:09:19.499747 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:19.500068 7f3c9c164700 0 -- 10.110.13.3:0/1029895 >> 10.110.13.2:6789/0 pipe(0x7f3c8c003080 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c8c004fb0).fault
2014-11-15 14:09:22.499853 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:22.499890 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:22.499893 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:22.499907 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c004fb0 -- 0x7f3c8c003080
2014-11-15 14:09:22.499992 7f3c95ae7700 10 monclient(hunting): picked mon.b con 0x7f3c8c005880 addr 10.110.13.3:6789/0
2014-11-15 14:09:22.500014 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.b at 10.110.13.3:6789/0
2014-11-15 14:09:22.500017 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.3:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c005af0 con 0x7f3c8c005880
2014-11-15 14:09:22.500028 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:22.500110 7f3c94ae5700 0 -- 10.110.13.3:0/1029895 >> 10.110.13.3:6789/0 pipe(0x7f3c8c005610 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c8c005880).fault
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.1 --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 9.09 host=compute1 root=default'
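
The monclient(hunting) lines above are easier to read once they are tallied per monitor. The following is a minimal sketch, not part of Ceph: the function name summarize_hunt and both regular expressions are assumptions written against the line shapes visible in this excerpt, and they may not match logs from other Ceph versions. It counts how often each monitor was picked and how many connection faults were logged against its address.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the log lines in this report.
PICKED = re.compile(r"picked (mon\.\w+) con \S+ addr (\S+)")
FAULT = re.compile(r">> (\S+) pipe\(.*\)\.fault")


def summarize_hunt(lines):
    """Return (picked, faults): per-monitor counts of hunt attempts and faults."""
    picked = Counter()
    faults = Counter()
    addr_to_mon = {}
    for line in lines:
        match = PICKED.search(line)
        if match:
            mon, addr = match.groups()
            picked[mon] += 1
            addr_to_mon[addr] = mon  # remember which monitor owns this address
            continue
        match = FAULT.search(line)
        if match:
            addr = match.group(1)
            # Fall back to the raw address if no "picked" line was seen for it.
            faults[addr_to_mon.get(addr, addr)] += 1
    return picked, faults


# Three lines copied from the output above.
sample = [
    "2014-11-15 14:08:55.496437 7f3c9cb9f700 10 monclient(hunting): picked mon.a con 0x7f3c980228d0 addr 10.110.13.2:6789/0",
    "2014-11-15 14:08:55.496731 7f3c9c164700 0 -- :/1029895 >> 10.110.13.2:6789/0 pipe(0x7f3c98022660 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c980228d0).fault",
    "2014-11-15 14:08:58.496704 7f3c95ae7700 10 monclient(hunting): picked mon.c con 0x7f3c8c000e70 addr 10.110.13.4:6789/0",
]
picked, faults = summarize_hunt(sample)
```

In the excerpt above, every .fault line is logged against the addresses of mon.a or mon.b, while the attempts against mon.c log no fault but are still abandoned at the next tick. A per-monitor tally like this makes that pattern visible without reading each line.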


Files

mds.b.log (8.18 KB) mds.b.log ? ??, 11/14/2014 10:22 PM
mon.b.log (158 KB) mon.b.log ? ??, 11/14/2014 10:22 PM
osd.1.log (9.26 KB) osd.1.log ? ??, 11/14/2014 10:22 PM

Updated by ? ?? over 9 years ago

These are the log files from one of my Ceph nodes.

Actions #2

Updated by ? ?? over 9 years ago

My Ceph version is 0.80.1. I installed it on Ubuntu 12.04.4.
uname -a : Linux controller 3.11.0-26-generic #45~precise1-Ubuntu SMP Tue Jul 15 04:02:35 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Actions #3

Updated by Sage Weil over 9 years ago

  • Assignee set to Joao Eduardo Luis
  • Source changed from other to Community (user)
Actions #4

Updated by Joao Eduardo Luis over 9 years ago

  • Category set to Monitor
  • Status changed from New to Need More Info

From the log it looks like there's (at least) an incremental missing (epoch 140).

This bug should not be related to the authmonitor bug (#8851), unless you performed some operation that could affect the osdmap keys (such as using ceph-kvstore-tool to change an 'osdmap' key).

It would be nice to have your monitor stores to test, along with full logs from both the crashing monitor and the leader.
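
To illustrate why a single missing incremental can stop a monitor at startup, here is a toy model and not Ceph code. It assumes the failed assert in OSDMonitor::update_from_paxos guards the read of the next incremental while the map is advanced one epoch at a time, which is what the diagnosis above suggests. The names catch_up and MissingIncremental, the dictionary standing in for the monitor store, and every epoch other than 140 are hypothetical.

```python
class MissingIncremental(Exception):
    """Raised when the incremental for the next epoch is not in the store."""


def catch_up(current_epoch, latest_epoch, store):
    """Advance one epoch at a time; return the list of epochs applied."""
    applied = []
    while current_epoch < latest_epoch:
        next_epoch = current_epoch + 1
        incremental = store.get(next_epoch)
        if incremental is None:
            # Toy analogue of the failed assert(err == 0): there is no way
            # to skip an epoch, so the catch-up cannot continue.
            raise MissingIncremental(f"no incremental for epoch {next_epoch}")
        applied.append(next_epoch)
        current_epoch = next_epoch
    return applied


# Hypothetical store: epochs 138, 139 and 141 are present, 140 is missing.
store = {138: b"inc", 139: b"inc", 141: b"inc"}
try:
    catch_up(137, 141, store)
    error = None
except MissingIncremental as exc:
    error = str(exc)
```

In this model the later epoch 141 being present does not help, because each incremental only describes the change from the epoch before it. That is why the monitor stores themselves are needed to see what happened to epoch 140.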

Actions #5

Updated by Joao Eduardo Luis over 9 years ago

Also, ideally, logs from before the crash started to happen.

Actions #6

Updated by Sage Weil over 9 years ago

  • Priority changed from Urgent to High
Actions #7

Updated by Sage Weil about 9 years ago

  • Status changed from Need More Info to Can't reproduce