Bug #10115

closed

mon not running. osd is dead

Added by ? ?? over 9 years ago. Updated over 9 years ago.

Status: Can't reproduce
Priority: High
Category: Monitor
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

My Ceph cluster is not configured to use cephx. I previously solved one problem by following this issue: http://tracker.ceph.com/issues/8851.

But now the Ceph cluster cannot start up.
My cluster has three servers, each running one mon, one osd, and one mds. After I rebooted the mon, only one mon and three mds daemons are running; the rest are dead.

When I start Ceph (service ceph start), it shows the following output. Can anyone give me some suggestions?
root@compute1:~# service ceph restart
=== mon.b ===
=== mon.b ===
Stopping Ceph mon.b on compute1...done
=== mon.b ===
Starting Ceph mon.b on compute1...
mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7f89132347c0 time 2014-11-15 14:08:53.303106
mon/OSDMonitor.cc: 198: FAILED assert(err == 0)
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (OSDMonitor::update_from_paxos(bool*)+0x2b8b) [0x7f891342d1bb]
2: (PaxosService::refresh(bool*)+0x445) [0x7f89134174e5]
3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x7f89133b1267]
4: (Monitor::init_paxos()+0xf5) [0x7f89133b1435]
5: (Monitor::preinit()+0x69f) [0x7f89133c983f]
6: (main()+0x272d) [0x7f891339b90d]
7: (__libc_start_main()+0xed) [0x7f891122e78d]
8: (()+0x153719) [0x7f891339e719]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-11-15 14:08:53.303944 7f89132347c0 -1 mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7f89132347c0 time 2014-11-15 14:08:53.303106
mon/OSDMonitor.cc: 198: FAILED assert(err == 0)

ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (OSDMonitor::update_from_paxos(bool*)+0x2b8b) [0x7f891342d1bb]
2: (PaxosService::refresh(bool*)+0x445) [0x7f89134174e5]
3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x7f89133b1267]
4: (Monitor::init_paxos()+0xf5) [0x7f89133b1435]
5: (Monitor::preinit()+0x69f) [0x7f89133c983f]
6: (main()+0x272d) [0x7f891339b90d]
7: (__libc_start_main()+0xed) [0x7f891122e78d]
8: (()+0x153719) [0x7f891339e719]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
0> 2014-11-15 14:08:53.303944 7f89132347c0 -1 mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7f89132347c0 time 2014-11-15 14:08:53.303106
mon/OSDMonitor.cc: 198: FAILED assert(err == 0)
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (OSDMonitor::update_from_paxos(bool*)+0x2b8b) [0x7f891342d1bb]
2: (PaxosService::refresh(bool*)+0x445) [0x7f89134174e5]
3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x7f89133b1267]
4: (Monitor::init_paxos()+0xf5) [0x7f89133b1435]
5: (Monitor::preinit()+0x69f) [0x7f89133c983f]
6: (main()+0x272d) [0x7f891339b90d]
7: (__libc_start_main()+0xed) [0x7f891122e78d]
8: (()+0x153719) [0x7f891339e719]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) **
in thread 7f89132347c0
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (()+0x48de6a) [0x7f89136d8e6a]
2: (()+0xfcb0) [0x7f8912ba6cb0]
3: (gsignal()+0x35) [0x7f89112434f5]
4: (abort()+0x17b) [0x7f8911246c5b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f8911b9669d]
6: (()+0xb5846) [0x7f8911b94846]
7: (()+0xb5873) [0x7f8911b94873]
8: (()+0xb596e) [0x7f8911b9496e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x7f89135d8d5f]
10: (OSDMonitor::update_from_paxos(bool*)+0x2b8b) [0x7f891342d1bb]
11: (PaxosService::refresh(bool*)+0x445) [0x7f89134174e5]
12: (Monitor::refresh_from_paxos(bool*)+0x57) [0x7f89133b1267]
13: (Monitor::init_paxos()+0xf5) [0x7f89133b1435]
14: (Monitor::preinit()+0x69f) [0x7f89133c983f]
15: (main()+0x272d) [0x7f891339b90d]
16: (__libc_start_main()+0xed) [0x7f891122e78d]
17: (()+0x153719) [0x7f891339e719]
2014-11-15 14:08:53.311763 7f89132347c0 -1 *** Caught signal (Aborted) **
in thread 7f89132347c0
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (()+0x48de6a) [0x7f89136d8e6a]
2: (()+0xfcb0) [0x7f8912ba6cb0]
3: (gsignal()+0x35) [0x7f89112434f5]
4: (abort()+0x17b) [0x7f8911246c5b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f8911b9669d]
6: (()+0xb5846) [0x7f8911b94846]
7: (()+0xb5873) [0x7f8911b94873]
8: (()+0xb596e) [0x7f8911b9496e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x7f89135d8d5f]
10: (OSDMonitor::update_from_paxos(bool*)+0x2b8b) [0x7f891342d1bb]
11: (PaxosService::refresh(bool*)+0x445) [0x7f89134174e5]
12: (Monitor::refresh_from_paxos(bool*)+0x57) [0x7f89133b1267]
13: (Monitor::init_paxos()+0xf5) [0x7f89133b1435]
14: (Monitor::preinit()+0x69f) [0x7f89133c983f]
15: (main()+0x272d) [0x7f891339b90d]
16: (__libc_start_main()+0xed) [0x7f891122e78d]
17: (()+0x153719) [0x7f891339e719]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
0> 2014-11-15 14:08:53.311763 7f89132347c0 -1 *** Caught signal (Aborted) **
in thread 7f89132347c0
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (()+0x48de6a) [0x7f89136d8e6a]
2: (()+0xfcb0) [0x7f8912ba6cb0]
3: (gsignal()+0x35) [0x7f89112434f5]
4: (abort()+0x17b) [0x7f8911246c5b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f8911b9669d]
6: (()+0xb5846) [0x7f8911b94846]
7: (()+0xb5873) [0x7f8911b94873]
8: (()+0xb596e) [0x7f8911b9496e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x7f89135d8d5f]
10: (OSDMonitor::update_from_paxos(bool*)+0x2b8b) [0x7f891342d1bb]
11: (PaxosService::refresh(bool*)+0x445) [0x7f89134174e5]
12: (Monitor::refresh_from_paxos(bool*)+0x57) [0x7f89133b1267]
13: (Monitor::init_paxos()+0xf5) [0x7f89133b1435]
14: (Monitor::preinit()+0x69f) [0x7f89133c983f]
15: (main()+0x272d) [0x7f891339b90d]
16: (__libc_start_main()+0xed) [0x7f891122e78d]
17: (()+0x153719) [0x7f891339e719]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

[29271]: (33) Numerical argument out of domain
failed: 'ulimit -n 131072; /usr/bin/ceph-mon -i b --pid-file /var/run/ceph/mon.b.pid -c /etc/ceph/ceph.conf --cluster ceph '
Starting ceph-create-keys on compute1...
=== mds.b ===
=== mds.b ===
Stopping Ceph mds.b on compute1...kill 20412...done
=== mds.b ===
Starting Ceph mds.b on compute1...
starting mds.b at :/0
=== osd.1 ===
=== osd.1 ===
Stopping Ceph osd.1 on compute1...done
=== osd.1 ===
Mounting xfs on compute1:/var/lib/ceph/osd/ceph-1
2014-11-15 14:08:55.495896 7f3c9cb9f700 10 monclient(hunting): build_initial_monmap
2014-11-15 14:08:55.496169 7f3c9cb9f700 1 -- :/0 messenger.start
2014-11-15 14:08:55.496213 7f3c9cb9f700 10 monclient(hunting): init
2014-11-15 14:08:55.496252 7f3c9cb9f700 10 monclient(hunting): auth_supported 1 method none
2014-11-15 14:08:55.496364 7f3c9cb9f700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:08:55.496437 7f3c9cb9f700 10 monclient(hunting): picked mon.a con 0x7f3c980228d0 addr 10.110.13.2:6789/0
2014-11-15 14:08:55.496504 7f3c9cb9f700 10 monclient(hunting): _send_mon_message to mon.a at 10.110.13.2:6789/0
2014-11-15 14:08:55.496514 7f3c9cb9f700 1 -- :/1029895 --> 10.110.13.2:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c98022d50 con 0x7f3c980228d0
2014-11-15 14:08:55.496535 7f3c9cb9f700 10 monclient(hunting): renew_subs
2014-11-15 14:08:55.496543 7f3c9cb9f700 10 monclient(hunting): authenticate will time out at 2014-11-15 14:13:55.496542
2014-11-15 14:08:55.496731 7f3c9c164700 0 -- :/1029895 >> 10.110.13.2:6789/0 pipe(0x7f3c98022660 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c980228d0).fault
2014-11-15 14:08:58.496474 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:08:58.496539 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:08:58.496545 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:08:58.496562 7f3c95ae7700 1 -- :/1029895 mark_down 0x7f3c980228d0 -- 0x7f3c98022660
2014-11-15 14:08:58.496704 7f3c95ae7700 10 monclient(hunting): picked mon.c con 0x7f3c8c000e70 addr 10.110.13.4:6789/0
2014-11-15 14:08:58.496775 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.c at 10.110.13.4:6789/0
2014-11-15 14:08:58.496787 7f3c95ae7700 1 -- :/1029895 --> 10.110.13.4:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c0012f0 con 0x7f3c8c000e70
2014-11-15 14:08:58.496854 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:08:58.497257 7f3c94ae5700 1 -- 10.110.13.3:0/1029895 learned my addr 10.110.13.3:0/1029895
2014-11-15 14:09:01.496983 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:01.497009 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:01.497013 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:01.497024 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c000e70 -- 0x7f3c8c000c00
2014-11-15 14:09:01.497228 7f3c95ae7700 10 monclient(hunting): picked mon.b con 0x7f3c8c001f10 addr 10.110.13.3:6789/0
2014-11-15 14:09:01.497259 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.b at 10.110.13.3:6789/0
2014-11-15 14:09:01.497266 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.3:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c002280 con 0x7f3c8c001f10
2014-11-15 14:09:01.497324 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:01.497361 7f3c949e4700 0 -- 10.110.13.3:0/1029895 >> 10.110.13.3:6789/0 pipe(0x7f3c8c001ca0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c8c001f10).fault
2014-11-15 14:09:04.497427 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:04.497465 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:04.497468 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:04.497484 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c001f10 -- 0x7f3c8c001ca0
2014-11-15 14:09:04.497589 7f3c95ae7700 10 monclient(hunting): picked mon.a con 0x7f3c8c002b10 addr 10.110.13.2:6789/0
2014-11-15 14:09:04.497631 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.a at 10.110.13.2:6789/0
2014-11-15 14:09:04.497638 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.2:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c002d40 con 0x7f3c8c002b10
2014-11-15 14:09:04.497736 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:04.497986 7f3c9c164700 0 -- 10.110.13.3:0/1029895 >> 10.110.13.2:6789/0 pipe(0x7f3c8c0028a0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c8c002b10).fault
2014-11-15 14:09:07.497867 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:07.497905 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:07.497909 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:07.497924 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c002b10 -- 0x7f3c8c0028a0
2014-11-15 14:09:07.498022 7f3c95ae7700 10 monclient(hunting): picked mon.b con 0x7f3c8c0031f0 addr 10.110.13.3:6789/0
2014-11-15 14:09:07.498065 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.b at 10.110.13.3:6789/0
2014-11-15 14:09:07.498072 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.3:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c003420 con 0x7f3c8c0031f0
2014-11-15 14:09:07.498145 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:07.498170 7f3c949e4700 0 -- 10.110.13.3:0/1029895 >> 10.110.13.3:6789/0 pipe(0x7f3c8c002f80 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c8c0031f0).fault
2014-11-15 14:09:10.498264 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:10.498298 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:10.498302 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:10.498317 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c0031f0 -- 0x7f3c8c002f80
2014-11-15 14:09:10.498451 7f3c95ae7700 10 monclient(hunting): picked mon.c con 0x7f3c8c003af0 addr 10.110.13.4:6789/0
2014-11-15 14:09:10.498488 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.c at 10.110.13.4:6789/0
2014-11-15 14:09:10.498494 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.4:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c003d20 con 0x7f3c8c003af0
2014-11-15 14:09:10.498568 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:13.498688 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:13.498726 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:13.498730 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:13.498745 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c003af0 -- 0x7f3c8c003880
2014-11-15 14:09:13.498905 7f3c95ae7700 10 monclient(hunting): picked mon.a con 0x7f3c8c004000 addr 10.110.13.2:6789/0
2014-11-15 14:09:13.498936 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.a at 10.110.13.2:6789/0
2014-11-15 14:09:13.498943 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.2:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c004230 con 0x7f3c8c004000
2014-11-15 14:09:13.499087 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:13.499215 7f3c94ae5700 0 -- 10.110.13.3:0/1029895 >> 10.110.13.2:6789/0 pipe(0x7f3c8c003080 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c8c004000).fault
2014-11-15 14:09:16.499236 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:16.499264 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:16.499266 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:16.499277 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c004000 -- 0x7f3c8c003080
2014-11-15 14:09:16.499340 7f3c95ae7700 10 monclient(hunting): picked mon.c con 0x7f3c8c0049f0 addr 10.110.13.4:6789/0
2014-11-15 14:09:16.499359 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.c at 10.110.13.4:6789/0
2014-11-15 14:09:16.499365 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.4:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c004c20 con 0x7f3c8c0049f0
2014-11-15 14:09:16.499449 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:19.499552 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:19.499578 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:19.499580 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:19.499590 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c0049f0 -- 0x7f3c8c004780
2014-11-15 14:09:19.499680 7f3c95ae7700 10 monclient(hunting): picked mon.a con 0x7f3c8c004fb0 addr 10.110.13.2:6789/0
2014-11-15 14:09:19.499698 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.a at 10.110.13.2:6789/0
2014-11-15 14:09:19.499701 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.2:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c0051e0 con 0x7f3c8c004fb0
2014-11-15 14:09:19.499747 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:19.500068 7f3c9c164700 0 -- 10.110.13.3:0/1029895 >> 10.110.13.2:6789/0 pipe(0x7f3c8c003080 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c8c004fb0).fault
2014-11-15 14:09:22.499853 7f3c95ae7700 10 monclient(hunting): tick
2014-11-15 14:09:22.499890 7f3c95ae7700 1 monclient(hunting): continuing hunt
2014-11-15 14:09:22.499893 7f3c95ae7700 10 monclient(hunting): _reopen_session rank 1 name
2014-11-15 14:09:22.499907 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 mark_down 0x7f3c8c004fb0 -- 0x7f3c8c003080
2014-11-15 14:09:22.499992 7f3c95ae7700 10 monclient(hunting): picked mon.b con 0x7f3c8c005880 addr 10.110.13.3:6789/0
2014-11-15 14:09:22.500014 7f3c95ae7700 10 monclient(hunting): _send_mon_message to mon.b at 10.110.13.3:6789/0
2014-11-15 14:09:22.500017 7f3c95ae7700 1 -- 10.110.13.3:0/1029895 --> 10.110.13.3:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f3c8c005af0 con 0x7f3c8c005880
2014-11-15 14:09:22.500028 7f3c95ae7700 10 monclient(hunting): renew_subs
2014-11-15 14:09:22.500110 7f3c94ae5700 0 -- 10.110.13.3:0/1029895 >> 10.110.13.3:6789/0 pipe(0x7f3c8c005610 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c8c005880).fault
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.1 --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 9.09 host=compute1 root=default'
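
The monclient(hunting) output above is long, so a small script may help to see which monitors the client tried and which connections faulted. This is only a sketch: the function name summarize and the two regular expressions are my own assumptions based on the line shapes in this paste, not part of Ceph. The sample lines are copied from the log above.

```python
import re
from collections import Counter

# Assumed line shapes, taken from the paste above (not a Ceph-defined format):
#   "... monclient(hunting): picked mon.a con 0x... addr 10.110.13.2:6789/0"
#   "... >> 10.110.13.2:6789/0 pipe(0x... ).fault"
PICKED = re.compile(r"monclient\(hunting\): picked (mon\.\w+) con \S+ addr (\S+)")
FAULT = re.compile(r">> (\S+) pipe\(.*\)\.fault")


def summarize(lines):
    """Count, per monitor, how often it was picked and how often the pipe faulted."""
    addr_to_mon = {}  # monitor address -> monitor name, learned from "picked" lines
    picks, faults = Counter(), Counter()
    for line in lines:
        m = PICKED.search(line)
        if m:
            mon, addr = m.groups()
            addr_to_mon[addr] = mon
            picks[mon] += 1
            continue
        m = FAULT.search(line)
        if m:
            addr = m.group(1)
            # Fall back to the raw address if no "picked" line named it yet.
            faults[addr_to_mon.get(addr, addr)] += 1
    return picks, faults


# A few lines copied from the log above.
sample = [
    "2014-11-15 14:08:55.496437 7f3c9cb9f700 10 monclient(hunting): picked mon.a con 0x7f3c980228d0 addr 10.110.13.2:6789/0",
    "2014-11-15 14:08:55.496731 7f3c9c164700 0 -- :/1029895 >> 10.110.13.2:6789/0 pipe(0x7f3c98022660 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c980228d0).fault",
    "2014-11-15 14:08:58.496704 7f3c95ae7700 10 monclient(hunting): picked mon.c con 0x7f3c8c000e70 addr 10.110.13.4:6789/0",
    "2014-11-15 14:09:01.497228 7f3c95ae7700 10 monclient(hunting): picked mon.b con 0x7f3c8c001f10 addr 10.110.13.3:6789/0",
    "2014-11-15 14:09:01.497361 7f3c949e4700 0 -- 10.110.13.3:0/1029895 >> 10.110.13.3:6789/0 pipe(0x7f3c8c001ca0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3c8c001f10).fault",
]

picks, faults = summarize(sample)
for mon in sorted(picks):
    print(f"{mon}: picked {picks[mon]}x, faulted {faults[mon]}x")
```

In the full paste, every attempt against mon.a and mon.b ends in a fault, while the attempts against mon.c never fault but the client still continues hunting. That would be consistent with only one mon process being up, as described above.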


Files

mds.b.log (8.18 KB) mds.b.log ? ??, 11/14/2014 10:22 PM
mon.b.log (158 KB) mon.b.log ? ??, 11/14/2014 10:22 PM
osd.1.log (9.26 KB) osd.1.log ? ??, 11/14/2014 10:22 PM