Bug #1613

closed

mon crash

Added by Hong Cho over 12 years ago. Updated over 12 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%


Description

Hi,

I am currently running Ceph 0.36 with kernel 3.0.0 on two machines, each with four 2 TB HDDs. All of the HDDs are used as OSDs, and both machines run a MON and an MDS. This evening one machine showed an OOPS and died. After rebooting, I can no longer mount the file system. It appears that the monitor has crashed, so I am attaching the monitor log.

Is there something I can try to get the file system back? Or does this indicate that I will need to start a new one?

2011-10-11 20:26:00.445263 7f37928a2720 ceph version 0.36 (877cacbbdbb78d120b431cb762f57332d62ef0a6), process ceph-mon, pid 4020
2011-10-11 20:26:00.445462 7f37928a2720 store(/mnt/mon0) mount
2011-10-11 20:26:00.446112 7f37928a2720 -- 192.168.1.10:6789/0 accepter.bind ms_addr is 192.168.1.10:6789/0 need_addr=0
2011-10-11 20:26:00.448053 7f37928a2720 -- 192.168.1.10:6789/0 messenger.start
2011-10-11 20:26:00.448118 7f37928a2720 -- 192.168.1.10:6789/0 accepter.start
2011-10-11 20:26:00.448336 7f37928a2720 mon.0@0(starting) e1 init fsid 509a0e09-3cbf-33f3-fb42-7bafa0bd4401
2011-10-11 20:26:00.449375 7f37928a2720 log [INF] : mon.0 calling new monitor election
2011-10-11 20:26:00.449487 7f37928a2720 -- 192.168.1.10:6789/0 --> mon0 192.168.1.10:6789/0 -- log(1 entries) v1 -- ?+0 0x1b0d000
2011-10-11 20:26:00.449642 7f378ebc7700 -- 192.168.1.10:6789/0 <== mon0 192.168.1.10:6789/0 0 ==== log(1 entries) v1 ==== 0+0+0 (0 0 0) 0x1b0d000 con 0x1b17000
2011-10-11 20:26:00.501154 7f37928a2720 -- 192.168.1.10:6789/0 --> mon1 192.168.1.11:6789/0 -- election(propose 1) v1 -- ?+0 0x1b0d400
2011-10-11 20:26:00.501494 7f378ebc7700 -- 192.168.1.10:6789/0 <== osd4 192.168.1.11:6801/3535 1 ==== auth(proto 0 26 bytes) v1 ==== 52+0+0 (1205707267 0 0) 0x1b0d600 con 0x1b17c80
2011-10-11 20:26:00.502604 7f378ebc7700 -- 192.168.1.10:6789/0 <== mds? 192.168.1.11:6800/3495 1 ==== auth(proto 0 29 bytes) v1 ==== 55+0+0 (1850031845 0 0) 0x1b0de00 con 0x1b253c0

2011-10-11 20:26:00.609344 7f378ebc7700 -- 192.168.1.10:6789/0 <== osd1 192.168.1.10:6804/1748 1 ==== auth(proto 0 26 bytes) v1 ==== 52+0+0 (1831646840 0 0) 0x1b27000 con 0x1b25140
2011-10-11 20:26:00.783236 7f378ebc7700 -- 192.168.1.10:6789/0 <== osd6 192.168.1.11:6807/3638 1 ==== auth(proto 0 26 bytes) v1 ==== 52+0+0 (2857138274 0 0) 0x1b0d400 con 0x1b25a00
2011-10-11 20:26:00.837565 7f378ebc7700 -- 192.168.1.10:6789/0 <== osd0 192.168.1.10:6801/1654 1 ==== auth(proto 0 26 bytes) v1 ==== 52+0+0 (2568118320 0 0) 0x1b27800 con 0x1b25780
2011-10-11 20:26:01.229308 7f378ebc7700 -- 192.168.1.10:6789/0 <== osd5 192.168.1.11:6804/3587 1 ==== auth(proto 0 26 bytes) v1 ==== 52+0+0 (3018028107 0 0) 0x1b27a00 con 0x1b243c0
2011-10-11 20:26:01.334853 7f378ebc7700 -- 192.168.1.10:6789/0 <== osd3 192.168.1.10:6810/1991 1 ==== auth(proto 0 26 bytes) v1 ==== 52+0+0 (2159897625 0 0) 0x1b2b000 con 0x1b24140
2011-10-11 20:26:01.356515 7f378ebc7700 -- 192.168.1.10:6789/0 <== mon1 192.168.1.11:6789/0 1 ==== election(propose 23) v1 ==== 340+0+0 (3872700071 0 0) 0x1b2b200 con 0x1b17a00
2011-10-11 20:26:01.840179 7f378ebc7700 -- 192.168.1.10:6789/0 --> mon1 192.168.1.11:6789/0 -- election(propose 23) v1 -- ?+0 0x1b2ba00
2011-10-11 20:26:01.840293 7f378ebc7700 -- 192.168.1.10:6789/0 <== mds? 192.168.1.10:6800/1544 1 ==== auth(proto 0 29 bytes) v1 ==== 55+0+0 (2011837132 0 0) 0x1b2be00 con 0x1b24dc0
2011-10-11 20:26:01.840357 7f378ebc7700 -- 192.168.1.10:6789/0 <== osd2 192.168.1.10:6807/1890 1 ==== auth(proto 0 26 bytes) v1 ==== 52+0+0 (1954789969 0 0) 0x1b2d000 con 0x1b24b40
2011-10-11 20:26:01.841289 7f378ebc7700 -- 192.168.1.10:6789/0 <== mon1 192.168.1.11:6789/0 2 ==== election(ack 23) v1 ==== 340+0+0 (176343387 0 0) 0x1b2ba00 con 0x1b17a00
2011-10-11 20:26:01.857175 7f378ebc7700 -- 192.168.1.10:6789/0 --> mon1 192.168.1.11:6789/0 -- election(victory 24) v1 -- ?+0 0x1b2b600
2011-10-11 20:26:01.857260 7f378ebc7700 log [INF] : mon.0@0 won leader election with quorum 0,1
2011-10-11 20:26:01.857326 7f378ebc7700 -- 192.168.1.10:6789/0 --> mon0 192.168.1.10:6789/0 -- log(1 entries) v1 -- ?+0 0x1b2b400
2011-10-11 20:26:01.870530 7f378ebc7700 -- 192.168.1.10:6789/0 --> mon1 192.168.1.11:6789/0 -- paxos(pgmap collect lc 14655 fc 14655 pn 23900 opn 0) v1 -- ?+0 0x1b28240
2011-10-11 20:26:01.885007 7f378ebc7700 -- 192.168.1.10:6789/0 --> mon1 192.168.1.11:6789/0 -- paxos(mdsmap collect lc 7082 fc 0 pn 23900 opn 0) v1 -- ?+0 0x1b286c0
2011-10-11 20:26:01.906375 7f378ebc7700 -- 192.168.1.10:6789/0 --> mon1 192.168.1.11:6789/0 -- paxos(osdmap collect lc 0 fc 0 pn 23900 opn 0) v1 -- ?+0 0x1b28480
2011-10-11 20:26:01.918880 7f378ebc7700 -- 192.168.1.10:6789/0 --> mon1 192.168.1.11:6789/0 -- paxos(logm collect lc 0 fc 14655 pn 23900 opn 0) v1 -- ?+0 0x1b28d80
2011-10-11 20:26:01.930989 7f378ebc7700 -- 192.168.1.10:6789/0 --> mon1 192.168.1.11:6789/0 -- paxos(monmap collect lc 1 fc 0 pn 23900 opn 0) v1 -- ?+0 0x1b28b40
2011-10-11 20:26:01.941242 7f378ebc7700 -- 192.168.1.10:6789/0 --> mon1 192.168.1.11:6789/0 -- paxos(auth collect lc 228 fc 0 pn 1600 opn 0) v1 -- ?+0 0x1b28900
2011-10-11 20:26:01.941382 7f378ebc7700 -- 192.168.1.10:6789/0 <== mon0 192.168.1.10:6789/0 0 ==== log(1 entries) v1 ==== 0+0+0 (0 0 0) 0x1b2b400 con 0x1b17000
2011-10-11 20:26:02.075082 7f378ebc7700 -- 192.168.1.10:6789/0 <== mon1 192.168.1.11:6789/0 3 ==== paxos(pgmap last lc 14655 fc 14655 pn 23900 opn 0) v1 ==== 84+0+0 (1586122263 0 0) 0x1b28900 con 0x1b17a00
2011-10-11 20:26:02.075145 7f378ebc7700 -- 192.168.1.10:6789/0 --> mon1 192.168.1.11:6789/0 -- paxos(pgmap lease lc 14655 fc 14655 pn 0 opn 0) v1 -- ?+0 0x1b28240
2011-10-11 20:26:02.103460 7f378ebc7700 -- 192.168.1.10:6789/0 <== mon1 192.168.1.11:6789/0 4 ==== paxos(mdsmap last lc 232 fc 0 pn 23900 opn 0) v1 ==== 84+0+0 (2474334558 0 0) 0x1b28240 con 0x1b17a00
2011-10-11 20:26:02.131162 7f378ebc7700 -- 192.168.1.10:6789/0 --> mon1 192.168.1.11:6789/0 -- paxos(mdsmap commit lc 7082 fc 0 pn 0 opn 0) v1 -- ?+0 0x1b28900
2011-10-11 20:26:02.131261 7f378ebc7700 -- 192.168.1.10:6789/0 --> mon1 192.168.1.11:6789/0 -- paxos(mdsmap lease lc 7082 fc 0 pn 0 opn 0) v1 -- ?+0 0x1b286c0
*** Caught signal (Aborted) **
    in thread 0x7f378ebc7700
    ceph version 0.36 (877cacbbdbb78d120b431cb762f57332d62ef0a6)
    1: /usr/bin/ceph-mon() [0x5b0ca4]
    2: (()+0xf020) [0x7f3792484020]
    3: (gsignal()+0x35) [0x7f3790e163d5]
    4: (abort()+0x180) [0x7f3790e19650]
    5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f37916bde8d]
    6: (()+0xbd036) [0x7f37916bc036]
    7: (()+0xbd063) [0x7f37916bc063]
    8: (()+0xbd15e) [0x7f37916bc15e]
    9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x127) [0x585327]
    10: (MDSMap::decode(ceph::buffer::list::iterator&)+0x2c) [0x4c624c]
    11: (MDSMonitor::update_from_paxos()+0x230) [0x4bd430]
    12: (PaxosService::_active()+0x39) [0x48df09]
    13: (Context::complete(int)+0xa) [0x479a0a]
    14: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xca) [0x48a86a]
    15: (Paxos::handle_last(MMonPaxos*)+0x59b) [0x4891fb]
    16: (Paxos::dispatch(PaxosServiceMessage*)+0x24b) [0x4897fb]
    17: (Monitor::_ms_dispatch(Message*)+0x82e) [0x4765ae]
    18: (Monitor::ms_dispatch(Message*)+0x69) [0x480839]
    19: (SimpleMessenger::dispatch_entry()+0x7bb) [0x57359b]
    20: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x46700c]
    21: (()+0x6b40) [0x7f379247bb40]
    22: (clone()+0x6d) [0x7f3790eb928d]
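
The paxos lines in the log above can be summarized per map with a small script. The following is a minimal sketch and not part of Ceph: the function names, the `SAMPLE` list, and the `local_name` default are hypothetical, and it assumes that `lc` in the paxos messages is each monitor's last committed version. It only reports maps for which the two monitors advertise different `lc` values; it does not establish why the monitor aborted.

```python
import re
from collections import defaultdict

# Matches messenger lines of the form seen in the log above, for example:
#   ... --> mon1 <addr> -- paxos(mdsmap collect lc 7082 fc 0 pn 23900 opn 0) v1 ...
#   ... <== mon1 <addr> 4 ==== paxos(mdsmap last lc 232 fc 0 pn 23900 opn 0) v1 ...
PAXOS_RE = re.compile(
    r"(?P<dir>-->|<==) (?P<peer>\S+) \S+ .*?"
    r"paxos\((?P<map>\w+) (?P<op>\w+) lc (?P<lc>\d+) fc (?P<fc>\d+)"
)


def last_committed_by_sender(lines, local_name="mon0"):
    """Return {map_name: {sender: lc}} from 'collect' and 'last' messages.

    local_name is an assumption: the name of the monitor that wrote the log.
    """
    seen = defaultdict(dict)
    for line in lines:
        m = PAXOS_RE.search(line)
        if not m or m["op"] not in ("collect", "last"):
            continue
        # Outgoing lines carry the local monitor's value, incoming the peer's.
        sender = local_name if m["dir"] == "-->" else m["peer"]
        seen[m["map"]][sender] = int(m["lc"])
    return dict(seen)


def mismatches(table):
    """Keep only the maps whose senders disagree on lc."""
    return {name: by_sender for name, by_sender in table.items()
            if len(set(by_sender.values())) > 1}


# Four lines copied from the log above (timestamps and thread ids trimmed).
SAMPLE = [
    "-- 192.168.1.10:6789/0 --> mon1 192.168.1.11:6789/0 -- paxos(pgmap collect lc 14655 fc 14655 pn 23900 opn 0) v1 -- ?+0 0x1b28240",
    "-- 192.168.1.10:6789/0 --> mon1 192.168.1.11:6789/0 -- paxos(mdsmap collect lc 7082 fc 0 pn 23900 opn 0) v1 -- ?+0 0x1b286c0",
    "-- 192.168.1.10:6789/0 <== mon1 192.168.1.11:6789/0 3 ==== paxos(pgmap last lc 14655 fc 14655 pn 23900 opn 0) v1 ==== 84+0+0 (1586122263 0 0) 0x1b28900 con 0x1b17a00",
    "-- 192.168.1.10:6789/0 <== mon1 192.168.1.11:6789/0 4 ==== paxos(mdsmap last lc 232 fc 0 pn 23900 opn 0) v1 ==== 84+0+0 (2474334558 0 0) 0x1b28240 con 0x1b17a00",
]

if __name__ == "__main__":
    print(mismatches(last_committed_by_sender(SAMPLE)))
    # prints {'mdsmap': {'mon0': 7082, 'mon1': 232}}
```

On the lines shown here, pgmap agrees on both monitors, while mdsmap is at 7082 on mon0 and 232 on mon1. The abort is raised from MDSMap::decode shortly after that exchange, so the mdsmap state on the two monitors may be worth comparing first.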

Regards,
Hong
