Bug #3938

closed

ceph-mon crashed on mixed bobtail-argonaut cluster (2 argonaut mons, 1 bobtail)

Added by Samuel Just about 11 years ago. Updated about 11 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
Joao Eduardo Luis
Category:
-
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

7:09:03.310220 7f652087e700 1 mon.a@1(peon).osd e72 e72: 20 osds: 20 up, 20 in
2013-01-25 17:09:04.312351 7f652087e700 1 mon.a@1(peon).osd e73 e73: 20 osds: 20 up, 20 in
2013-01-25 17:09:32.397604 7f652087e700 1 mon.a@1(peon).osd e74 e74: 20 osds: 20 up, 20 in
2013-01-25 17:13:03.416677 7f652087e700 1 mon.a@1(peon).osd e75 e75: 20 osds: 19 up, 20 in
2013-01-25 17:13:04.419033 7f652087e700 1 mon.a@1(peon).osd e76 e76: 20 osds: 20 up, 20 in
2013-01-25 17:13:05.421204 7f652087e700 1 mon.a@1(peon).osd e77 e77: 20 osds: 20 up, 20 in
2013-01-25 17:13:10.548002 7f651ed71700 0 -- 10.214.134.6:6789/0 >> 10.214.134.4:6789/0 pipe(0x1a38200 sd=24 pgs=1 cs=1 l=0).fault with nothing to send, going to standby
2013-01-25 17:13:10.550950 7f652087e700 0 mon.a@1(peon) e1 handle_command on fsid 243c7d16-7b97-4548-b2b0-bbc640e1a806 != 49c8ae28-dc91-4a2e-a82b-ad270d645bf1
2013-01-25 17:13:16.208287 7f651c94d700 0 will not decode message of type 72 version 3 because compat_version 3 > supported version 2
2013-01-25 17:13:17.771701 7f652087e700 0 log [INF] : mon.a calling new monitor election
2013-01-25 17:13:22.787899 7f652007d700 0 log [INF] : mon.a@1 won leader election with quorum 1,2
2013-01-25 17:13:22.798393 7f651bf43700 -1 ** Caught signal (Segmentation fault) **
in thread 7f651bf43700

ceph version 0.48.2argonaut-16-g24fc459 (commit:24fc4599a7d1d3d49ce4623723723bd31f701cca)
1: /usr/bin/ceph-mon() [0x5309fa]
2: (()+0xfcb0) [0x7f65253e3cb0]
3: (decode_message(CephContext*, ceph_msg_header&, ceph_msg_footer&, ceph::buffer::list&, ceph::buffer::list&, ceph::buffer::list&)+0x6af) [0x56a70f]
4: (decode_message(CephContext*, ceph::buffer::list::iterator&)+0xa8) [0x56f678]
5: (MForward::decode_payload()+0xff) [0x489b9f]
6: (decode_message(CephContext*, ceph_msg_header&, ceph_msg_footer&, ceph::buffer::list&, ceph::buffer::list&, ceph::buffer::list&)+0xce6) [0x56ad46]
7: (SimpleMessenger::Pipe::read_message(Message*)+0x1385) [0x5c39f5]
8: (SimpleMessenger::Pipe::reader()+0xac0) [0x5d89f0]
9: (SimpleMessenger::Pipe::Reader::entry()+0xd) [0x59bf0d]
10: (()+0x7e9a) [0x7f65253dbe9a]
11: (clone()+0x6d) [0x7f6523b91cbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions #1

Updated by Samuel Just about 11 years ago

  • Assignee set to Joao Eduardo Luis
  • Priority changed from Normal to High
Actions #2

Updated by Sage Weil about 11 years ago

is there a core for this?

Actions #3

Updated by Samuel Just about 11 years ago

No, didn't have it set up. I could probably reproduce if necessary.

Actions #4

Updated by Joao Eduardo Luis about 11 years ago

  • Status changed from New to In Progress

Have a cluster set up and ready to start trying to reproduce this in the morning.

Actions #5

Updated by Joao Eduardo Luis about 11 years ago

  • Status changed from In Progress to Can't reproduce

After a couple of days trying to reproduce this issue (and massively failing at it), and given the lack of debug info in the log, we will have to wait until someone else is unfortunate enough to trigger this, either with higher debug levels or consistently enough to let us track down the cause.
