Bug #12941

mon/OSDMonitor.cc: 204: FAILED assert(err == 0) 0.94

Added by bo cai about 4 years ago. Updated over 2 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: Monitor
Target version: -
Start date: 09/04/2015
Due date:
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Yesterday I found that my cluster was broken, and it turned out that two of the three monitors were down. To repair it, I ran the command:
ceph-mon -i vm13__

But I got the following error:

mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7f7fa64248c0 time 2015-09-04 13:31:48.448126
mon/OSDMonitor.cc: 204: FAILED assert(err == 0)
 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7e708b]
 2: (OSDMonitor::update_from_paxos(bool*)+0x21eb) [0x62d04b]
 3: (PaxosService::refresh(bool*)+0x19a) [0x60d64a]
 4: (Monitor::refresh_from_paxos(bool*)+0x183) [0x5ba4a3]
 5: (Monitor::init_paxos()+0x85) [0x5ba7e5]
 6: (Monitor::preinit()+0x7d7) [0x5bf447]
 7: (main()+0x22dd) [0x5819ed]
 8: (__libc_start_main()+0xf5) [0x7f7fa385cec5]
 9: ceph-mon() [0x5a3607]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-09-04 13:31:48.449369 7f7fa64248c0 -1 mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7f7fa64248c0 time 2015-09-04 13:31:48.448126
mon/OSDMonitor.cc: 204: FAILED assert(err == 0)

 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7e708b]
 2: (OSDMonitor::update_from_paxos(bool*)+0x21eb) [0x62d04b]
 3: (PaxosService::refresh(bool*)+0x19a) [0x60d64a]
 4: (Monitor::refresh_from_paxos(bool*)+0x183) [0x5ba4a3]
 5: (Monitor::init_paxos()+0x85) [0x5ba7e5]
 6: (Monitor::preinit()+0x7d7) [0x5bf447]
 7: (main()+0x22dd) [0x5819ed]
 8: (__libc_start_main()+0xf5) [0x7f7fa385cec5]
 9: ceph-mon() [0x5a3607]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

     0> 2015-09-04 13:31:48.449369 7f7fa64248c0 -1 mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7f7fa64248c0 time 2015-09-04 13:31:48.448126
mon/OSDMonitor.cc: 204: FAILED assert(err == 0)

 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7e708b]
 2: (OSDMonitor::update_from_paxos(bool*)+0x21eb) [0x62d04b]
 3: (PaxosService::refresh(bool*)+0x19a) [0x60d64a]
 4: (Monitor::refresh_from_paxos(bool*)+0x183) [0x5ba4a3]
 5: (Monitor::init_paxos()+0x85) [0x5ba7e5]
 6: (Monitor::preinit()+0x7d7) [0x5bf447]
 7: (main()+0x22dd) [0x5819ed]
 8: (__libc_start_main()+0xf5) [0x7f7fa385cec5]
 9: ceph-mon() [0x5a3607]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) **
 in thread 7f7fa64248c0
 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 1: ceph-mon() [0x9b050a]
 2: (()+0x10340) [0x7f7fa5526340]
 3: (gsignal()+0x39) [0x7f7fa3871cc9]
 4: (abort()+0x148) [0x7f7fa38750d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f7fa417c535]
 6: (()+0x5e6d6) [0x7f7fa417a6d6]
 7: (()+0x5e703) [0x7f7fa417a703]
 8: (()+0x5e922) [0x7f7fa417a922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0x7e7278]
 10: (OSDMonitor::update_from_paxos(bool*)+0x21eb) [0x62d04b]
 11: (PaxosService::refresh(bool*)+0x19a) [0x60d64a]
 12: (Monitor::refresh_from_paxos(bool*)+0x183) [0x5ba4a3]
 13: (Monitor::init_paxos()+0x85) [0x5ba7e5]
 14: (Monitor::preinit()+0x7d7) [0x5bf447]
 15: (main()+0x22dd) [0x5819ed]
 16: (__libc_start_main()+0xf5) [0x7f7fa385cec5]
 17: ceph-mon() [0x5a3607]
2015-09-04 13:31:48.451603 7f7fa64248c0 -1 *** Caught signal (Aborted) **
 in thread 7f7fa64248c0

 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 1: ceph-mon() [0x9b050a]
 2: (()+0x10340) [0x7f7fa5526340]
 3: (gsignal()+0x39) [0x7f7fa3871cc9]
 4: (abort()+0x148) [0x7f7fa38750d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f7fa417c535]
 6: (()+0x5e6d6) [0x7f7fa417a6d6]
 7: (()+0x5e703) [0x7f7fa417a703]
 8: (()+0x5e922) [0x7f7fa417a922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0x7e7278]
 10: (OSDMonitor::update_from_paxos(bool*)+0x21eb) [0x62d04b]
 11: (PaxosService::refresh(bool*)+0x19a) [0x60d64a]
 12: (Monitor::refresh_from_paxos(bool*)+0x183) [0x5ba4a3]
 13: (Monitor::init_paxos()+0x85) [0x5ba7e5]
 14: (Monitor::preinit()+0x7d7) [0x5bf447]
 15: (main()+0x22dd) [0x5819ed]
 16: (__libc_start_main()+0xf5) [0x7f7fa385cec5]
 17: ceph-mon() [0x5a3607]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

     0> 2015-09-04 13:31:48.451603 7f7fa64248c0 -1 *** Caught signal (Aborted) **
 in thread 7f7fa64248c0

 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 1: ceph-mon() [0x9b050a]
 2: (()+0x10340) [0x7f7fa5526340]
 3: (gsignal()+0x39) [0x7f7fa3871cc9]
 4: (abort()+0x148) [0x7f7fa38750d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f7fa417c535]
 6: (()+0x5e6d6) [0x7f7fa417a6d6]
 7: (()+0x5e703) [0x7f7fa417a703]
 8: (()+0x5e922) [0x7f7fa417a922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0x7e7278]
 10: (OSDMonitor::update_from_paxos(bool*)+0x21eb) [0x62d04b]
 11: (PaxosService::refresh(bool*)+0x19a) [0x60d64a]
 12: (Monitor::refresh_from_paxos(bool*)+0x183) [0x5ba4a3]
 13: (Monitor::init_paxos()+0x85) [0x5ba7e5]
 14: (Monitor::preinit()+0x7d7) [0x5bf447]
 15: (main()+0x22dd) [0x5819ed]
 16: (__libc_start_main()+0xf5) [0x7f7fa385cec5]
 17: ceph-mon() [0x5a3607]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

[27101]: (33) Numerical argument out of domain
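For reference, the number in parentheses on the last line of the output can be decoded with the standard errno table. The snippet below is only a minimal sketch: it assumes the 33 is a plain errno value, and it says nothing about what actually went wrong inside the monitor store.

```python
import errno
import os

# Value taken from the last line of the output above: "(33) Numerical argument out of domain".
code = 33

# errno.errorcode maps numeric errno values to their symbolic names.
name = errno.errorcode.get(code, "unknown")

# os.strerror returns the platform's message for that errno value.
message = os.strerror(code)

print(f"{code} -> {name}: {message}")
```

On Linux this maps 33 to EDOM, which matches the message text printed by ceph-mon.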

Is this caused by corruption of a monitor-related file?

ceph-mon.vm14.log - 1/3 log file (282 KB) bo cai, 09/09/2015 07:45 AM

syslog - system log (164 KB) bo cai, 09/09/2015 07:45 AM

History

#1 Updated by bo cai about 4 years ago

It looks like the Redmine formatting has a small problem.

#2 Updated by Loic Dachary about 4 years ago

  • Description updated (diff)
  • Target version deleted (v0.94.4)

#3 Updated by Loic Dachary about 4 years ago

mon/OSDMonitor.cc: 204: FAILED assert(err == 0) is at https://github.com/ceph/ceph/blob/v0.94.2/src/mon/OSDMonitor.cc#L204

#4 Updated by Loic Dachary about 4 years ago

  • Status changed from New to Need More Info

It would help a lot if you could include more information from the log files. Is the file system on which the mon resides OK, or did it suffer a failure or repair of some sort?

#5 Updated by bo cai about 4 years ago

I have three monitors in total, and one of them (vm14) cannot be started. There is no problem with the file system.
I can even extract the monmap with the command: ceph-mon -i vm14 --extract-monmap myfile && monmaptool --print myfile
which prints:
epoch 1
fsid dc4f5635-5454-4d2c-94b7-ec05254596fe
last_changed 0.000000
created 0.000000
0: 172.16.31.212:6789/0 mon.vm13
1: 172.16.31.213:6789/0 mon.vm14
2: 172.16.31.214:6789/0 mon.vm15

So I think the file system is OK.

I am also uploading the logs from the three hosts; I hope that helps.
If you need more information, please tell me.
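The monmap printed above can also be checked programmatically. The following is a minimal sketch that parses the text exactly as pasted in this comment; the function name `parse_monmap` and the regular expression are illustrative assumptions, not part of any Ceph tool. Note that it only confirms the monmap is readable. The backtrace in the description is in OSDMonitor, so a clean monmap does not by itself show that the rest of the monitor store is intact.

```python
import re

# Text copied from the monmaptool --print output in this comment.
MONMAP_TEXT = """\
epoch 1
fsid dc4f5635-5454-4d2c-94b7-ec05254596fe
last_changed 0.000000
created 0.000000
0: 172.16.31.212:6789/0 mon.vm13
1: 172.16.31.213:6789/0 mon.vm14
2: 172.16.31.214:6789/0 mon.vm15
"""

# Matches lines such as "0: 172.16.31.212:6789/0 mon.vm13".
MON_LINE = re.compile(r"^(\d+): (\S+):(\d+)/\d+ mon\.(\S+)$")


def parse_monmap(text):
    """Split the printed monmap into header fields and a list of monitors."""
    header = {}
    mons = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = MON_LINE.match(line)
        if match:
            rank, addr, port, name = match.groups()
            mons.append(
                {"rank": int(rank), "addr": addr, "port": int(port), "name": name}
            )
        else:
            # Header lines have the form "<key> <value>".
            key, _, value = line.partition(" ")
            header[key] = value
    return header, mons


header, mons = parse_monmap(MONMAP_TEXT)
print(header["fsid"], [m["name"] for m in mons])
```

Running the same parser on the monmap extracted from each of the three hosts would make it easy to compare the fsid and the monitor list across them.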

#6 Updated by Loic Dachary about 4 years ago

  • Status changed from Need More Info to New

#7 Updated by Loic Dachary about 4 years ago

  • Subject changed from can not start monitor to mon/OSDMonitor.cc: 204: FAILED assert(err == 0) 0.94

#8 Updated by Sage Weil over 2 years ago

  • Status changed from New to Can't reproduce

Please reopen if this is still a problem.
