Support #36351

mon: OSDMonitor.cc: 380: FAILED assert(err == 0) (12.2.2)

Added by huanwen ren over 5 years ago. Updated over 5 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Tags:
Reviewed:
Affected Versions:
Component(RADOS): Monitor
Pull request ID:

Description

I have a Ceph cluster with 3 mons. After an abnormal power failure, one mon service fails to start. The exception information is as follows:

2018-10-09 18:01:23.124782 7fc1ab4cfe40  1 mon.Ceph03@-1(probing) e1 preinit fsid ad403a3c-78e3-11e8-982b-52540056dc48
2018-10-09 18:01:23.124985 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 check_fsid cluster_uuid contains 'ad403a3c-78e3-11e8-982b-52540056dc48'
2018-10-09 18:01:23.125013 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 features compat={},rocompat={},incompat={1=initial feature set (~v.18),3=single paxos with k/v store (v0.?),4=support erasure code pools,5=new-style osdmap encoding,6=support isa/lrc erasure code,7=support shec erasure code,8=support monmap features,9=luminous ondisk layout}
2018-10-09 18:01:23.125029 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 calc_quorum_requirements required_features 153140804152475648
2018-10-09 18:01:23.125035 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 required_features 153140804152475648
2018-10-09 18:01:23.125049 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 has_ever_joined = 1
2018-10-09 18:01:23.125090 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 sync_last_committed_floor 0
2018-10-09 18:01:23.125096 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 init_paxos
2018-10-09 18:01:23.125152 7fc1ab4cfe40  1 mon.Ceph03@-1(probing).mds e0 Unable to load 'last_metadata'
2018-10-09 18:01:23.125210 7fc1ab4cfe40 10 mon.Ceph03@-1(probing).health init
2018-10-09 18:01:23.125218 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 refresh_from_paxos
2018-10-09 18:01:23.125252 7fc1ab4cfe40  1 mon.Ceph03@-1(probing).pg v0 on_upgrade discarding in-core PGMap
2018-10-09 18:01:23.125289 7fc1ab4cfe40 10 mon.Ceph03@-1(probing).pg v0 update_from_paxos deleted, clearing in-memory PGMap
2018-10-09 18:01:23.125336 7fc1ab4cfe40 10 mon.Ceph03@-1(probing).mds e0 update_from_paxos version 1, my e 0
2018-10-09 18:01:23.125379 7fc1ab4cfe40 10 mon.Ceph03@-1(probing).mds e0 update_from_paxos got 1
2018-10-09 18:01:23.125405 7fc1ab4cfe40  4 mon.Ceph03@-1(probing).mds e1 new map
2018-10-09 18:01:23.125412 7fc1ab4cfe40  0 mon.Ceph03@-1(probing).mds e1 print_map
e1
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
legacy client fscid: -1

No filesystems configured

2018-10-09 18:01:23.125438 7fc1ab4cfe40 10 mon.Ceph03@-1(probing).mds e1 update_logger
2018-10-09 18:01:23.125565 7fc1ab4cfe40 15 mon.Ceph03@-1(probing).osd e0 update_from_paxos paxos e 1794, my e 0
2018-10-09 18:01:23.125792 7fc1ab4cfe40  7 mon.Ceph03@-1(probing).osd e0 update_from_paxos loading latest full map e1767
2018-10-09 18:01:23.126303 7fc1ab4cfe40  7 mon.Ceph03@-1(probing).osd e1767 update_from_paxos loading creating_pgs last_scan_epoch 1793 with 0 pgs
2018-10-09 18:01:23.126312 7fc1ab4cfe40 10 mon.Ceph03@-1(probing).osd e1767 update_from_paxos pgservice is mgrstat
2018-10-09 18:01:23.129140 7fc1ab4cfe40 -1 /clove/vm/zstor/ceph/rpmbuild/BUILD/ceph-12.2.2/src/mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7fc1ab4cfe40 time 2018-10-09 18:01:23.126328
/clove/vm/zstor/ceph/rpmbuild/BUILD/ceph-12.2.2/src/mon/OSDMonitor.cc: 380: FAILED assert(err == 0)

 ceph version 12.2.2-16-6-g9e6bce0 (9e6bce0774b1d5d61c9327cc7c032b9cfea145bc) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55aa5ecc2690]
 2: (OSDMonitor::update_from_paxos(bool*)+0x2e5f) [0x55aa5ebdf05f]
 3: (PaxosService::refresh(bool*)+0x1ae) [0x55aa5ebacc9e]
 4: (Monitor::refresh_from_paxos(bool*)+0x193) [0x55aa5ea7e263]
 5: (Monitor::init_paxos()+0x115) [0x55aa5ea7e695]
 6: (Monitor::preinit()+0x9c6) [0x55aa5ea7f0b6]
 7: (main()+0x4738) [0x55aa5e9ae668]
 8: (__libc_start_main()+0xf5) [0x7fc1a81a4c05]
 9: (()+0x37292e) [0x55aa5ea5392e]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Files

aa (9.63 KB) huanwen ren, 10/10/2018 01:02 AM
Actions #1

Updated by huanwen ren over 5 years ago

Using ceph-monstore-tool on the failed mon's data directory, I can read the osdmap and monmap correctly.

The operations are as follows:

ceph-monstore-tool ./ceph-Ceph03/ get monmap > abc
ceph-monstore-tool ./ceph-Ceph03/ get osdmap > a2

monmaptool --print abc

monmaptool: monmap file abc
epoch 1
fsid ad403a3c-78e3-11e8-982b-52540056dc48
last_changed 2018-06-26 10:05:02.096907
created 2018-06-26 10:05:02.096907
0: [fd00:0:5:99::100]:6789/0 mon.Ceph01
1: [fd00:0:5:99::101]:6789/0 mon.Ceph02
2: [fd00:0:5:99::102]:6789/0 mon.Ceph03

osdmaptool --print a2 > aa

Actions #2

Updated by huanwen ren over 5 years ago

This may be the same as this issue: http://tracker.ceph.com/issues/12941

Actions #3

Updated by John Spray over 5 years ago

  • Project changed from mgr to RADOS
  • Category set to Correctness/Safety
  • Component(RADOS) Monitor added
Actions #4

Updated by Greg Farnum over 5 years ago

  • Tracker changed from Bug to Support

12.2.2 is pretty out-of-date for Luminous and you appear to be running a custom build, so I'm not sure my line numbers are right. But the two asserts which match that are
1) that the monitor successfully reads the new osdmap incremental version off disk,
2) that the incremental applies correctly.

This is likely to be some kind of disk issue or the result of a bad patch.
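
To narrow down which on-disk entry is involved, it may help to work out which incremental epochs the monitor was about to replay. The sketch below is only an illustration and not part of Ceph: it assumes the log format shown in the description, and the function name `pending_incrementals` and the regexes are made up for this example. It reads the committed osdmap epoch ("paxos e N") and the latest full map epoch ("loading latest full map eM") and reports the range M+1..N, which are the incrementals that would be read and applied after the full map is loaded.

```python
import re

# Patterns assume the debug log format quoted in this ticket (hypothetical helper, not a Ceph API).
PAXOS_RE = re.compile(r"\.osd e\d+ update_from_paxos paxos e (\d+), my e (\d+)")
FULL_RE = re.compile(r"update_from_paxos loading latest full map e(\d+)")


def pending_incrementals(log_text):
    """Return the range of osdmap incremental epochs still to be replayed,
    or None if the log has no 'update_from_paxos paxos e ...' line."""
    paxos = PAXOS_RE.search(log_text)
    if paxos is None:
        return None
    committed = int(paxos.group(1))
    full = FULL_RE.search(log_text)
    # Start from the loaded full map if there is one, else from the in-memory epoch.
    start = int(full.group(1)) if full else int(paxos.group(2))
    return range(start + 1, committed + 1)


# Two lines copied from the log in the description.
sample = """\
2018-10-09 18:01:23.125565 7fc1ab4cfe40 15 mon.Ceph03@-1(probing).osd e0 update_from_paxos paxos e 1794, my e 0
2018-10-09 18:01:23.125792 7fc1ab4cfe40  7 mon.Ceph03@-1(probing).osd e0 update_from_paxos loading latest full map e1767
"""

epochs = pending_incrementals(sample)
print(f"incrementals to replay: {epochs.start}..{epochs.stop - 1} ({len(epochs)} epochs)")
```

For the log above this gives epochs 1768 through 1794. The assert fires right after the "pgservice is mgrstat" line with no further per-epoch output, so the first incremental in that range (1768) is a plausible place to start looking, though the log alone does not prove which epoch failed.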
