Support #36351
openmon: OSDMonitor.cc: 380: FAILED assert(err == 0) 12.2.2
Description
I have a Ceph cluster with 3 mons. After an unexpected power failure, one mon service fails to start. The exception information is as follows:
2018-10-09 18:01:23.124782 7fc1ab4cfe40 1 mon.Ceph03@-1(probing) e1 preinit fsid ad403a3c-78e3-11e8-982b-52540056dc48
2018-10-09 18:01:23.124985 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 check_fsid cluster_uuid contains 'ad403a3c-78e3-11e8-982b-52540056dc48'
2018-10-09 18:01:23.125013 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 features compat={},rocompat={},incompat={1=initial feature set (~v.18),3=single paxos with k/v store (v0.?),4=support erasure code pools,5=new-style osdmap encoding,6=support isa/lrc erasure code,7=support shec erasure code,8=support monmap features,9=luminous ondisk layout}
2018-10-09 18:01:23.125029 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 calc_quorum_requirements required_features 153140804152475648
2018-10-09 18:01:23.125035 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 required_features 153140804152475648
2018-10-09 18:01:23.125049 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 has_ever_joined = 1
2018-10-09 18:01:23.125090 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 sync_last_committed_floor 0
2018-10-09 18:01:23.125096 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 init_paxos
2018-10-09 18:01:23.125152 7fc1ab4cfe40 1 mon.Ceph03@-1(probing).mds e0 Unable to load 'last_metadata'
2018-10-09 18:01:23.125210 7fc1ab4cfe40 10 mon.Ceph03@-1(probing).health init
2018-10-09 18:01:23.125218 7fc1ab4cfe40 10 mon.Ceph03@-1(probing) e1 refresh_from_paxos
2018-10-09 18:01:23.125252 7fc1ab4cfe40 1 mon.Ceph03@-1(probing).pg v0 on_upgrade discarding in-core PGMap
2018-10-09 18:01:23.125289 7fc1ab4cfe40 10 mon.Ceph03@-1(probing).pg v0 update_from_paxos deleted, clearing in-memory PGMap
2018-10-09 18:01:23.125336 7fc1ab4cfe40 10 mon.Ceph03@-1(probing).mds e0 update_from_paxos version 1, my e 0
2018-10-09 18:01:23.125379 7fc1ab4cfe40 10 mon.Ceph03@-1(probing).mds e0 update_from_paxos got 1
2018-10-09 18:01:23.125405 7fc1ab4cfe40 4 mon.Ceph03@-1(probing).mds e1 new map
2018-10-09 18:01:23.125412 7fc1ab4cfe40 0 mon.Ceph03@-1(probing).mds e1 print_map
e1
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
legacy client fscid: -1
No filesystems configured
2018-10-09 18:01:23.125438 7fc1ab4cfe40 10 mon.Ceph03@-1(probing).mds e1 update_logger
2018-10-09 18:01:23.125565 7fc1ab4cfe40 15 mon.Ceph03@-1(probing).osd e0 update_from_paxos paxos e 1794, my e 0
2018-10-09 18:01:23.125792 7fc1ab4cfe40 7 mon.Ceph03@-1(probing).osd e0 update_from_paxos loading latest full map e1767
2018-10-09 18:01:23.126303 7fc1ab4cfe40 7 mon.Ceph03@-1(probing).osd e1767 update_from_paxos loading creating_pgs last_scan_epoch 1793 with 0 pgs
2018-10-09 18:01:23.126312 7fc1ab4cfe40 10 mon.Ceph03@-1(probing).osd e1767 update_from_paxos pgservice is mgrstat
2018-10-09 18:01:23.129140 7fc1ab4cfe40 -1 /clove/vm/zstor/ceph/rpmbuild/BUILD/ceph-12.2.2/src/mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7fc1ab4cfe40 time 2018-10-09 18:01:23.126328
/clove/vm/zstor/ceph/rpmbuild/BUILD/ceph-12.2.2/src/mon/OSDMonitor.cc: 380: FAILED assert(err == 0)

ceph version 12.2.2-16-6-g9e6bce0 (9e6bce0774b1d5d61c9327cc7c032b9cfea145bc) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55aa5ecc2690]
2: (OSDMonitor::update_from_paxos(bool*)+0x2e5f) [0x55aa5ebdf05f]
3: (PaxosService::refresh(bool*)+0x1ae) [0x55aa5ebacc9e]
4: (Monitor::refresh_from_paxos(bool*)+0x193) [0x55aa5ea7e263]
5: (Monitor::init_paxos()+0x115) [0x55aa5ea7e695]
6: (Monitor::preinit()+0x9c6) [0x55aa5ea7f0b6]
7: (main()+0x4738) [0x55aa5e9ae668]
8: (__libc_start_main()+0xf5) [0x7fc1a81a4c05]
9: (()+0x37292e) [0x55aa5ea5392e]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by huanwen ren over 5 years ago
Using the tool ceph-monstore-tool on the failed mon's data directory, I can read the osdmap and monmap correctly. The steps are as follows:
ceph-monstore-tool ./ceph-Ceph03/ get monmap > abc
ceph-monstore-tool ./ceph-Ceph03/ get osdmap > a2
monmaptool --print abc
monmaptool: monmap file abc
epoch 1
fsid ad403a3c-78e3-11e8-982b-52540056dc48
last_changed 2018-06-26 10:05:02.096907
created 2018-06-26 10:05:02.096907
0: [fd00:0:5:99::100]:6789/0 mon.Ceph01
1: [fd00:0:5:99::101]:6789/0 mon.Ceph02
2: [fd00:0:5:99::102]:6789/0 mon.Ceph03
osdmaptool --print a2 > aa
Updated by huanwen ren over 5 years ago
This may be the same issue as http://tracker.ceph.com/issues/12941
Updated by John Spray over 5 years ago
- Project changed from mgr to RADOS
- Category set to Correctness/Safety
- Component(RADOS) Monitor added
Updated by Greg Farnum over 5 years ago
- Tracker changed from Bug to Support
12.2.2 is pretty out-of-date for Luminous and you appear to be running a custom build, so I'm not sure my line numbers are right. But the two asserts that match that line are:
1) that the monitor successfully reads the new osdmap incremental version off disk,
2) that the incremental applies correctly.
This is likely to be some kind of disk issue or the result of a bad patch.