Bug #23892
luminous->mimic: mon segv in ~MonOpRequest from OpHistoryServiceThread
% Done: 0%
Regression: No
Severity: 3 - minor
Description
0> 2018-04-26 19:35:25.704 7faeeca14700 -1 *** Caught signal (Segmentation fault) **
 in thread 7faeeca14700 thread_name:OpHistorySvc
 ceph version 13.0.2-1854-g97635aa (97635aacc8013122f577e745aec3e30962e19b68) mimic (dev)
 1: (()+0x4b0cc0) [0x55f77ee6bcc0]
 2: (()+0x11390) [0x7faef4035390]
 3: (RefCountedObject::put() const+0xb8) [0x55f77ec65308]
 4: (MonOpRequest::~MonOpRequest()+0x32) [0x55f77ec65602]
 5: (std::_Rb_tree<std::pair<double, boost::intrusive_ptr<TrackedOp> >, std::pair<double, boost::intrusive_ptr<TrackedOp> >, std::_Identity<std::pair<double, boost::intrusive_ptr<TrackedOp> > >, std::less<std::pair<double, boost::intrusive_ptr<TrackedOp> > >, std::allocator<std::pair<double, boost::intrusive_ptr<TrackedOp> > > >::_M_erase_aux(std::_Rb_tree_const_iterator<std::pair<double, boost::intrusive_ptr<TrackedOp> > >)+0x83) [0x7faef474a083]
 6: (OpHistory::cleanup(utime_t)+0x2b4) [0x7faef4746d04]
 7: (OpHistory::_insert_delayed(utime_t const&, boost::intrusive_ptr<TrackedOp>)+0x20c) [0x7faef474768c]
 8: (OpHistoryServiceThread::entry()+0xe9) [0x7faef4747a69]
 9: (()+0x76ba) [0x7faef402b6ba]
 10: (clone()+0x6d) [0x7faef338941d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
gdb shows the op is (MonOpRequest)0x55f78083ba40 and its request is (Message)0x55f780ba1600, which is the message received in the first log line below:
-2359> 2018-04-26 19:35:20.148 7faee920d700 1 -- 172.21.15.110:6789/0 <== mon.2 172.21.15.94:6790/0 1947548406 ==== mon_probe(reply 3d39e9e8-3d17-457e-b675-3c058fc49b2f name c quorum 0,1,2 paxos( fc 1 lc 109 )) v6 ==== 427+0+0 (2026604521 0 0) 0x55f780ba1600 con 0x55f780caa300
-2358> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing) e1 _ms_dispatch new session 0x55f780835180 MonSession(mon.2 172.21.15.94:6790/0 is open , features 0x1ffddff8eea4fffb (luminous)) features 0x1ffddff8eea4fffb
-2357> 2018-04-26 19:35:20.148 7faee920d700 5 mon.a@1(probing) e1 _ms_dispatch setting monitor caps on this connection
-2356> 2018-04-26 19:35:20.148 7faee920d700 20 mon.a@1(probing) e1 caps allow *
-2355> 2018-04-26 19:35:20.148 7faee920d700 20 is_capable service=mon command= read on cap allow *
-2354> 2018-04-26 19:35:20.148 7faee920d700 20 allow so far , doing grant allow *
-2353> 2018-04-26 19:35:20.148 7faee920d700 20 allow all
-2352> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing) e1 handle_probe mon_probe(reply 3d39e9e8-3d17-457e-b675-3c058fc49b2f name c quorum 0,1,2 paxos( fc 1 lc 109 )) v6
-2351> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing) e1 handle_probe_reply mon.2 172.21.15.94:6790/0mon_probe(reply 3d39e9e8-3d17-457e-b675-3c058fc49b2f name c quorum 0,1,2 paxos( fc 1 lc 109 )) v6
-2350> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing) e1 monmap is e1: 3 mons at {a=172.21.15.110:6789/0,b=172.21.15.94:6789/0,c=172.21.15.94:6790/0}
-2349> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing) e1 peer name is c
-2348> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing) e1 existing quorum 0,1,2
-2347> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing) e1 peer paxos version 109 vs my version 109 (ok)
-2346> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing) e1 start_election
-2345> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing) e1 _reset
-2344> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing) e1 cancel_probe_timeout 0x55f7810feb20
-2343> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing) e1 timecheck_finish
-2342> 2018-04-26 19:35:20.148 7faee920d700 15 mon.a@1(probing) e1 health_tick_stop
-2341> 2018-04-26 19:35:20.148 7faee920d700 15 mon.a@1(probing) e1 health_interval_stop
-2340> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing) e1 scrub_event_cancel
-2339> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing) e1 scrub_reset
-2338> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing).paxos(paxos recovering c 1..109) restart -- canceling timeouts
-2337> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing).paxosservice(mdsmap 1..7) restart
-2336> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing).paxosservice(osdmap 1..17) restart
-2335> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing).paxosservice(logm 1..28) restart
-2334> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing).paxosservice(monmap 1..1) restart
-2333> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing).paxosservice(auth 1..8) restart
-2332> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing).paxosservice(mgr 1..4) restart
-2331> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing).paxosservice(mgrstat 1..52) restart
-2330> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing).paxosservice(health 1..1) restart
-2329> 2018-04-26 19:35:20.148 7faee920d700 10 mon.a@1(probing).paxosservice(config 0..0) restart
-2328> 2018-04-26 19:35:20.148 7faee920d700 0 log_channel(cluster) log [INF] : mon.a calling monitor election
...
/a/sage-2018-04-26_19:17:26-upgrade:luminous-x-wip-sage-testing-2018-04-26-1251-distro-basic-smithi/2442438
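For readers unfamiliar with the call shape in frames 3 to 6 of the backtrace, here is a rough Python model of it: erasing an entry from the history set drops the last reference to the op, the op's destructor runs on the thread doing the cleanup, and that destructor calls put() on something the op holds. This is only a sketch, not Ceph code. The class names `RefCounted` and `OpRequest`, the helpers `cleanup` and `demo`, the `max_size` parameter, and the assumption that the put() in frame 3 is on the request Message are all hypothetical. It models the normal path only and says nothing about why the real put() crashed.

```python
class RefCounted:
    """Hypothetical stand-in for an intrusively ref-counted object."""

    def __init__(self, name, events):
        self.name = name
        self.events = events  # shared list recording the call order
        self.nref = 1         # the creator holds the first reference

    def get(self):
        self.nref += 1
        return self

    def put(self):
        self.events.append(f"{self.name}.put()")
        self.nref -= 1
        if self.nref == 0:
            self.destroy()

    def destroy(self):
        self.events.append(f"~{self.name}")


class OpRequest(RefCounted):
    """Hypothetical op that keeps a reference to the message it tracks."""

    def __init__(self, name, request, events):
        super().__init__(name, events)
        self.request = request.get()  # the op takes its own reference

    def destroy(self):
        self.events.append(f"~{self.name}")
        self.request.put()  # assumed to correspond to frame 3


def cleanup(history, max_size):
    """Erase the smallest-keyed entries until at most max_size remain."""
    history.sort(key=lambda entry: entry[0])
    while len(history) > max_size:
        _, op = history.pop(0)  # stands in for the set erase in frame 5
        op.put()                # the history's reference was the last one


def demo():
    events = []
    msg = RefCounted("Message", events)
    op = OpRequest("MonOpRequest", msg, events)
    msg.put()                  # the creator drops its message reference
    history = [(0.5, op)]      # (double key, op), as in the std::set type
    events.clear()             # record only what cleanup() triggers
    cleanup(history, max_size=0)
    return events


if __name__ == "__main__":
    for line in demo():
        print(line)
```

In this model the op and the message are both destroyed inside `cleanup`, long after the dispatch path that created them has finished, which matches the gap between the 19:35:20 dispatch log and the 19:35:25 crash on the OpHistorySvc thread.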
Related issues
History
#1 Updated by Radoslaw Zarzynski almost 6 years ago
- Assignee set to Radoslaw Zarzynski
#2 Updated by Sage Weil almost 6 years ago
- Priority changed from Urgent to High
#3 Updated by Radoslaw Zarzynski almost 6 years ago
- Related to Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed. added
#4 Updated by Greg Farnum over 4 years ago
- Status changed from 12 to Can't reproduce
Believe we've made some fixes to OpHistory since April last year...