Bug #24982
mgr: terminate called after throwing an instance of 'std::out_of_range' in DaemonPerfCounters::update
Status: Closed
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Backtrace from logs:
2018-07-18 14:22:49.241346 7fc045459700 20 mgr.server handle_report updating existing DaemonState for rgw,bucket
2018-07-18 14:22:49.241349 7fc045459700 20 mgr update loading 0 new types, 0 old types, had 146 types, got 214 bytes of data
2018-07-18 14:22:49.242640 7fc045459700 -1 *** Caught signal (Aborted) **
 in thread 7fc045459700 thread_name:ms_dispatch
 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0x40e744) [0x560798ff2744]
 2: (()+0x11390) [0x7fc053a13390]
 3: (gsignal()+0x38) [0x7fc0529a3428]
 4: (abort()+0x16a) [0x7fc0529a502a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7fc0532e684d]
 6: (()+0x8d6b6) [0x7fc0532e46b6]
 7: (()+0x8d701) [0x7fc0532e4701]
 8: (()+0x8d919) [0x7fc0532e4919]
 9: (std::__throw_out_of_range(char const*)+0x3f) [0x7fc05330d2cf]
 10: (DaemonPerfCounters::update(MMgrReport*)+0x197c) [0x560798e86dec]
 11: (DaemonServer::handle_report(MMgrReport*)+0x269) [0x560798e8f3d9]
 12: (DaemonServer::ms_dispatch(Message*)+0x47) [0x560798e9d5a7]
 13: (DispatchQueue::entry()+0xf4a) [0x56079934caba]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x5607990edaed]
 15: (()+0x76ba) [0x7fc053a096ba]
 16: (clone()+0x6d) [0x7fc052a7541d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Backtrace from stdout:
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
2018-07-18 15:37:51.827425 7fbb9de22700 -1 *** Caught signal (Aborted) **
 in thread 7fbb9de22700 thread_name:ms_dispatch
 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0x40e744) [0x564115e80744]
 2: (()+0x11390) [0x7fbbac51b390]
 3: (gsignal()+0x38) [0x7fbbab4ab428]
 4: (abort()+0x16a) [0x7fbbab4ad02a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7fbbabdee84d]
 6: (()+0x8d6b6) [0x7fbbabdec6b6]
 7: (()+0x8d701) [0x7fbbabdec701]
 8: (()+0x8d919) [0x7fbbabdec919]
 9: (std::__throw_out_of_range(char const*)+0x3f) [0x7fbbabe152cf]
 10: (DaemonPerfCounters::update(MMgrReport*)+0x197c) [0x564115d14dec]
 11: (DaemonServer::handle_report(MMgrReport*)+0x269) [0x564115d1d3d9]
 12: (DaemonServer::ms_dispatch(Message*)+0x47) [0x564115d2b5a7]
 13: (DispatchQueue::entry()+0xf4a) [0x5641161daaba]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x564115f7baed]
 15: (()+0x76ba) [0x7fbbac5116ba]
 16: (clone()+0x6d) [0x7fbbab57d41d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Aborted
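The `what(): map::at` message, together with frame 9 (`std::__throw_out_of_range`), indicates that `std::map::at` was called with a key that is not in the map. The minimal sketch below illustrates that behaviour in isolation; `DeclaredTypes`, `lookup_strict` and `throws_out_of_range` are hypothetical names for illustration, not Ceph code, and the real mgr data structures differ.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a table of declared perf counter types,
// keyed by counter path (not the actual Ceph structure).
using DeclaredTypes = std::map<std::string, int>;

// Looks a counter up with map::at, which throws std::out_of_range
// when the path was never inserted (i.e. never "declared").
int lookup_strict(const DeclaredTypes& types, const std::string& path) {
  return types.at(path);
}

// Returns true if lookup_strict threw std::out_of_range for this path.
bool throws_out_of_range(const DeclaredTypes& types, const std::string& path) {
  try {
    lookup_strict(types, path);
  } catch (const std::out_of_range&) {
    // In the crash above nothing caught the exception on the ms_dispatch
    // thread, so it reached std::terminate() and the process aborted.
    return true;
  }
  return false;
}
```

Unlike `at()`, `operator[]` would silently insert a default value for a missing key, and `find()` lets the caller decide what to do; `at()` is the only one of the three that turns a missing declaration into an exception.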
This patch introduces the use of `map::at`:
Notes on diagnosing the issue on IRC:
- The exception is only thrown when an update arrives for a perf counter that was not 'declared' (and therefore never created) beforehand.
- The failing mgrs must have got into a state in which they believe some of the perf counters being updated were never declared by the OSDs/RGW, while the other daemons either did declare those perf counters or send no updates for them.
- The mgrs seem to crash only when the perf counter update comes from radosgw.
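Following the notes above, one possible defensive pattern is to look each counter up with `find()` and skip updates whose counter was never declared, instead of letting `at()` throw out of the dispatch thread. This is a hypothetical sketch under a simplified model: `CounterUpdate` and `CounterTable` are invented names, updates are assumed to carry their counter path, and it is not the change made upstream for this bug.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model; the real MMgrReport encoding and the
// DaemonPerfCounters internals in Ceph are different.
struct CounterUpdate {
  std::string path;  // counter path, e.g. "rgw.req" (illustrative)
  uint64_t value;
};

struct CounterTable {
  std::map<std::string, uint64_t> declared;  // path -> latest value
  uint64_t skipped = 0;                      // updates for undeclared paths

  // Records that a counter exists; does not reset an existing value.
  void declare(const std::string& path) { declared.emplace(path, 0); }

  // Applies updates, skipping (and counting) any whose counter was never
  // declared rather than throwing std::out_of_range.
  void apply(const std::vector<CounterUpdate>& updates) {
    for (const auto& u : updates) {
      auto it = declared.find(u.path);
      if (it == declared.end()) {
        ++skipped;  // a real daemon would presumably log this
        continue;
      }
      it->second = u.value;
    }
  }
};
```

Counting the skipped updates keeps the declaration mismatch visible instead of hiding it, while the mgr keeps serving other daemons.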