Bug #24982

closed

mgr: terminate called after throwing an instance of 'std::out_of_range' in DaemonPerfCounters::update

Added by Iain Bucław almost 6 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Backtrace from logs:

2018-07-18 14:22:49.241346 7fc045459700 20 mgr.server handle_report updating existing DaemonState for rgw,bucket
2018-07-18 14:22:49.241349 7fc045459700 20 mgr update loading 0 new types, 0 old types, had 146 types, got 214 bytes of data
2018-07-18 14:22:49.242640 7fc045459700 -1 *** Caught signal (Aborted) **
 in thread 7fc045459700 thread_name:ms_dispatch

 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0x40e744) [0x560798ff2744]
 2: (()+0x11390) [0x7fc053a13390]
 3: (gsignal()+0x38) [0x7fc0529a3428]
 4: (abort()+0x16a) [0x7fc0529a502a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7fc0532e684d]
 6: (()+0x8d6b6) [0x7fc0532e46b6]
 7: (()+0x8d701) [0x7fc0532e4701]
 8: (()+0x8d919) [0x7fc0532e4919]
 9: (std::__throw_out_of_range(char const*)+0x3f) [0x7fc05330d2cf]
 10: (DaemonPerfCounters::update(MMgrReport*)+0x197c) [0x560798e86dec]
 11: (DaemonServer::handle_report(MMgrReport*)+0x269) [0x560798e8f3d9]
 12: (DaemonServer::ms_dispatch(Message*)+0x47) [0x560798e9d5a7]
 13: (DispatchQueue::entry()+0xf4a) [0x56079934caba]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x5607990edaed]
 15: (()+0x76ba) [0x7fc053a096ba]
 16: (clone()+0x6d) [0x7fc052a7541d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Backtrace from stdout:

terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
*** Caught signal (Aborted) **
 in thread 7fbb9de22700 thread_name:ms_dispatch
 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0x40e744) [0x564115e80744]
 2: (()+0x11390) [0x7fbbac51b390]
 3: (gsignal()+0x38) [0x7fbbab4ab428]
 4: (abort()+0x16a) [0x7fbbab4ad02a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7fbbabdee84d]
 6: (()+0x8d6b6) [0x7fbbabdec6b6]
 7: (()+0x8d701) [0x7fbbabdec701]
 8: (()+0x8d919) [0x7fbbabdec919]
 9: (std::__throw_out_of_range(char const*)+0x3f) [0x7fbbabe152cf]
 10: (DaemonPerfCounters::update(MMgrReport*)+0x197c) [0x564115d14dec]
 11: (DaemonServer::handle_report(MMgrReport*)+0x269) [0x564115d1d3d9]
 12: (DaemonServer::ms_dispatch(Message*)+0x47) [0x564115d2b5a7]
 13: (DispatchQueue::entry()+0xf4a) [0x5641161daaba]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x564115f7baed]
 15: (()+0x76ba) [0x7fbbac5116ba]
 16: (clone()+0x6d) [0x7fbbab57d41d]
2018-07-18 15:37:51.827425 7fbb9de22700 -1 *** Caught signal (Aborted) **
 in thread 7fbb9de22700 thread_name:ms_dispatch

 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0x40e744) [0x564115e80744]
 2: (()+0x11390) [0x7fbbac51b390]
 3: (gsignal()+0x38) [0x7fbbab4ab428]
 4: (abort()+0x16a) [0x7fbbab4ad02a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7fbbabdee84d]
 6: (()+0x8d6b6) [0x7fbbabdec6b6]
 7: (()+0x8d701) [0x7fbbabdec701]
 8: (()+0x8d919) [0x7fbbabdec919]
 9: (std::__throw_out_of_range(char const*)+0x3f) [0x7fbbabe152cf]
 10: (DaemonPerfCounters::update(MMgrReport*)+0x197c) [0x564115d14dec]
 11: (DaemonServer::handle_report(MMgrReport*)+0x269) [0x564115d1d3d9]
 12: (DaemonServer::ms_dispatch(Message*)+0x47) [0x564115d2b5a7]
 13: (DispatchQueue::entry()+0xf4a) [0x5641161daaba]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x564115f7baed]
 15: (()+0x76ba) [0x7fbbac5116ba]
 16: (clone()+0x6d) [0x7fbbab57d41d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

     0> 2018-07-18 15:37:51.827425 7fbb9de22700 -1 *** Caught signal (Aborted) **
 in thread 7fbb9de22700 thread_name:ms_dispatch

 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0x40e744) [0x564115e80744]
 2: (()+0x11390) [0x7fbbac51b390]
 3: (gsignal()+0x38) [0x7fbbab4ab428]
 4: (abort()+0x16a) [0x7fbbab4ad02a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7fbbabdee84d]
 6: (()+0x8d6b6) [0x7fbbabdec6b6]
 7: (()+0x8d701) [0x7fbbabdec701]
 8: (()+0x8d919) [0x7fbbabdec919]
 9: (std::__throw_out_of_range(char const*)+0x3f) [0x7fbbabe152cf]
 10: (DaemonPerfCounters::update(MMgrReport*)+0x197c) [0x564115d14dec]
 11: (DaemonServer::handle_report(MMgrReport*)+0x269) [0x564115d1d3d9]
 12: (DaemonServer::ms_dispatch(Message*)+0x47) [0x564115d2b5a7]
 13: (DispatchQueue::entry()+0xf4a) [0x5641161daaba]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x564115f7baed]
 15: (()+0x76ba) [0x7fbbac5116ba]
 16: (clone()+0x6d) [0x7fbbab57d41d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Aborted

This patch introduces the use of `map::at`:

https://github.com/ceph/ceph/commit/1164ef2f32d81d4f35623c3f6a77af2b6871f962#diff-1d4ae230c3c43537437b704c5d05a40cR167

Notes on diagnosing the issue on IRC:

  • The crash can only be triggered when the perf counter being updated was never 'declared', and thus never created, before the update arrived.
  • The mgrs that fail must have reached a state where the mgr thinks some perf counters being updated were never declared by the osds/rgw, while the others either did declare those perf counters or have no updates for them.
  • Mgrs seem to crash only when a perf counter update comes from radosgw.
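
To make the failure mode in the notes above concrete, here is a minimal sketch of a declare-then-update counter table. It is not Ceph's code: `CounterTable`, `declare`, `update_unchecked`, and `update_checked` are hypothetical names invented for illustration. The only thing taken from the report is that the lookup goes through `map::at`, which throws `std::out_of_range` when the key is missing.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the mgr's per-daemon counter state.
// These are NOT Ceph's actual types or member names.
struct CounterTable {
  std::map<std::string, std::uint64_t> values;

  // A daemon "declares" a counter, which creates its entry.
  void declare(const std::string& path) { values.emplace(path, 0); }

  // Same lookup style as in the report: map::at throws
  // std::out_of_range if the path was never declared.
  void update_unchecked(const std::string& path, std::uint64_t v) {
    values.at(path) = v;
  }

  // Defensive variant (an assumption, not the actual fix):
  // look the path up first and report failure instead of throwing.
  bool update_checked(const std::string& path, std::uint64_t v) {
    auto it = values.find(path);
    if (it == values.end()) {
      return false;  // update for an undeclared counter; skip it
    }
    it->second = v;
    return true;
  }
};
```

In this sketch, calling `update_unchecked` for a path that was never passed to `declare` throws `std::out_of_range`. If nothing up the call stack catches it, the C++ runtime calls `std::terminate`, which is consistent with the `terminate called after throwing an instance of 'std::out_of_range'` and `what():  map::at` lines in the stdout backtrace.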

Related issues 1 (0 open, 1 closed)

Related to mgr - Bug #36244: mgr crash when handle_report updating existing DaemonState for rgw (Resolved, Mykola Golub, 09/28/2018)
