Bug #56832

crash: ceph::common::PerfCounters::inc(int, unsigned long)

Added by Telemetry Bot over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: Telemetry
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1): 38def02c08847ca40126dcb976325e4ac3f145ce853aba51ac5f9fc21fc3ed23


Description

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=1826ab46c7fb00f8343557593baebf8cae99e968c05ce238b3e7d2f267bfd163

Sanitized backtrace:

    ceph::common::PerfCounters::inc(int, unsigned long)
    RGWAsyncFetchRemoteObj::_send_request(DoutPrefixProvider const*)
    RGWAsyncRadosProcessor::handle_request(DoutPrefixProvider const*, RGWAsyncRadosRequest*)
    ThreadPool::WorkQueue<RGWAsyncRadosRequest>::_void_process(void*, ThreadPool::TPHandle&)
    ThreadPool::worker(ThreadPool::WorkThread*)
    ThreadPool::WorkThread::entry()

Crash dump sample:
{
    "archived": "2022-07-04 07:40:44.041426",
    "backtrace": [
        "/lib/x86_64-linux-gnu/libc.so.6(+0x3bd60) [0x7f95f258fd60]",
        "(ceph::common::PerfCounters::inc(int, unsigned long)+0x3) [0x7f95e98afd33]",
        "(RGWAsyncFetchRemoteObj::_send_request(DoutPrefixProvider const*)+0xdb0) [0x7f95f2f39580]",
        "(RGWAsyncRadosProcessor::handle_request(DoutPrefixProvider const*, RGWAsyncRadosRequest*)+0x27) [0x7f95f2f32097]",
        "(ThreadPool::WorkQueue<RGWAsyncRadosRequest>::_void_process(void*, ThreadPool::TPHandle&)+0x2a) [0x7f95f2f3ea0a]",
        "(ThreadPool::worker(ThreadPool::WorkThread*)+0x96b) [0x7f95e97a893b]",
        "(ThreadPool::WorkThread::entry()+0x11) [0x7f95e97a9981]",
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f95e8d00ea7]",
        "clone()" 
    ],
    "ceph_version": "16.2.9",
    "crash_id": "2022-07-04T07:15:58.305826Z_d6511c54-f86a-4734-b178-c46b78c23078",
    "entity_name": "client.00fd1cbffeab355cc42e9e45df93a6bdca2e9743",
    "os_id": "11",
    "os_name": "Debian GNU/Linux 11 (bullseye)",
    "os_version": "11 (bullseye)",
    "os_version_id": "11",
    "process_name": "radosgw",
    "stack_sig": "38def02c08847ca40126dcb976325e4ac3f145ce853aba51ac5f9fc21fc3ed23",
    "timestamp": "2022-07-04T07:15:58.305826Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.35-3-pve",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PVE 5.15.35-6 (Fri, 17 Jun 2022 13:42:35 +0200)" 
}
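
For readers comparing the raw frames with the sanitized backtrace, the following is a minimal sketch of how the symbol names could be pulled out of the "backtrace" array. It is not the telemetry service's actual sanitization code; the regular expression and the helper name `sanitize_backtrace` are assumptions made for illustration. It keeps only frames of the form `(symbol+0xOFFSET) [0xADDRESS]` and drops frames without a symbol, such as the libc/libpthread entries and `clone()`.

```python
import re

# Assumed frame shape: "(symbol+0xOFFSET) [0xADDRESS]".
# The greedy ".+" backtracks to the last "+0x", so parentheses and
# commas inside the symbol's argument list are preserved.
FRAME_RE = re.compile(r"^\((?P<symbol>.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")


def sanitize_backtrace(frames):
    """Return only the symbol names, skipping frames that carry no symbol."""
    symbols = []
    for frame in frames:
        match = FRAME_RE.match(frame.strip())
        if match:
            symbols.append(match.group("symbol"))
    return symbols


# Frames copied from the crash dump sample above.
raw_frames = [
    "/lib/x86_64-linux-gnu/libc.so.6(+0x3bd60) [0x7f95f258fd60]",
    "(ceph::common::PerfCounters::inc(int, unsigned long)+0x3) [0x7f95e98afd33]",
    "(RGWAsyncFetchRemoteObj::_send_request(DoutPrefixProvider const*)+0xdb0) [0x7f95f2f39580]",
    "(RGWAsyncRadosProcessor::handle_request(DoutPrefixProvider const*, RGWAsyncRadosRequest*)+0x27) [0x7f95f2f32097]",
    "(ThreadPool::WorkQueue<RGWAsyncRadosRequest>::_void_process(void*, ThreadPool::TPHandle&)+0x2a) [0x7f95f2f3ea0a]",
    "(ThreadPool::worker(ThreadPool::WorkThread*)+0x96b) [0x7f95e97a893b]",
    "(ThreadPool::WorkThread::entry()+0x11) [0x7f95e97a9981]",
    "/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f95e8d00ea7]",
    "clone()",
]

sanitized = sanitize_backtrace(raw_frames)
for symbol in sanitized:
    print(symbol)
```

Run against the sample, this prints the same six symbols that appear under "Sanitized backtrace", top frame first.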


Related issues

Duplicates rgw - Bug #49666: RGW crash due to PerfCounters::inc assert_condition during multisite syncing (Pending Backport)

History

#1 Updated by Telemetry Bot over 1 year ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
  • Affected Versions v16.2.9 added

#2 Updated by Casey Bodley over 1 year ago

  • Duplicates Bug #49666: RGW crash due to PerfCounters::inc assert_condition during multisite syncing added

#3 Updated by Mark Kogan over 1 year ago

The crash occurred shortly after `rgw realm reloader` activity:

{
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12ce0) [0x7f521f745ce0]",
        "(ceph::common::PerfCounters::inc(int, unsigned long)+0x7) [0x7f52202cbd47]",
        "(RGWAsyncFetchRemoteObj::_send_request(DoutPrefixProvider const*)+0xa35) [0x7f522a681c05]",
        "(RGWAsyncRadosProcessor::handle_request(DoutPrefixProvider const*, RGWAsyncRadosRequest*)+0x2a) [0x7f522a678b1a]",
        "(RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*, ThreadPool::TPHandle&)+0x17) [0x7f522a6822b7]",
        "(ThreadPool::worker(ThreadPool::WorkThread*)+0xd48) [0x7f52201bc7e8]",
        "(ThreadPool::WorkThread::entry()+0x15) [0x7f52201bd3e5]",
        "/lib64/libpthread.so.0(+0x81cf) [0x7f521f73b1cf]",
        "clone()" 
    ],
    "ceph_version": "16.2.10-37.el8cp",
    "crash_id": "2022-09-07T09:02:21.261415Z_b0ef1abc-b760-43d3-9cf1-3d7bdb7bfab4",
    "entity_name": "client.rgw.mero008.india.primary.8080.mero008.oahukv",
    "os_id": "rhel",
    "os_name": "Red Hat Enterprise Linux",
    "os_version": "8.6 (Ootpa)",
    "os_version_id": "8.6",
    "process_name": "radosgw",
    "stack_sig": "3215045e110d75d184a1dfe01da0e992403a09f7ac17fd0045f1cab6b2ed090f",
    "timestamp": "2022-09-07T09:02:21.261415Z",
    "utsname_hostname": "mero008",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-372.13.1.el8_6.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Mon Jun 6 15:05:22 EDT 2022" 
}
Segfaults appear twice in mero008's /var/log/messages: at 09:02:21 (the crash dump above) and at 13:49:46.

In both cases, an `rgw realm reloader` log entry appears shortly before the segfault:
Sep  7 09:02:13 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[234509]: debug 2022-09-07T09:02:13.070+0000 7f50e72ea700  1  : Frontends paused
                                        ^^^^^^^^^^^^^^^^
...    ^v^v^v^v
Sep  7 09:02:21 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[234509]: debug 2022-09-07T09:02:21.260+0000 7f5208d2d700 -1 *** Caught signal (Segmentation fault) **

AND

Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: debug 2022-09-07T13:49:46.557+0000 7f7de32ba700  0 int RGWRESTStreamRWRequest::complete_request(optional_yield, std::__cxx11::string*, ceph::real_time*, uint64_t*, std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >*, std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >*):GET_1662558586546797164_http://10.8.129.240:8080: wait failed with ret=-2016
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: debug 2022-09-07T13:49:46.561+0000 7f7cc2879700  1 rgw realm reloader: Pausing frontends for realm update...
                                             ^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: debug 2022-09-07T13:49:46.561+0000 7f7cc2879700  1 rgw realm reloader: Frontends paused
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: debug 2022-09-07T13:49:46.563+0000 7f7dd0294700  0 rgw rados thread: ERROR: failed to run sync
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: debug 2022-09-07T13:49:46.584+0000 7f7de2ab9700  0 int RGWRESTStreamRWRequest::complete_request(optional_yield, std::__cxx11::string*, ceph::real_time*, uint64_t*, std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >*, std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >*):GET_1662558586203707694_http://10.8.129.232:8080: wait failed with ret=-2016

Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: *** Caught signal (Segmentation fault) **
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: in thread 7f7de2ab9700 thread_name:rados_async
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: debug 2022-09-07T13:49:46.584+0000 7f7de0ab5700  0 int RGWRESTStreamRWRequest::complete_request(optional_yield, std::__cxx11::string*, ceph::real_time*, uint64_t*, std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >*, std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >*):GET_1662558584239943934_http://10.8.129.232:8080: wait failed with ret=-2016
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: ceph version 16.2.10-37.el8cp (cd069f2d37ee94d03baa34c53f21a430bd31864b) pacific (stable)
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f7dfacd4ce0]
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: 2: (ceph::common::PerfCounters::inc(int, unsigned long)+0x7) [0x7f7dfb85ad47]
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: 3: (RGWAsyncFetchRemoteObj::_send_request(DoutPrefixProvider const*)+0xae7) [0x7f7e05c10cb7]
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: 4: (RGWAsyncRadosProcessor::handle_request(DoutPrefixProvider const*, RGWAsyncRadosRequest*)+0x2a) [0x7f7e05c07b1a]
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: 5: (RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*, ThreadPool::TPHandle&)+0x17) [0x7f7e05c112b7]
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: 6: (ThreadPool::worker(ThreadPool::WorkThread*)+0xd48) [0x7f7dfb74b7e8]
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: 7: (ThreadPool::WorkThread::entry()+0x15) [0x7f7dfb74c3e5]
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: 8: /lib64/libpthread.so.0(+0x81cf) [0x7f7dfacca1cf]
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: 9: clone()
Sep  7 13:49:46 mero008 ceph-ee90805b-3a7f-46bb-a322-dba44141d8f5-rgw-mero008-india-primary-8080-mero008-oahukv[251114]: debug 2022-09-07T13:49:46.587+0000 7f7de2ab9700 -1 *** Caught signal (Segmentation fault) **
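
The correlation described in this comment can be checked mechanically. The sketch below scans log lines for a "Frontends paused" entry followed by a "Caught signal (Segmentation fault)" entry within a time window. It is only an illustration: the function name `reloader_before_segfault`, the 30-second window, and the shortened `rgw[PID]` process tag in the sample lines are assumptions, and lines without a `debug <timestamp>` field are skipped.

```python
import re
from datetime import datetime, timedelta

# Timestamp as printed after "debug" in the radosgw log lines above.
TS_RE = re.compile(r"debug (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})\+0000")


def reloader_before_segfault(lines, window_seconds=30):
    """Return (paused_time, segfault_time) pairs that fall within the window."""
    window = timedelta(seconds=window_seconds)
    last_paused = None
    pairs = []
    for line in lines:
        match = TS_RE.search(line)
        if not match:
            continue  # e.g. the segfault line that carries no debug timestamp
        ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if "Frontends paused" in line:
            last_paused = ts
        elif "Caught signal (Segmentation fault)" in line:
            if last_paused is not None and timedelta(0) <= ts - last_paused <= window:
                pairs.append((last_paused, ts))
    return pairs


# Lines taken from the excerpts above; the long container name is
# abbreviated to "rgw" here purely for readability.
sample = [
    "Sep  7 09:02:13 mero008 rgw[234509]: debug 2022-09-07T09:02:13.070+0000 7f50e72ea700  1  : Frontends paused",
    "Sep  7 09:02:21 mero008 rgw[234509]: debug 2022-09-07T09:02:21.260+0000 7f5208d2d700 -1 *** Caught signal (Segmentation fault) **",
    "Sep  7 13:49:46 mero008 rgw[251114]: debug 2022-09-07T13:49:46.561+0000 7f7cc2879700  1 rgw realm reloader: Frontends paused",
    "Sep  7 13:49:46 mero008 rgw[251114]: *** Caught signal (Segmentation fault) **",
    "Sep  7 13:49:46 mero008 rgw[251114]: debug 2022-09-07T13:49:46.587+0000 7f7de2ab9700 -1 *** Caught signal (Segmentation fault) **",
]

pairs = reloader_before_segfault(sample)
for paused, crashed in pairs:
    print(f"paused {paused.time()} -> segfault {crashed.time()} ({crashed - paused})")
```

A match only shows that the two events were close in time; it does not establish that the realm reload caused the crash.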

#4 Updated by Casey Bodley over 1 year ago

  • Status changed from New to Triaged

#5 Updated by J. Eric Ivancich over 1 year ago

  • Status changed from Triaged to Resolved
