Bug #49666
RGW crash due to PerfCounters::inc assert_condition during multisite syncing
0bc60ae023b522915e327c8b597473a08dcebacd4919ab95d324734af6beb5f9
1a9a29cab818bf3ddb73afdd5fd0e12722532f58e2c30b6cd41009ee5dff8bd8
2bf1b3e02038e06d50abb448410d2c59001d10861a18e5c7cf1f3e8c1926b924
522618d0d09f6b8be5a4359dc5a3fd1a6a0fdc91222dbfafaa0fb64fbb451f4d
71c45779d7a35eb1c64c0b0fc55117d7dfe56010108a4e4558caa8b1fb50b130
7d6ca6057edf55e9e3dea0fd7cdcd6e4f11f13c4a5d00a883206d07a1e5fdae0
96fa452c3a27b0d721d4bcb9ea8bcde48f991b6458114a70aa5f815230a8c5b4
38def02c08847ca40126dcb976325e4ac3f145ce853aba51ac5f9fc21fc3ed23
fe60b48bad2cba6f3a9fa97c51ff29e211121819027b6d56b542ce49db14d06c
Description
ceph crash info 2021-03-04T07:48:01.822498Z_df599769-9947-476c-8ece-11f450d8c09f
{
"assert_condition": "idx > m_lower_bound",
"assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.9/rpm/el8/BUILD/ceph-15.2.9/src/common/perf_counters.cc",
"assert_func": "void ceph::common::PerfCounters::inc(int, uint64_t)",
"assert_line": 164,
"assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.9/rpm/el8/BUILD/ceph-15.2.9/src/common/perf_counters.cc: In function 'void ceph::common::PerfCounters::inc(int, uint64_t)' thread 7f77ad275700 time 2021-03-04T07:48:01.817568+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.9/rpm/el8/BUILD/ceph-15.2.9/src/common/perf_counters.cc: 164: FAILED ceph_assert(idx > m_lower_bound)\n",
"assert_thread_name": "rados_async",
"backtrace": [
"(()+0x12b20) [0x7f77c9290b20]",
"(gsignal()+0x10f) [0x7f77c78d57ff]",
"(abort()+0x127) [0x7f77c78bfc35]",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f77c9e4382b]",
"(()+0x27a9f4) [0x7f77c9e439f4]",
"(()+0x465c3f) [0x7f77ca02ec3f]",
"(RGWAsyncFetchRemoteObj::_send_request()+0x3bc) [0x7f77d42b09cc]",
"(RGWAsyncRadosProcessor::handle_request(RGWAsyncRadosRequest*)+0x24) [0x7f77d42ab114]",
"(RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*, ThreadPool::TPHandle&)+0x11) [0x7f77d42b2a31]",
"(ThreadPool::worker(ThreadPool::WorkThread*)+0xe64) [0x7f77c9f30004]",
"(ThreadPool::WorkThread::entry()+0x15) [0x7f77c9f30865]",
"(()+0x814a) [0x7f77c928614a]",
"(clone()+0x43) [0x7f77c799af23]"
],
"ceph_version": "15.2.9",
"crash_id": "2021-03-04T07:48:01.822498Z_df599769-9947-476c-8ece-11f450d8c09f",
"entity_name": "client.rgw.realm_test.zone_first.ov-dapobject-02-3.jsjcum",
"os_id": "centos",
"os_name": "CentOS Linux",
"os_version": "8",
"os_version_id": "8",
"process_name": "radosgw",
"stack_sig": "0bc60ae023b522915e327c8b597473a08dcebacd4919ab95d324734af6beb5f9",
"timestamp": "2021-03-04T07:48:01.822498Z",
"utsname_hostname": "ov-dapobject-02-3",
"utsname_machine": "x86_64",
"utsname_release": "4.18.0-193.6.3.el8_2.x86_64",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Wed Jun 10 11:09:32 UTC 2020"
}
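For readers unfamiliar with the failing assertion: `idx > m_lower_bound` is a range check on the counter index passed to `PerfCounters::inc`. The following is a minimal sketch of that kind of check, not Ceph's implementation. The class name `CounterTable`, the upper-bound check, and the storage layout are assumptions made for illustration only.

```python
class CounterTable:
    """Hypothetical stand-in for a bounds-checked set of perf counters."""

    def __init__(self, lower_bound, upper_bound):
        # Valid indices are strictly between the two bounds (assumed layout).
        self.lower_bound = lower_bound
        self.upper_bound = upper_bound
        self.values = [0] * (upper_bound - lower_bound - 1)

    def inc(self, idx, amount=1):
        # Mirrors the reported condition: idx must be above the lower bound.
        if not idx > self.lower_bound:
            raise AssertionError("FAILED assert(idx > m_lower_bound)")
        # Assumed companion check on the other end of the range.
        if not idx < self.upper_bound:
            raise AssertionError("FAILED assert(idx < m_upper_bound)")
        self.values[idx - self.lower_bound - 1] += amount


table = CounterTable(lower_bound=100, upper_bound=104)
table.inc(101, 5)      # in range: accepted
try:
    table.inc(100)     # equal to the lower bound: rejected
except AssertionError as err:
    print(err)
```

In this model, an index that belongs to a different counter set (or an uninitialized one) trips the check, which is the general shape of failure the crash dump reports.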
Files
Updated by Telemetry Bot almost 3 years ago
- Crash signature (v1) updated
- Crash signature (v2) updated
- Affected Versions v15.2.13, v15.2.8 added
Assert condition: idx > m_lower_bound
Assert function: void ceph::common::PerfCounters::inc(int, uint64_t)
Sanitized backtrace:
RGWAsyncFetchRemoteObj::_send_request()
RGWAsyncRadosProcessor::handle_request(RGWAsyncRadosRequest*)
RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*, ThreadPool::TPHandle&)
ThreadPool::worker(ThreadPool::WorkThread*)
ThreadPool::WorkThread::entry()
clone()
Crash dump sample:
{
"assert_condition": "idx > m_lower_bound",
"assert_file": "common/perf_counters.cc",
"assert_func": "void ceph::common::PerfCounters::inc(int, uint64_t)",
"assert_line": 164,
"assert_msg": "common/perf_counters.cc: In function 'void ceph::common::PerfCounters::inc(int, uint64_t)' thread 7fdc2ca57700 time 2021-07-02T12:44:41.725322+0200\ncommon/perf_counters.cc: 164: FAILED ceph_assert(idx > m_lower_bound)",
"assert_thread_name": "rados_async",
"backtrace": [
"(()+0x12b30) [0x7fdc44025b30]",
"(gsignal()+0x10f) [0x7fdc4266337f]",
"(abort()+0x127) [0x7fdc4264ddb5]",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7fdc44bd8d61]",
"(()+0x27af2a) [0x7fdc44bd8f2a]",
"(()+0x46724f) [0x7fdc44dc524f]",
"(RGWAsyncFetchRemoteObj::_send_request()+0x3bc) [0x7fdc4f04859c]",
"(RGWAsyncRadosProcessor::handle_request(RGWAsyncRadosRequest*)+0x24) [0x7fdc4f042a74]",
"(RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*, ThreadPool::TPHandle&)+0x11) [0x7fdc4f04a5a1]",
"(ThreadPool::worker(ThreadPool::WorkThread*)+0xe64) [0x7fdc44cc5d14]",
"(ThreadPool::WorkThread::entry()+0x15) [0x7fdc44cc6575]",
"(()+0x815a) [0x7fdc4401b15a]",
"(clone()+0x43) [0x7fdc42728dd3]"
],
"ceph_version": "15.2.13",
"crash_id": "2021-07-02T10:44:41.729757Z_3f04b2b4-5234-4d26-bb53-87bc28ad73ae",
"entity_name": "client.9f5ba328f57e893aca80108d5e05c226d0071626",
"os_id": "ol",
"os_name": "Oracle Linux Server",
"os_version": "8.4",
"os_version_id": "8.4",
"process_name": "radosgw",
"stack_sig": "1a9a29cab818bf3ddb73afdd5fd0e12722532f58e2c30b6cd41009ee5dff8bd8",
"timestamp": "2021-07-02T10:44:41.729757Z",
"utsname_machine": "x86_64",
"utsname_release": "5.4.17-2102.202.5.el8uek.x86_64",
"utsname_sysname": "Linux",
"utsname_version": "#2 SMP Sat May 22 16:16:03 PDT 2021"
}
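To compare crash dumps like the ones in this ticket, it can help to reduce each raw backtrace to its function names. The sketch below is one way to do that, assuming the frame format seen in the dumps here, `(function+0xOFFSET) [0xADDRESS]`. The helper name `frame_functions` is hypothetical, and this is not the tracker's own sanitizing logic, which also appears to drop the signal and assert frames.

```python
import json
import re

# Matches frames such as "(gsignal()+0x10f) [0x7f77c78d57ff]".
FRAME_RE = re.compile(r"^\((?P<func>.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")


def frame_functions(crash):
    """Return the named functions from a crash dump's backtrace, in order."""
    names = []
    for frame in crash.get("backtrace", []):
        match = FRAME_RE.match(frame)
        # Unresolved frames look like "(()+0x12b20) [...]"; skip them.
        if match and match.group("func") not in ("", "()"):
            names.append(match.group("func"))
    return names


sample = json.loads("""
{
  "backtrace": [
    "(()+0x12b20) [0x7f77c9290b20]",
    "(gsignal()+0x10f) [0x7f77c78d57ff]",
    "(RGWAsyncFetchRemoteObj::_send_request()+0x3bc) [0x7f77d42b09cc]"
  ]
}
""")
for name in frame_functions(sample):
    print(name)
```

Feeding the saved output of `ceph crash info <crash_id>` through `json.load` and this helper gives a list that can be diffed across crash reports.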
Updated by Christian Rohmann over 2 years ago
- File radosgw_crashes.dump added
While setting up multisite on a formerly single-site RADOSGW cluster, we observed multiple RADOSGW crashes as well.
See the attached dumps of those crashes.
Updated by Christian Rohmann over 2 years ago
The issue appeared again around the time the machine was rebooted:
# ceph crash info 2022-02-01T08:29:35.173777Z_1d4fc1eb-9f33-416f-a36d-1d335baaff27
{
"backtrace": [
"(()+0x46210) [0x7f3522309210]",
"(ceph::common::PerfCounters::inc(int, unsigned long)+0x7) [0x7f351972c8b7]",
"(RGWAsyncFetchRemoteObj::_send_request()+0x574) [0x7f3522cd6724]",
"(RGWAsyncRadosProcessor::handle_request(RGWAsyncRadosRequest*)+0x25) [0x7f3522cd0865]",
"(RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*, ThreadPool::TPHandle&)+0x11) [0x7f3522cd8b61]",
"(ThreadPool::worker(ThreadPool::WorkThread*)+0x5bb) [0x7f351961d1bb]",
"(ThreadPool::WorkThread::entry()+0x15) [0x7f351961e285]",
"(()+0x9609) [0x7f3519179609]",
"(clone()+0x43) [0x7f35223e5293]"
],
"ceph_version": "15.2.15",
"crash_id": "2022-02-01T08:29:35.173777Z_1d4fc1eb-9f33-416f-a36d-1d335baaff27",
"entity_name": "client.rgw.redacted",
"os_id": "ubuntu",
"os_name": "Ubuntu",
"os_version": "20.04.3 LTS (Focal Fossa)",
"os_version_id": "20.04",
"process_name": "radosgw",
"stack_sig": "2bf1b3e02038e06d50abb448410d2c59001d10861a18e5c7cf1f3e8c1926b924",
"timestamp": "2022-02-01T08:29:35.173777Z",
"utsname_hostname": "REDACTED",
"utsname_machine": "x86_64",
"utsname_release": "5.13.0-28-generic",
"utsname_sysname": "Linux",
"utsname_version": "#31~20.04.1-Ubuntu SMP Wed Jan 19 14:08:10 UTC 2022"
}
Updated by Christian Rohmann over 2 years ago
Christian Rohmann wrote:
The issue appeared again around the time the machine was rebooted
[...]
The crash most likely occurred while RADOSGW was being stopped.
Updated by Telemetry Bot about 2 years ago
Updated by Casey Bodley over 1 year ago
- Has duplicate Bug #56832: crash: ceph::common::PerfCounters::inc(int, unsigned long) added
Updated by Casey Bodley over 1 year ago
- Has duplicate Bug #51919: crash: ceph::common::PerfCounters::inc(int, unsigned long) (in RGWAsyncFetchRemoteObj::_send_request()) added
Updated by J. Eric Ivancich over 1 year ago
Updated by Casey Bodley over 1 year ago
- Status changed from Resolved to Pending Backport
Updated by Backport Bot over 1 year ago
- Copied to Backport #57635: pacific: RGW crash due to PerfCounters::inc assert_condition during multisite syncing added
Updated by Backport Bot over 1 year ago
- Copied to Backport #57636: quincy: RGW crash due to PerfCounters::inc assert_condition during multisite syncing added