Bug #49302

Huge amount of RGW crashes in the multisite setup with a backtrace

Added by Ist Gab 10 days ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source:
Tags: Multisite, sync, rgw
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Hi,

My 3-datacenter multisite setup has accumulated 110 RGW crashes in total, with the following crash information:

{
    "backtrace": [
        "(()+0x12dd0) [0x7fb9b6defdd0]",
        "(RGWCoroutine::set_sleeping(bool)+0x10) [0x7fb9c1d274d0]",
        "(RGWOmapAppend::flush_pending()+0x4a) [0x7fb9c1d2d3da]",
        "(RGWOmapAppend::finish()+0x14) [0x7fb9c1d2d4d4]",
        "(RGWDataSyncShardCR::stop_spawned_services()+0x2f) [0x7fb9c1c6c9df]",
        "(RGWDataSyncShardCR::incremental_sync()+0x771) [0x7fb9c1c845d1]",
        "(RGWDataSyncShardCR::operate()+0x9d) [0x7fb9c1c87cdd]",
        "(RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x67) [0x7fb9c1d27ac7]",
        "(RGWCoroutinesManager::run(std::__cxx11::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x271) [0x7fb9c1d288f1]",
        "(RGWCoroutinesManager::run(RGWCoroutine*)+0x8b) [0x7fb9c1d29b5b]",
        "(RGWRemoteDataLog::run_sync(int)+0x1ad) [0x7fb9c1c605bd]",
        "(RGWDataSyncProcessorThread::process()+0x46) [0x7fb9c1df2226]",
        "(RGWRadosThread::Worker::entry()+0x176) [0x7fb9c1dbab86]",
        "(()+0x82de) [0x7fb9b6de52de]",
        "(clone()+0x43) [0x7fb9b54fbe83]" 
    ],
    "ceph_version": "15.2.7",
    "crash_id": "2021-02-15T09:44:29.206441Z_ac2988b1-57af-485e-8a76-99e08d017bff",
    "entity_name": "client.rgw.hk-cephmon-2s01.rgw0",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8 (Core)",
    "os_version_id": "8",
    "process_name": "radosgw",
    "stack_sig": "8f62d50897d7b1b190387523f6d687e60dbef4e6746b430310d721c5a558f3b5",
    "timestamp": "2021-02-15T09:44:29.206441Z",
    "utsname_hostname": "hk-cephmon-2s01",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-193.28.1.el8_2.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Thu Oct 22 00:20:22 UTC 2020" 
}

The sync mechanism is suffering. I'm not sure whether this is a bug or a setup issue.
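
To check whether the crashes all share this backtrace, the reports could be grouped by `stack_sig` and by `entity_name`. Below is a minimal Python sketch, assuming each crash report has already been saved as JSON (for example from `ceph crash info <crash_id>`) and loaded into a dict with the same fields as the report above. The helper names `frame_symbol`, `top_frames` and `summarize_crashes` are hypothetical and not part of Ceph.

```python
from collections import Counter


def frame_symbol(frame):
    """Extract the symbol from a frame like '(Foo::bar()+0x10) [0x7f...]'."""
    location = frame.split(" [", 1)[0]  # '(Foo::bar()+0x10)'
    if location.startswith("(") and location.endswith(")"):
        location = location[1:-1]       # 'Foo::bar()+0x10'
    return location.rsplit("+", 1)[0]   # 'Foo::bar()'


def top_frames(report, n=3):
    """Return the first n named frames, skipping anonymous '()' frames."""
    symbols = (frame_symbol(f) for f in report.get("backtrace", []))
    return [s for s in symbols if s != "()"][:n]


def summarize_crashes(reports):
    """Count reports per stack signature and per RGW entity."""
    by_sig = Counter(r.get("stack_sig", "unknown") for r in reports)
    by_entity = Counter(r.get("entity_name", "unknown") for r in reports)
    return by_sig, by_entity


if __name__ == "__main__":
    # Shortened copy of the report above; a real run would load all reports.
    report = {
        "backtrace": [
            "(()+0x12dd0) [0x7fb9b6defdd0]",
            "(RGWCoroutine::set_sleeping(bool)+0x10) [0x7fb9c1d274d0]",
            "(RGWOmapAppend::flush_pending()+0x4a) [0x7fb9c1d2d3da]",
        ],
        "entity_name": "client.rgw.hk-cephmon-2s01.rgw0",
        "stack_sig": "8f62d50897d7b1b190387523f6d687e60dbef4e6746b430310d721c5a558f3b5",
    }
    by_sig, by_entity = summarize_crashes([report])
    print(top_frames(report, 2))
    print(by_sig.most_common(5))
    print(by_entity.most_common(5))
```

If a single `stack_sig` accounts for nearly all reports, the crashes most likely come from the same code path in `RGWDataSyncShardCR::incremental_sync()`. The per-entity counts show whether one gateway or datacenter is hit more than the others.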
