Bug #49302

closed

Huge amount of RGW crashes in the multisite setup with a backtrace

Added by Ist Gab about 3 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Source:
Tags: Multisite, sync, rgw
Backport: octopus pacific quincy
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

In my 3-datacenter multisite setup I have 110 RGW crashes in total, with the following information:

{
    "backtrace": [
        "(()+0x12dd0) [0x7fb9b6defdd0]",
        "(RGWCoroutine::set_sleeping(bool)+0x10) [0x7fb9c1d274d0]",
        "(RGWOmapAppend::flush_pending()+0x4a) [0x7fb9c1d2d3da]",
        "(RGWOmapAppend::finish()+0x14) [0x7fb9c1d2d4d4]",
        "(RGWDataSyncShardCR::stop_spawned_services()+0x2f) [0x7fb9c1c6c9df]",
        "(RGWDataSyncShardCR::incremental_sync()+0x771) [0x7fb9c1c845d1]",
        "(RGWDataSyncShardCR::operate()+0x9d) [0x7fb9c1c87cdd]",
        "(RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x67) [0x7fb9c1d27ac7]",
        "(RGWCoroutinesManager::run(std::__cxx11::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x271) [0x7fb9c1d288f1]",
        "(RGWCoroutinesManager::run(RGWCoroutine*)+0x8b) [0x7fb9c1d29b5b]",
        "(RGWRemoteDataLog::run_sync(int)+0x1ad) [0x7fb9c1c605bd]",
        "(RGWDataSyncProcessorThread::process()+0x46) [0x7fb9c1df2226]",
        "(RGWRadosThread::Worker::entry()+0x176) [0x7fb9c1dbab86]",
        "(()+0x82de) [0x7fb9b6de52de]",
        "(clone()+0x43) [0x7fb9b54fbe83]" 
    ],
    "ceph_version": "15.2.7",
    "crash_id": "2021-02-15T09:44:29.206441Z_ac2988b1-57af-485e-8a76-99e08d017bff",
    "entity_name": "client.rgw.hk-cephmon-2s01.rgw0",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8 (Core)",
    "os_version_id": "8",
    "process_name": "radosgw",
    "stack_sig": "8f62d50897d7b1b190387523f6d687e60dbef4e6746b430310d721c5a558f3b5",
    "timestamp": "2021-02-15T09:44:29.206441Z",
    "utsname_hostname": "hk-cephmon-2s01",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-193.28.1.el8_2.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Thu Oct 22 00:20:22 UTC 2020" 
}

The sync mechanism is suffering. I am not sure whether this is a bug or a setup issue.
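With this many crash reports, it can help to reduce each one to its list of function names and to check how many reports share the same `stack_sig`. The following is a minimal sketch, assuming each report is a JSON object with the same layout as the one above. The helper names `frame_symbols` and `count_by_signature` are hypothetical and are not part of Ceph; the sample is a shortened copy of the report above.

```python
import json
import re
from collections import Counter

# Matches frames such as "(RGWOmapAppend::finish()+0x14) [0x7fb9c1d2d4d4]".
FRAME_RE = re.compile(
    r"^\((?P<symbol>.*)\+0x(?P<offset>[0-9a-f]+)\) \[0x(?P<addr>[0-9a-f]+)\]$"
)


def frame_symbols(report):
    """Return the symbol of each backtrace frame, or '??' if unresolved."""
    symbols = []
    for frame in report.get("backtrace", []):
        match = FRAME_RE.match(frame.strip())
        symbol = match.group("symbol") if match else ""
        # Frames like "(()+0x12dd0) [...]" carry "()" in place of a symbol.
        symbols.append(symbol if symbol not in ("", "()") else "??")
    return symbols


def count_by_signature(reports):
    """Count how many reports share each stack_sig value."""
    return Counter(r.get("stack_sig", "unknown") for r in reports)


# Shortened copy of the report above (first four frames only).
sample = json.loads("""
{
    "backtrace": [
        "(()+0x12dd0) [0x7fb9b6defdd0]",
        "(RGWCoroutine::set_sleeping(bool)+0x10) [0x7fb9c1d274d0]",
        "(RGWOmapAppend::flush_pending()+0x4a) [0x7fb9c1d2d3da]",
        "(RGWOmapAppend::finish()+0x14) [0x7fb9c1d2d4d4]"
    ],
    "stack_sig": "8f62d50897d7b1b190387523f6d687e60dbef4e6746b430310d721c5a558f3b5"
}
""")

if __name__ == "__main__":
    for symbol in frame_symbols(sample):
        print(symbol)
    print(count_by_signature([sample]))
```

If all 110 reports map to a single `stack_sig`, that would suggest one common crash path rather than several unrelated failures.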


Related issues 4 (1 open, 3 closed)

Has duplicate rgw - Bug #56920: crash: RGWCoroutinesStack::wakeup() Pending Backport
Copied to rgw - Backport #55457: pacific: Huge amount of RGW crashes in the multisite setup with a backtrace Resolved
Copied to rgw - Backport #55458: quincy: Huge amount of RGW crashes in the multisite setup with a backtrace Resolved
Copied to rgw - Backport #55459: octopus: Huge amount of RGW crashes in the multisite setup with a backtrace Resolved
Actions #1

Updated by Mule Te about 3 years ago

I have the same issue here. Another problem I notice is that old objects are not synced to the secondary zone. :(

Actions #2

Updated by Ist Gab about 3 years ago

Mule Te wrote:

I have the same issue here. Another problem I notice is that old objects are not synced to the secondary zone. :(

Yes, I have that one too.

Actions #3

Updated by Sage Weil almost 3 years ago

  • Project changed from Ceph to rgw
Actions #4

Updated by Casey Bodley almost 2 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 46007
Actions #5

Updated by Casey Bodley almost 2 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to octopus pacific quincy
Actions #6

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #55457: pacific: Huge amount of RGW crashes in the multisite setup with a backtrace added
Actions #7

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #55458: quincy: Huge amount of RGW crashes in the multisite setup with a backtrace added
Actions #8

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #55459: octopus: Huge amount of RGW crashes in the multisite setup with a backtrace added
Actions #9

Updated by Casey Bodley over 1 year ago

  • Has duplicate Bug #56920: crash: RGWCoroutinesStack::wakeup() added
Actions #10

Updated by Backport Bot over 1 year ago

  • Tags changed from Multisite, sync, rgw to Multisite, sync, rgw backport_processed
Actions #11

Updated by Konstantin Shalygin over 1 year ago

  • Status changed from Pending Backport to Resolved
  • Tags changed from Multisite, sync, rgw backport_processed to Multisite, sync, rgw