Bug #38479

closed

multisite: memory growth from RGWCoroutinesStacks on lease errors

Added by Casey Bodley about 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags: multisite
Backport: luminous mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When multiple gateways are running in the same zone of a multisite configuration, they use leases to coordinate with each other on sync. While one gateway holds a lease, the other gateways keep polling until it becomes available. Each failed poll attempt leaves behind a spawned coroutine stack that isn't cleaned up until shutdown, so memory grows for as long as the gateway keeps polling.

~/ceph/build $ bin/ceph daemon run/c2/out/radosgw.8002.asok cr dump | jq .coroutine_managers[0].run_contexts[0].entries[1].ops[0]
{
  "type": "25RGWDataSyncShardControlCR",
  "spawned": [
    "0x55ab5bdb7860",
    "0x55ab5c258780",
    "0x55ab5c3565a0",
    "0x55ab5c357680",
    "0x55ab5c357b30",
    "0x55ab5c364e10",
    "0x55ab5c3650e0",
    "0x55ab5c3652c0",
    "0x55ab5c3654a0",
    "0x55ab5c422e10",
    "0x55ab5c4231d0",
    "0x55ab5c4234a0",
    "0x55ab5c423770",
    "0x55ab5c4914a0",
    "0x55ab5c491770",
    "0x55ab5c491a40",
    "0x55ab5c491d10",
    "0x55ab5c5013b0",
    "0x55ab5c501680",
    "0x55ab5c501950",
    "0x55ab5c501c20",
    "0x55ab5c501ef0",
    "0x55ab5c57bd10",
    "0x55ab5b734960",
    "0x55ab5c5ec5a0",
    "0x55ab5c5ec690",
    "0x55ab5c5ec780",
    "0x55ab5c5ec870",
    "0x55ab5c5ec960",
    "0x55ab5c5eca50",
    "0x55ab5c5ecb40",
    "0x55ab5c5ecc30" 
  ]
}
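
To watch for this growth across every op rather than one hand-picked jq path, the dump can be summarized with a short script. This is only a sketch: it assumes the `cr dump` output nests as `coroutine_managers` / `run_contexts` / `entries` / `ops` (the path used in the jq filter above) and that each op carries an optional `spawned` list. The function name `count_spawned` and the abridged sample are made up for illustration.

```python
import json


def count_spawned(dump):
    """Return (op type, number of spawned stacks) for every op in a cr dump.

    Assumes the layout coroutine_managers -> run_contexts -> entries -> ops,
    inferred from the jq path in this report; missing keys count as empty.
    """
    counts = []
    for manager in dump.get("coroutine_managers", []):
        for ctx in manager.get("run_contexts", []):
            for entry in ctx.get("entries", []):
                for op in entry.get("ops", []):
                    op_type = op.get("type", "?")
                    counts.append((op_type, len(op.get("spawned", []))))
    return counts


# Abridged sample in the assumed layout, using two addresses from the output above.
sample = json.loads("""
{
  "coroutine_managers": [
    {"run_contexts": [
      {"entries": [
        {"ops": [
          {"type": "25RGWDataSyncShardControlCR",
           "spawned": ["0x55ab5bdb7860", "0x55ab5c258780"]}
        ]}
      ]}
    ]}
  ]
}
""")

for op_type, n in count_spawned(sample):
    print(f"{n:6d}  {op_type}")
```

Running it against successive dumps from the polling gateway should show the count for `25RGWDataSyncShardControlCR` climbing while another gateway holds the lease.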

https://github.com/ceph/ceph/pull/26639


Related issues 2 (0 open, 2 closed)

Copied to rgw - Backport #38529: luminous: multisite: memory growth from RGWCoroutinesStacks on lease errors (Resolved, Prashant D)
Copied to rgw - Backport #38530: mimic: multisite: memory growth from RGWCoroutinesStacks on lease errors (Resolved, Prashant D)
Actions #1

Updated by Casey Bodley about 5 years ago

  • Description updated (diff)
  • Status changed from In Progress to Fix Under Review
Actions #2

Updated by Casey Bodley about 5 years ago

With the fix applied, the polling gateway remains in a steady state of 0 spawned stacks:

~/ceph/build $ bin/ceph daemon run/c2/out/radosgw.8002.asok cr dump | jq .coroutine_managers[1].run_contexts[0].entries[1].ops[0]
{
  "type": "25RGWDataSyncShardControlCR" 
}

Actions #3

Updated by Casey Bodley about 5 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #4

Updated by Nathan Cutler about 5 years ago

  • Copied to Backport #38529: luminous: multisite: memory growth from RGWCoroutinesStacks on lease errors added
Actions #5

Updated by Nathan Cutler about 5 years ago

  • Copied to Backport #38530: mimic: multisite: memory growth from RGWCoroutinesStacks on lease errors added
Actions #6

Updated by Nathan Cutler almost 5 years ago

  • Status changed from Pending Backport to Resolved