Bug #38479
multisite: memory growth from RGWCoroutinesStacks on lease errors
Status: Closed
% Done: 0%
Tags: multisite
Backport: luminous mimic
Regression: No
Severity: 3 - minor
Description
When multiple gateways run in the same zone of a multisite configuration, they use leases to coordinate sync with each other. While one gateway holds a lease, the other gateways keep polling until it becomes available. Each failed poll attempt leaves behind a spawned coroutine stack that is not cleaned up until shutdown, so memory use keeps growing for as long as the lease is held elsewhere.
~/ceph/build $ bin/ceph daemon run/c2/out/radosgw.8002.asok cr dump | jq .coroutine_managers[0].run_contexts[0].entries[1].ops[0]
{
  "type": "25RGWDataSyncShardControlCR",
  "spawned": [
    "0x55ab5bdb7860",
    "0x55ab5c258780",
    "0x55ab5c3565a0",
    "0x55ab5c357680",
    "0x55ab5c357b30",
    "0x55ab5c364e10",
    "0x55ab5c3650e0",
    "0x55ab5c3652c0",
    "0x55ab5c3654a0",
    "0x55ab5c422e10",
    "0x55ab5c4231d0",
    "0x55ab5c4234a0",
    "0x55ab5c423770",
    "0x55ab5c4914a0",
    "0x55ab5c491770",
    "0x55ab5c491a40",
    "0x55ab5c491d10",
    "0x55ab5c5013b0",
    "0x55ab5c501680",
    "0x55ab5c501950",
    "0x55ab5c501c20",
    "0x55ab5c501ef0",
    "0x55ab5c57bd10",
    "0x55ab5b734960",
    "0x55ab5c5ec5a0",
    "0x55ab5c5ec690",
    "0x55ab5c5ec780",
    "0x55ab5c5ec870",
    "0x55ab5c5ec960",
    "0x55ab5c5eca50",
    "0x55ab5c5ecb40",
    "0x55ab5c5ecc30"
  ]
}
Updated by Casey Bodley about 5 years ago
- Description updated
- Status changed from In Progress to Fix Under Review
Updated by Casey Bodley about 5 years ago
With the fix applied, the polling gateway remains in a steady state of 0 spawned stacks:
~/ceph/build $ bin/ceph daemon run/c2/out/radosgw.8002.asok cr dump | jq .coroutine_managers[1].run_contexts[0].entries[1].ops[0]
{
  "type": "25RGWDataSyncShardControlCR"
}
Updated by Casey Bodley about 5 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38529: luminous: multisite: memory growth from RGWCoroutinesStacks on lease errors added
Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38530: mimic: multisite: memory growth from RGWCoroutinesStacks on lease errors added
Updated by Nathan Cutler almost 5 years ago
- Status changed from Pending Backport to Resolved