Bug #38479

Updated by Casey Bodley 6 months ago

When multiple gateways run in the same zone of a multisite configuration, they use leases to coordinate sync with each other. While one gateway holds a lease, the other gateways keep polling until the lease becomes available. Each failed poll attempt leaves behind a spawned coroutine stack, so memory use grows and the stacks are not cleaned up until shutdown.
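The growth pattern described above can be illustrated with a toy model. This is only a sketch and not Ceph code: `ToyStack`, `ToyControl`, `simulate` and the `collect` flag are hypothetical names invented for illustration, and the `collect=True` branch shows one generic remedy, not necessarily what the linked pull request does.

```python
# Toy model only: this is NOT Ceph code, and every name here is hypothetical.


class ToyStack:
    """Stands in for a spawned coroutine stack that makes one poll attempt."""

    def __init__(self):
        self.done = False

    def run(self, lease_available):
        # The attempt finishes right away, whether or not it got the lease.
        self.done = True
        return lease_available


class ToyControl:
    """Stands in for a parent that polls for a lease by spawning a child."""

    def __init__(self):
        self.spawned = []  # every child spawned so far is tracked here

    def poll(self, lease_available, collect=False):
        child = ToyStack()
        self.spawned.append(child)
        got_lease = child.run(lease_available)
        if collect:
            # Hypothetical remedy: drop finished children after each attempt.
            self.spawned = [s for s in self.spawned if not s.done]
        return got_lease


def simulate(attempts, collect):
    """Poll `attempts` times while another holder keeps the lease."""
    control = ToyControl()
    for _ in range(attempts):
        control.poll(lease_available=False, collect=collect)
    return len(control.spawned)
```

In this model, `simulate(32, collect=False)` leaves 32 finished children on the parent's list, while `simulate(32, collect=True)` leaves none.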

<pre>
~/ceph/build $ bin/ceph daemon run/c2/out/radosgw.8002.asok cr dump | jq '.coroutine_managers[0].run_contexts[0].entries[1].ops[0]'
{
  "type": "25RGWDataSyncShardControlCR",
  "spawned": [
    "0x55ab5bdb7860",
    "0x55ab5c258780",
    "0x55ab5c3565a0",
    "0x55ab5c357680",
    "0x55ab5c357b30",
    "0x55ab5c364e10",
    "0x55ab5c3650e0",
    "0x55ab5c3652c0",
    "0x55ab5c3654a0",
    "0x55ab5c422e10",
    "0x55ab5c4231d0",
    "0x55ab5c4234a0",
    "0x55ab5c423770",
    "0x55ab5c4914a0",
    "0x55ab5c491770",
    "0x55ab5c491a40",
    "0x55ab5c491d10",
    "0x55ab5c5013b0",
    "0x55ab5c501680",
    "0x55ab5c501950",
    "0x55ab5c501c20",
    "0x55ab5c501ef0",
    "0x55ab5c57bd10",
    "0x55ab5b734960",
    "0x55ab5c5ec5a0",
    "0x55ab5c5ec690",
    "0x55ab5c5ec780",
    "0x55ab5c5ec870",
    "0x55ab5c5ec960",
    "0x55ab5c5eca50",
    "0x55ab5c5ecb40",
    "0x55ab5c5ecc30"
  ]
}
</pre>
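To find which coroutine is accumulating stacks without guessing the indexes in the jq path, the whole dump can be summarized. The following is a minimal sketch; it assumes the full `cr dump` output nests `coroutine_managers`, `run_contexts`, `entries` and `ops` exactly as the jq path above implies, and that an op without a `spawned` key has no spawned stacks. The function name `spawned_counts` is made up for this example.

```python
def spawned_counts(dump):
    """Return (type, number of spawned stacks) pairs for every op in a dump.

    `dump` is the parsed JSON from `cr dump`. The nesting walked here is an
    assumption taken from the jq path used in this report.
    """
    counts = []
    for manager in dump.get("coroutine_managers", []):
        for context in manager.get("run_contexts", []):
            for entry in context.get("entries", []):
                for op in entry.get("ops", []):
                    op_type = op.get("type", "?")
                    # Assumption: a missing "spawned" key means zero stacks.
                    counts.append((op_type, len(op.get("spawned", []))))
    # Largest first, so an op that keeps growing shows up at the top.
    return sorted(counts, key=lambda item: item[1], reverse=True)
```

One way to use it is to parse the command's output with `json.load(sys.stdin)` in a small script and print the first few pairs. Running that twice a few minutes apart shows whether any count keeps increasing.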

https://github.com/ceph/ceph/pull/26639
