Bug #53585

open

RGW Garbage collector leads to slow ops and osd down when removing large object

Added by pouria dehghani over 2 years ago. Updated about 1 year ago.

Status:
New
Priority:
Normal
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
gc-queue
Backport:
Regression:
No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After deleting a single file (RGW object) larger than 500 GB, we get laggy PGs, slow ops, and OSDs going down on the cluster when the GC process runs.

We tried many different GC configurations (see below), but the problem persists.

We don't observe any abnormal disk, memory, or CPU usage on the cluster nodes during the GC process.

The following graph shows the wasted space on the cluster, the GC pool size, and the number of OSD down events.

Configs:
The configuration values below are the defaults; we changed some of them (e.g. concurrent_io or queue_size) to different values, but the problem is still there (see the sketch after the list):
rgw_gc_max_concurrent_io 10
rgw_gc_max_deferred 50
rgw_gc_max_deferred_entries_size 3072
rgw_gc_max_objs 32
rgw_gc_max_queue_size 134213632
rgw_gc_max_trim_chunk 128
rgw_gc_obj_min_wait 1800
rgw_gc_processor_max_time 3600
rgw_gc_processor_period 3600
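
For reference, this is roughly how such options can be changed at runtime (a sketch only; the daemon name client.rgw.gateway1 is a hypothetical example, and the values shown are examples rather than recommendations):

```
# Set GC tuning in the mon config store and read it back.
# Substitute your own RGW instance name, or put the options under the
# corresponding section in ceph.conf and restart the gateway.
ceph config set client.rgw.gateway1 rgw_gc_max_concurrent_io 20
ceph config set client.rgw.gateway1 rgw_gc_max_trim_chunk 64
ceph config get client.rgw.gateway1 rgw_gc_max_concurrent_io
```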

Our Ceph version is 15.2.12.
We have an SSD tier and an HDD tier, with separate NVMe devices for the BlueStore DB.

We tried deleting a 2 TB file via RGW and that was not a problem, because it was not deleted by the GC process; I think the RGW daemon deleted it directly. We also tried deleting many small files with a total size of ~500 GB, and that was not a problem either.
The problem occurs only for single files of around 500 GB.
Our cluster was upgraded from Nautilus 14.2.12 to 15.2.12, and I think this started happening after the upgrade.
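
For context, the GC state can also be inspected and a pass triggered by hand (a sketch; --include-all also covers entries whose rgw_gc_obj_min_wait has not yet expired):

```
# List pending GC entries, including not-yet-expired ones, then force a GC pass.
radosgw-admin gc list --include-all
radosgw-admin gc process --include-all
```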


Related issues 2 (0 open, 2 closed)

Related to rgw - Bug #49823: rgw gc object leak when gc omap set entry failed with a large omap value (Resolved, Pritha Srivastava)

Related to rgw - Bug #58190: Large RGW GC queue might prevent OSD from starting (Resolved)

Actions #1

Updated by pouria dehghani over 2 years ago

pouria dehghani wrote:

After deleting a single file (RGW object) larger than 500 GB, we get laggy PGs, slow ops, and OSDs going down on the cluster when the GC process runs.

We tried many different GC configurations (see below), but the problem persists.

We don't observe any abnormal disk, memory, or CPU usage on the cluster nodes during the GC process.

The configuration values below are the defaults; we changed some of them (e.g. concurrent_io or queue_size) to different values, but the problem is still there:
rgw_gc_max_concurrent_io 10
rgw_gc_max_deferred 50
rgw_gc_max_deferred_entries_size 3072
rgw_gc_max_objs 32
rgw_gc_max_queue_size 134213632
rgw_gc_max_trim_chunk 128
rgw_gc_obj_min_wait 1800
rgw_gc_processor_max_time 3600
rgw_gc_processor_period 3600

Our Ceph version is 15.2.12.
We have an SSD tier and an HDD tier, with separate NVMe devices for the BlueStore DB.

We tried deleting a 2 TB file via RGW and that was not a problem, because it was not deleted by the GC process; I think the RGW daemon deleted it directly. We also tried deleting many small files with a total size of ~500 GB, and that was not a problem either.
The problem occurs only for single files of around 500 GB.
Our cluster was upgraded from Nautilus 14.2.12 to 15.2.12, and I think this started happening after the upgrade.

Actions #2

Updated by Mykola Golub about 2 years ago

@pouria Do you see errors like this in the OSD logs before the slow ops?

"do_op msg data len ... > osd_max_write_size ... [call ... rgw_gc.rgw_gc_queue_enqueue ...]"

We seem to see this on our cluster, and it means that enqueuing the object chain to the garbage collector fails (because the chain exceeds osd_max_write_size for large files) and RGW falls back to synchronous deletion, which may cause a large OSD load.

Just curious if it is the same case for you and if increasing osd_max_write_size helps you.
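
A minimal sketch of how that limit can be checked and raised; 256 (MB) is an arbitrary example value, and as the later comments in this thread show, raising it is not necessarily a good idea:

```
# osd_max_write_size is expressed in MB (default 90).
ceph config get osd osd_max_write_size

# Raise it cluster-wide -- an example value only; see comment #5 below
# before trying this on a production cluster.
ceph config set osd osd_max_write_size 256
```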

Actions #3

Updated by pouria dehghani about 2 years ago

Mykola Golub wrote:

@pouria Do you see errors like this in the OSD logs before the slow ops?

"do_op msg data len ... > osd_max_write_size ... [call ... rgw_gc.rgw_gc_queue_enqueue ...]"

We seem to see this on our cluster, and it means that enqueuing the object chain to the garbage collector fails (because the chain exceeds osd_max_write_size for large files) and RGW falls back to synchronous deletion, which may cause a large OSD load.

Just curious if it is the same case for you and if increasing osd_max_write_size helps you.

Hi Mykola,
Thanks for replying to this.
We do have some of these log messages. We tried changing osd_max_write_size (on staging, not production; we will test it more), but it didn't help, so I think we have another problem.
Actually, when the slow ops hit the cluster we cannot even run radosgw-admin gc list.
Maybe the GC gets stuck while trying to list the GC objects; I don't know, this is just a guess.
We also see this kind of log:

```
2022-04-02T00:45:26.838+0430 7f7c595d8700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
2022-04-02T00:45:26.838+0430 7f7c59dd9700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
2022-04-02T00:45:26.938+0430 7f7c595d8700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
2022-04-02T00:45:26.938+0430 7f7c595d8700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
2022-04-02T00:45:27.030+0430 7f7c595d8700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
2022-04-02T00:45:27.030+0430 7f7c59dd9700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
2022-04-02T00:45:27.126+0430 7f7c59dd9700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
2022-04-02T00:45:27.130+0430 7f7c58dd7700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
2022-04-02T00:45:27.178+0430 7f7c58dd7700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
2022-04-02T00:45:27.178+0430 7f7c595d8700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
2022-04-02T00:45:27.434+0430 7f7c3ad9b700 1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
2022-04-02T00:45:28.434+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.21520711.0:4475553 11.4 11:24c8e1ae:gc::gc.19:head [call lock.lock in=50b] snapc 0=[] RETRY=2 ondisk+retry+write+known_if_redirected e123422) initiated 2022-04-02T00:44:19.956024+0430 currently delayed
2022-04-02T00:45:28.434+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.21520690.0:4765267 11.4 11:24c8e1ae:gc::gc.19:head [call rgw_gc.rgw_gc_queue_list_entries in=46b] snapc 0=[] RETRY=2 ondisk+retry+read+known_if_redirected e123422) initiated 2022-04-02T00:44:22.843822+0430 currently delayed
2022-04-02T00:45:28.434+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.21521080.0:4403782 11.4 11:2166d8e1:gc::gc.3:head [call lock.lock in=50b] snapc 0=[] RETRY=2 ondisk+retry+write+known_if_redirected e123422) initiated 2022-04-02T00:44:20.017926+0430 currently delayed
2022-04-02T00:45:28.434+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.21520657.0:5695161 11.4 11:24c8e1ae:gc::gc.19:head [call lock.lock in=50b] snapc 0=[] RETRY=2 ondisk+retry+write+known_if_redirected e123422) initiated 2022-04-02T00:44:20.034662+0430 currently delayed
2022-04-02T00:45:28.434+0430 7f7c545ce700 -1 osd.94 123422 get_health_metrics reporting 4 slow ops, oldest is osd_op(client.21520711.0:4475553 11.4 11:24c8e1ae:gc::gc.19:head [call lock.lock in=50b] snapc 0=[] RETRY=2 ondisk+retry+write+known_if_redirected e123422)
2022-04-02T00:45:29.406+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.21520711.0:4475553 11.4 11:24c8e1ae:gc::gc.19:head [call lock.lock in=50b] snapc 0=[] RETRY=2 ondisk+retry+write+known_if_redirected e123422) initiated 2022-04-02T00:44:19.956024+0430 currently delayed
2022-04-02T00:45:29.406+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.21520690.0:4765267 11.4 11:24c8e1ae:gc::gc.19:head [call rgw_gc.rgw_gc_queue_list_entries in=46b] snapc 0=[] RETRY=2 ondisk+retry+read+known_if_redirected e123422) initiated 2022-04-02T00:44:22.843822+0430 currently delayed
2022-04-02T00:45:29.406+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.21521080.0:4403782 11.4 11:2166d8e1:gc::gc.3:head [call lock.lock in=50b] snapc 0=[] RETRY=2 ondisk+retry+write+known_if_redirected e123422) initiated 2022-04-02T00:44:20.017926+0430 currently delayed
2022-04-02T00:45:29.406+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.21520657.0:5695161 11.4 11:24c8e1ae:gc::gc.19:head [call lock.lock in=50b] snapc 0=[] RETRY=2 ondisk+retry+write+known_if_redirected e123422) initiated 2022-04-02T00:44:20.034662+0430 currently delayed
2022-04-02T00:45:29.406+0430 7f7c545ce700 -1 osd.94 123422 get_health_metrics reporting 4 slow ops, oldest is osd_op(client.21520711.0:4475553 11.4 11:24c8e1ae:gc::gc.19:head [call lock.lock in=50b] snapc 0=[] RETRY=2 ondisk+retry+write+known_if_redirected e123422)
...
...
...
2022-04-02T00:46:15.226+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.22480713.0:4797959 11.4 11:2166d8e1:gc::gc.3:head [call lock.lock in=50b] snapc 0=[] RETRY=2 ondisk+retry+write+known_if_redirected e123421) initiated 2022-04-02T00:44:17.828400+0430 currently started
2022-04-02T00:46:15.226+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.22480353.0:4861812 11.4 11:2166d8e1:gc::gc.3:head [call lock.lock in=50b] snapc 0=[] RETRY=2 ondisk+retry+write+known_if_redirected e123422) initiated 2022-04-02T00:44:20.190933+0430 currently started
2022-04-02T00:46:15.226+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.21533366.0:5106099 11.4 11:24c8e1ae:gc::gc.19:head [call lock.lock in=50b] snapc 0=[] ondisk+write+known_if_redirected e123422) initiated 2022-04-02T00:45:35.962165+0430 currently delayed
2022-04-02T00:46:15.226+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.21520609.0:4365148 11.4 11:24c8e1ae:gc::gc.19:head [call lock.lock in=50b] snapc 0=[] ondisk+write+known_if_redirected e123422) initiated 2022-04-02T00:45:35.964893+0430 currently started
2022-04-02T00:46:15.226+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.22480608.0:3926392 11.4 11:2166d8e1:gc::gc.3:head [call lock.lock in=50b] snapc 0=[] ondisk+write+known_if_redirected e123422) initiated 2022-04-02T00:45:24.139764+0430 currently started
2022-04-02T00:46:15.226+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.22480176.0:4991062 11.4 11:2166d8e1:gc::gc.3:head [call lock.unlock in=34b] snapc 0=[] ondisk+write+known_if_redirected e123422) initiated 2022-04-02T00:45:36.094334+0430 currently waiting for sub ops
2022-04-02T00:46:15.226+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.22480191.0:4662248 11.4 11:2166d8e1:gc::gc.3:head [call lock.lock in=50b] snapc 0=[] RETRY=2 ondisk+retry+write+known_if_redirected e123422) initiated 2022-04-02T00:44:18.643510+0430 currently started
2022-04-02T00:46:15.226+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.22480131.0:17046146 11.4 11:25ca39bb:::obj_delete_at_hint.0000000005:head [call lock.lock in=50b] snapc 0=[] ondisk+write+known_if_redirected e123422) initiated 2022-04-02T00:45:42.480600+0430 currently waiting for sub ops
2022-04-02T00:46:15.226+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.22480416.0:4363095 11.4 11:2166d8e1:gc::gc.3:head [call lock.lock in=50b] snapc 0=[] RETRY=2 ondisk+retry+write+known_if_redirected e123422) initiated 2022-04-02T00:44:19.184144+0430 currently started
2022-04-02T00:46:15.226+0430 7f7c545ce700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.21520690.0:4765278 11.4 11:24c8e1ae:gc::gc.19:head [call rgw_gc.rgw_gc_queue_remove_entries in=47b] snapc 0=[] ondisk+write+known_if_redirected e123422) initiated 2022-04-02T00:45:35.832983+0430 currently waiting for sub ops
2022-04-02T00:46:15.226+0430 7f7c545ce700 -1 osd.94 123422 get_health_metrics reporting 11 slow ops, oldest is osd_op(client.22480713.0:4797959 11.4 11:2166d8e1:gc::gc.3:head [call lock.lock in=50b] snapc 0=[] RETRY=2 ondisk+retry+write+known_if_redirected e123421)
2022-04-02T00:46:16.266+0430 7f7c545ce700 -1 osd.94 123422 heartbeat_check: no reply from 172.21.0.6:6830 osd.45 since back 2022-04-02T00:45:49.679919+0430 front 2022-04-02T00:45:49.679593+0430 (oldest deadline 2022-04-02T00:46:13.779641+0430)
2022-04-02T00:46:17.242+0430 7f7c545ce700 -1 osd.94 123422 heartbeat_check: no reply from 172.21.0.6:6830 osd.45 since back 2022-04-02T00:45:49.679919+0430 front 2022-04-02T00:45:49.679593+0430 (oldest deadline 2022-04-02T00:46:13.779641+0430)
2022-04-02T00:46:52.147+0430 7f7c545ce700 -1 osd.94 123422 heartbeat_check: no reply from 172.21.0.6:6830 osd.45 since back 2022-04-02T00:46:26.690129+0430 front 2022-04-02T00:46:26.689676+0430 (oldest deadline 2022-04-02T00:46:51.389628+0430)
2022-04-02T00:46:53.103+0430 7f7c545ce700 -1 osd.94 123422 heartbeat_check: no reply from 172.21.0.6:6830 osd.45 since back 2022-04-02T00:46:26.690129+0430 front 2022-04-02T00:46:26.689676+0430 (oldest deadline 2022-04-02T00:46:51.389628+0430)
2022-04-02T00:46:54.135+0430 7f7c545ce700 -1 osd.94 123422 heartbeat_check: no reply from 172.21.0.6:6830 osd.45 since back 2022-04-02T00:46:26.690129+0430 front 2022-04-02T00:46:26.689676+0430 (oldest deadline 2022-04-02T00:46:51.389628+0430)
2022-04-02T00:47:10.471+0430 7f7c595d8700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
2022-04-02T00:47:10.471+0430 7f7c595d8700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
2022-04-02T00:47:10.487+0430 7f7c595d8700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
2022-04-02T00:47:10.487+0430 7f7c58dd7700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
2022-04-02T00:47:10.563+0430 7f7c595d8700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c3ad9b700' had timed out after 15
```

The PG ID mentioned in the logs belongs to the GC pool.
I would appreciate any help. Thanks.
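
A small sketch of how the pool and OSDs behind those slow-request PGs (pool id 11 in the log lines above) can be identified with generic Ceph commands; the pool name below is a placeholder:

```
# Map pool id 11 (from "11.4" in the slow requests) to a pool name,
# then inspect the PG that keeps showing up and list the PGs of that pool.
ceph osd lspools
ceph pg 11.4 query | head -40
ceph pg ls-by-pool <gc-pool-name>
```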

Actions #4

Updated by pouria dehghani about 2 years ago

pouria dehghani wrote:

Mykola Golub wrote:

@pouria Do you see errors like this in the OSD logs before the slow ops?

"do_op msg data len ... > osd_max_write_size ... [call ... rgw_gc.rgw_gc_queue_enqueue ...]"

We seem to see this on our cluster, and it means that enqueuing the object chain to the garbage collector fails (because the chain exceeds osd_max_write_size for large files) and RGW falls back to synchronous deletion, which may cause a large OSD load.

Just curious if it is the same case for you and if increasing osd_max_write_size helps you.

Hi Mykola,
Thanks for replying to this.
We do have some of these log messages. We tried changing osd_max_write_size (on staging, not production; we will test it more), but it didn't help, so I think we have another problem.
Actually, when the slow ops hit the cluster we cannot even run radosgw-admin gc list.
Maybe the GC gets stuck while trying to list the GC objects; I don't know, this is just a guess.
We also see this kind of log:

[...]

The PG ID mentioned in the logs belongs to the GC pool.
I would appreciate any help. Thanks.

I tried to dig more and ran radosgw-admin gc list and radosgw-admin gc process with the --debug-rgw=20 flag; these are my logs:
```
radosgw-admin gc list --debug-rgw=20
2022-04-03T00:25:13.573+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.577+0430 7f569a6f5080 20 rados_obj.operate() r=-2 bl.length=0
2022-04-03T00:25:13.577+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.577+0430 7f569a6f5080 20 rados_obj.operate() r=-2 bl.length=0
2022-04-03T00:25:13.577+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.577+0430 7f569a6f5080 20 rados_obj.operate() r=-2 bl.length=0
2022-04-03T00:25:13.577+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.581+0430 7f569a6f5080 20 rados_obj.operate() r=0 bl.length=46
2022-04-03T00:25:13.641+0430 7f569a6f5080 20 RGWRados::pool_iterate: got zone_info.b79501ab-89c7-45a6-a823-9ca02632b225
2022-04-03T00:25:13.645+0430 7f569a6f5080 20 RGWRados::pool_iterate: got zonegroup_info.f52c32d7-175a-46b5-95d8-82816ebe95f7
2022-04-03T00:25:13.645+0430 7f569a6f5080 20 RGWRados::pool_iterate: got zone_names.default
2022-04-03T00:25:13.645+0430 7f569a6f5080 20 RGWRados::pool_iterate: got zonegroups_names.default
2022-04-03T00:25:13.645+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.645+0430 7f569a6f5080 20 rados_obj.operate() r=0 bl.length=46
2022-04-03T00:25:13.645+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.645+0430 7f569a6f5080 20 rados_obj.operate() r=0 bl.length=1029
2022-04-03T00:25:13.645+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.645+0430 7f569a6f5080 20 rados_obj.operate() r=0 bl.length=46
2022-04-03T00:25:13.645+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.645+0430 7f569a6f5080 20 rados_obj.operate() r=0 bl.length=423
2022-04-03T00:25:13.645+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.645+0430 7f569a6f5080 20 rados_obj.operate() r=-2 bl.length=0
2022-04-03T00:25:13.645+0430 7f569a6f5080 10 cannot find current period zonegroup using local zonegroup
2022-04-03T00:25:13.645+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.649+0430 7f569a6f5080 20 rados_obj.operate() r=-2 bl.length=0
2022-04-03T00:25:13.649+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.649+0430 7f569a6f5080 20 rados_obj.operate() r=0 bl.length=46
2022-04-03T00:25:13.649+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.649+0430 7f569a6f5080 20 rados_obj.operate() r=0 bl.length=423
2022-04-03T00:25:13.649+0430 7f569a6f5080 20 zonegroup default
2022-04-03T00:25:13.649+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.649+0430 7f569a6f5080 20 rados_obj.operate() r=-2 bl.length=0
2022-04-03T00:25:13.649+0430 7f569a6f5080 10 Cannot find current period zone using local zone
2022-04-03T00:25:13.649+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.649+0430 7f569a6f5080 20 rados_obj.operate() r=-2 bl.length=0
2022-04-03T00:25:13.649+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.649+0430 7f569a6f5080 20 rados_obj.operate() r=0 bl.length=46
2022-04-03T00:25:13.649+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.653+0430 7f569a6f5080 20 rados_obj.operate() r=0 bl.length=1029
2022-04-03T00:25:13.653+0430 7f569a6f5080 20 zone default found
2022-04-03T00:25:13.653+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.653+0430 7f569a6f5080 20 rados_obj.operate() r=-2 bl.length=0
2022-04-03T00:25:13.653+0430 7f569a6f5080 20 rados->read ofs=0 len=0
2022-04-03T00:25:13.653+0430 7f569a6f5080 20 rados_obj.operate() r=-2 bl.length=0
2022-04-03T00:25:13.653+0430 7f569a6f5080 20 started sync module instance, tier type =
2022-04-03T00:25:13.653+0430 7f569a6f5080 20 started zone id=b79501ab-89c7-45a6-a823-9ca02632b225 (name=default) with tier type =
2022-04-03T00:25:13.761+0430 7f569a6f5080 20 add_watcher() i=0
2022-04-03T00:25:13.761+0430 7f569a6f5080 20 add_watcher() i=1
2022-04-03T00:25:13.761+0430 7f569a6f5080 20 add_watcher() i=2
2022-04-03T00:25:13.761+0430 7f569a6f5080 20 add_watcher() i=3
2022-04-03T00:25:13.761+0430 7f569a6f5080 20 add_watcher() i=4
2022-04-03T00:25:13.761+0430 7f569a6f5080 20 add_watcher() i=5
2022-04-03T00:25:13.761+0430 7f569a6f5080 20 add_watcher() i=6
2022-04-03T00:25:13.761+0430 7f569a6f5080 20 add_watcher() i=7
2022-04-03T00:25:13.761+0430 7f569a6f5080 2 all 8 watchers are set, enabling cache
2022-04-03T00:25:13.761+0430 7f569a6f5080 20 check_secure_mon_conn(): auth registy supported: methods=[2,1] modes=[2,1]
2022-04-03T00:25:13.761+0430 7f569a6f5080 20 check_secure_mon_conn(): method 1 is insecure
2022-04-03T00:25:13.761+0430 7f565b7e6700 2 RGWDataChangesLog::ChangesRenewThread: start
2022-04-03T00:25:27.629+0430 7f55c9ffb700 20 reqs_thread_entry: start
2022-04-03T00:25:27.629+0430 7f569a6f5080 20 init_complete bucket index max shards: 11
2022-04-03T00:25:27.629+0430 7f55bb7fe700 20 reqs_thread_entry: start
2022-04-03T00:25:27.629+0430 7f55b9ffb700 20 reqs_thread_entry: start
2022-04-03T00:25:35.761+0430 7f565b7e6700 2 RGWDataChangesLog::ChangesRenewThread: start

2022-04-03T00:25:57.765+0430 7f565b7e6700 2 RGWDataChangesLog::ChangesRenewThread: start

2022-04-03T00:26:19.765+0430 7f565b7e6700 2 RGWDataChangesLog::ChangesRenewThread: start
```
And I noticed that whenever we run gc list or gc process, it gets stuck at the line "RGWDataChangesLog::ChangesRenewThread: start" and we get laggy PGs.
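
One way to see what the OSD serving the gc objects is blocked on while the command hangs is the admin socket (a sketch; osd.94 is taken from the log excerpt in comment #3, and the commands must be run on that OSD's host):

```
# Dump the operations currently in flight and the recently completed ones
# on the affected OSD.
ceph daemon osd.94 dump_ops_in_flight
ceph daemon osd.94 dump_historic_ops
```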

Actions #5

Updated by Mykola Golub about 2 years ago

Increasing osd_max_write_size in our case turned out to be a bad idea. Although it helped with queueing the objects to the garbage collector queue (the rgw_gc.rgw_gc_queue_enqueue call did not fail anymore), the queue objects (named gc.{index} in the rgw log pool under the gc namespace) grew very large (up to 130M), and it looks like the garbage collector cls queue operations are not optimized for such large entries and blocked the OSDs. Eventually we had to manually remove the gc objects (and then manually clean up the orphan objects) to make RGW work again.

@pouria I suppose you are observing a similar effect.

Probably splitting rgw_gc.rgw_gc_queue_enqueue into smaller chunks could be a solution.
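
The size of those gc.{index} queue objects can be checked directly with rados (a sketch; the pool name default.rgw.log is an assumption based on the default zone layout, adjust it if the GC objects live in a dedicated pool as described later in this thread):

```
# List the GC queue objects and print the size of one of them.
rados -p default.rgw.log -N gc ls
rados -p default.rgw.log -N gc stat gc.19
```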

Actions #6

Updated by pouria dehghani about 2 years ago

Mykola Golub wrote:

Increasing osd_max_write_size in our case turned out to be a bad idea. Although it helped with queueing the objects to the garbage collector queue (the rgw_gc.rgw_gc_queue_enqueue call did not fail anymore), the queue objects (named gc.{index} in the rgw log pool under the gc namespace) grew very large (up to 130M), and it looks like the garbage collector cls queue operations are not optimized for such large entries and blocked the OSDs. Eventually we had to manually remove the gc objects (and then manually clean up the orphan objects) to make RGW work again.

@pouria I suppose you are observing a similar effect.

Probably splitting rgw_gc.rgw_gc_queue_enqueue into smaller chunks could be a solution.

Hi Mykola,
We have exactly the same situation, and even the 130M object size in the GC index is the same. As you said, we did the same thing: removed the GC objects and then manually cleaned up the orphans.
We temporarily moved the GC pool to dedicated OSDs so that it does not affect our production OSDs.
I don't know what else to do.

Actions #7

Updated by Mykola Golub about 2 years ago

pouria dehghani wrote:

We temporarily moved the GC pool to dedicated OSDs so that it does not affect our production OSDs.
I don't know what else to do.

I am not sure whether it would be a good workaround for you, but you may try decreasing osd_max_write_size on those dedicated OSDs. As a result, queueing the "large" delete requests to the GC will fail and the delete will fall back to a synchronous delete. It means the client will experience long delete requests (instead of an immediate return), and the OSD load due to sync deletes is unclear (the GC was supposed to control this load by throttling object removal), but it might still work for your workload.
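
A minimal sketch of the per-daemon override being suggested here; osd.120 is a hypothetical id for one of the dedicated GC OSDs, and 16 (MB) is an arbitrary example value:

```
# Lower the write-size limit only on the dedicated GC OSDs so that oversized
# rgw_gc_queue_enqueue calls fail early and RGW falls back to sync deletion.
ceph config set osd.120 osd_max_write_size 16
ceph config get osd.120 osd_max_write_size
```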

Actions #8

Updated by pouria dehghani about 2 years ago

Mykola Golub wrote:

pouria dehghani wrote:

We temporarily moved the GC pool to dedicated OSDs so that it does not affect our production OSDs.
I don't know what else to do.

I am not sure whether it would be a good workaround for you, but you may try decreasing osd_max_write_size on those dedicated OSDs. As a result, queueing the "large" delete requests to the GC will fail and the delete will fall back to a synchronous delete. It means the client will experience long delete requests (instead of an immediate return), and the OSD load due to sync deletes is unclear (the GC was supposed to control this load by throttling object removal), but it might still work for your workload.

Hi again. Yes, we tried decreasing osd_max_write_size on these dedicated OSDs and it wasn't a good idea either.
With these dedicated OSDs the laggy PGs only happen on PGs that belong to the GC pool, so only the dedicated OSDs get slow ops from the removal. As a temporary measure that is OK, but it is costly because we cannot use all disk slots for production.
As you said, the GC is supposed to control this load by throttling object removal, but it doesn't.

Actions #9

Updated by Soumya Koduri about 2 years ago

  • Tracker changed from Bug to Feature
  • Status changed from New to Fix Under Review
  • Affected Versions v15.2.10 added
Actions #10

Updated by Soumya Koduri about 2 years ago

  • Source changed from Community (user) to Community (dev)
Actions #11

Updated by Soumya Koduri about 2 years ago

  • Tracker changed from Feature to Bug
  • Status changed from Fix Under Review to New
  • Source changed from Community (dev) to Community (user)
  • Regression set to No
  • Severity set to 3 - minor
Actions #12

Updated by Soumya Koduri about 2 years ago

  • Severity deleted (3 - minor)
  • Affected Versions deleted (v15.2.10)
Actions #13

Updated by Marcin Gibula almost 2 years ago

I think I can confirm this bug is still present in Pacific 16.2.9.
Moreover, the affected OSDs crash with the following stack traces, all similar:

ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)
1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980) [0x7f1a7fcc2980]
2: madvise()
3: (TCMalloc_SystemRelease(void*, unsigned long)+0x8a) [0x7f1a807d880a]
4: (tcmalloc::PageHeap::DecommitSpan(tcmalloc::Span*)+0x20) [0x7f1a807d9e80]
5: (tcmalloc::PageHeap::MergeIntoFreeList(tcmalloc::Span*)+0x21b) [0x7f1a807da2bb]
6: (tcmalloc::PageHeap::Delete(tcmalloc::Span*)+0x23) [0x7f1a807da513]
7: (tcmalloc::CentralFreeList::ReleaseToSpans(void*)+0x11d) [0x7f1a807d929d]
8: (tcmalloc::CentralFreeList::ReleaseListToSpans(void*)+0x1b) [0x7f1a807d935b]
9: (tcmalloc::CentralFreeList::InsertRange(void*, void*, int)+0x3f) [0x7f1a807d95ff]
10: (tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)+0x110) [0x7f1a807dcc00]
11: (tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long)+0x1b) [0x7f1a807dcc9b]
12: cfree()
13: /usr/lib/rados-classes/libcls_rgw_gc.so(+0x15bdc) [0x7f1a6dda4bdc]
14: /usr/lib/rados-classes/libcls_rgw_gc.so(+0x130ee) [0x7f1a6dda20ee]
15: /usr/bin/ceph-osd(+0xb5ccde) [0x55d082046cde]
16: (ClassHandler::ClassMethod::exec(void*, ceph::buffer::v15_2_0::list&, ceph::buffer::v15_2_0::list&)+0x5a) [0x55d08204797a]
17: (PrimaryLogPG::do_osd_ops(PrimaryLogPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x1221) [0x55d082104dc1]
18: (PrimaryLogPG::prepare_transaction(PrimaryLogPG::OpContext*)+0x9f) [0x55d082118d4f]
19: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x329) [0x55d082119559]
20: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x409d) [0x55d08211ee2d]
21: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xcc7) [0x55d08212a347]
22: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x17b) [0x55d081faed9b]
23: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x6a) [0x55d082211b9a]
24: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xd1e) [0x55d081fccdbe]
25: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x55d08265275c]
26: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55d082655c20]
27: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f1a7fcb76db]
28: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)
1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980) [0x7f1cbc65d980]
2: (ceph::buffer::v15_2_0::list::iterator_impl<true>::operator+=(unsigned int)+0x34) [0x5570e6360644]
3: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, ceph::buffer::v15_2_0::list&)+0x5a) [0x5570e6360fba]
4: /usr/lib/rados-classes/libcls_rgw_gc.so(+0x1e162) [0x7f1ca8128162]
5: /usr/lib/rados-classes/libcls_rgw_gc.so(+0x11ff0) [0x7f1ca811bff0]
6: /usr/bin/ceph-osd(+0xb5ccde) [0x5570e5b93cde]
7: (ClassHandler::ClassMethod::exec(void*, ceph::buffer::v15_2_0::list&, ceph::buffer::v15_2_0::list&)+0x5a) [0x5570e5b9497a]
8: (PrimaryLogPG::do_osd_ops(PrimaryLogPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x1221) [0x5570e5c51dc1]
9: (PrimaryLogPG::prepare_transaction(PrimaryLogPG::OpContext*)+0x9f) [0x5570e5c65d4f]
10: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x329) [0x5570e5c66559]
11: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x409d) [0x5570e5c6be2d]
12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xcc7) [0x5570e5c77347]
13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x17b) [0x5570e5afbd9b]
14: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x6a) [0x5570e5d5eb9a]
15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xd1e) [0x5570e5b19dbe]
16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x5570e619f75c]
17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5570e61a2c20]
18: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f1cbc6526db]
ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)
1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980) [0x7fdedf1a2980]
2: cfree()
3: /usr/lib/rados-classes/libcls_rgw_gc.so(+0x1d907) [0x7fdecc28a907]
4: /usr/lib/rados-classes/libcls_rgw_gc.so(+0x11ff0) [0x7fdecc27eff0]
5: /usr/bin/ceph-osd(+0xb5ccde) [0x55b472ea4cde]
6: (ClassHandler::ClassMethod::exec(void*, ceph::buffer::v15_2_0::list&, ceph::buffer::v15_2_0::list&)+0x5a) [0x55b472ea597a]
7: (PrimaryLogPG::do_osd_ops(PrimaryLogPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x1221) [0x55b472f62dc1]
8: (PrimaryLogPG::prepare_transaction(PrimaryLogPG::OpContext*)+0x9f) [0x55b472f76d4f]
9: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x329) [0x55b472f77559]
10: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x409d) [0x55b472f7ce2d]
11: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xcc7) [0x55b472f88347]
12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x17b) [0x55b472e0cd9b]
13: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x6a) [0x55b47306fb9a]
14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xd1e) [0x55b472e2adbe]
15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x55b4734b075c]
16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55b4734b3c20]
17: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7fdedf1976db]
18: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #14

Updated by Casey Bodley over 1 year ago

  • Assignee set to Pritha Srivastava
  • Tags set to gc-queue
Actions #15

Updated by Casey Bodley over 1 year ago

  • Related to Bug #49823: rgw gc object leak when gc omap set entry failed with a large omap value added
Actions #16

Updated by Boris B over 1 year ago

Hi,
it looks like laggy/flapping GC OSDs lead to the following errors:

$ zgrep -e "s3:.*WARNING" /var/log/ceph/ceph-client.*log*
..
... 1154217076211ns s3:complete_multipart WARNING: failed to remove object BUCKET:_multipart_FILENAME.2~y7i1gcwTy6_rCWfi_35pkU5i5MIVV-9.meta
... 1154221076159ns s3:complete_multipart WARNING: failed to unlock BUCKETID__multipart_FILENAME.2~y7i1gcwTy6_rCWfi_35pkU5i5MIVV-9.meta
..

And these objects are listed in the bucket index but are not available.

First we had all the RGW data pools on HDD OSDs and all the other pools on SSD OSDs. We saw some entries like these in the logs and tried to move the GC pool to separate SSD OSDs, hoping not to interfere with the index pool. We only added 4 SSDs (on 4 hosts), because there were no more dedicated SSDs left, and this made it A LOT worse.
So we are now back to the old way.

Is there anything we can do (adjust timeouts, set config values, whatever), even if it's not ideal, that would mitigate this problem until the bug is fixed?

I have some really angry people behind my chair, and they might start lighting torches and getting pitchforks :/

Actions #17

Updated by Igor Fedotov over 1 year ago

  • Related to Bug #58190: Large RGW GC queue might prevent OSD from starting added
Actions #18

Updated by Boris B about 1 year ago

I think this PR might fix it: https://github.com/ceph/ceph/pull/50893

Mark Nelson had this in his talk at Cephalocon, and because we've moved the GC to its own OSDs (small SSDs) we just let those OSDs compact all the time:

```
while true; do
  for OSD in LIST_OF_GC_OSDS; do
    ceph tell osd.${OSD} compact
  done
done
```

and currently it looks like it works. We are running a lot of huge deletes at the moment and the OSDs seem to be stable. Sometimes a GC OSD becomes laggy for a short time, but they have stopped flapping for now.
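
A gentler variant of the loop above (an untested assumption, not something verified in this thread) would pause between compaction rounds instead of running them back to back:

```
# Same idea, but sleep between rounds to avoid adding constant load.
while true; do
  for OSD in LIST_OF_GC_OSDS; do
    ceph tell osd.${OSD} compact
  done
  sleep 600   # arbitrary interval
done
```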

Actions #19

Updated by Boris B about 1 year ago

Nevermind. It seemed to help a bit, but we still have the problem.
It also seemed to raise the load on the system, so we stopped it again.
