Bug #62845
large omap objects in log pool 'XXX.rgw.log' : sync.error-log.X
Description
We have a problem with sync error logs.
For the past week we had problems with replication between two zones, but that is now fixed.
Ceph has gone to HEALTH_WARN because of LARGE_OMAP_OBJECTS. The objects involved are the sync.error-log.* objects:
```
[rook@rook-ceph-tools-566d8c6b5f-ldtvw /]$ ceph -s
  cluster:
    id:     4a7de165-eb94-4313-af36-c4c64f926874
    health: HEALTH_OK (muted: LARGE_OMAP_OBJECTS(2h))

[rook@rook-ceph-tools-566d8c6b5f-ldtvw /]$ ceph health detail
HEALTH_OK (muted: LARGE_OMAP_OBJECTS(2h))
(MUTED ttl 2h, STICKY) [WRN] LARGE_OMAP_OBJECTS: 31 large omap objects
    31 large objects found in pool 'ch-gva-nvme-d3.rgw.log'
    Search the cluster log for 'Large omap object found' for more details.
```
In the logs:
```
Time                         message
Sep 14, 2023 @ 04:43:03.724  cluster 2023-09-14T02:43:01.907081+0000 osd.18 (osd.18) 3622 : cluster [WRN] Large omap object found. Object: 6:d6ec6838:::sync.error-log.26:head PG: 6.1c16376b (6.3) Key count: 250163 Size (bytes): 60838504
Sep 14, 2023 @ 04:43:03.724  cluster 2023-09-14T02:43:02.225287+0000 osd.18 (osd.18) 3623 : cluster [WRN] Large omap object found. Object: 6:d95f86ac:::sync.error-log.22:head PG: 6.3561fa9b (6.3) Key count: 250167 Size (bytes): 60845593
```
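The warnings identify the exact objects. To double-check the key count on one of them directly, something like the following should work (a hedged sketch using the pool and object name from the warnings above):

```
# Count omap keys on one of the reported sync.error-log objects
# (pool and object taken from the warnings above; adjust for your deployment)
rados -p ch-gva-nvme-d3.rgw.log listomapkeys sync.error-log.26 | wc -l
```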
On a side note, we could only see the content of these logs with --max-entries, because --start-date is not allowed:
```
radosgw-admin --rgw-realm=ik-s3-nvme sync error list --max-entries 10000
```
We want to clear the logs using the dedicated command, but it has no effect:
```
radosgw-admin --rgw-realm=ik-s3-nvme sync error trim
```
Even if we deep scrub the PG, the log objects don't get trimmed:
```
ceph pg deep-scrub 6.3
```
Is there something we're missing?
Updated by Casey Bodley 8 months ago
- Assignee set to Shilpa MJ
- Tags set to multisite
Updated by Matthew Darwin 8 months ago
Same problem here. `radosgw-admin sync error trim` seems to do nothing.
The suggestion from the ceph-users mailing list was to pass an end date, but that doesn't work either:
```
$ radosgw-admin sync error trim --end-date="2023-08-20 23:00:00"
end-date not allowed.
```
I'm using ceph version 17.2.6 (quincy)
Updated by Shilpa MJ 8 months ago
Since the addition of https://github.com/ceph/ceph/commit/26f5b2f58efc9e489fe314b41d8ec8bc00a9f0c7, we take a start-marker for the list commands and an end-marker for the trim commands. The cls function timelog.trim() expects a marker to be passed in, without which it returns ENODATA, which is treated as success, so the command does nothing. The fix here ensures that we accept an --end-marker and fail if one is not provided. This changes the original behaviour in that we no longer accept the trim command without the marker option.
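On a build with this fix, a trim invocation would then look roughly like the sketch below (hedged: the realm flag is taken from the original report, and the marker value is only illustrative; pick one that covers the entries you want removed):

```
# With the fix, trim requires an explicit end marker and fails without one.
# Realm name from the original report; marker value is illustrative only.
radosgw-admin --rgw-realm=ik-s3-nvme sync error trim --end-marker=9
```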
Updated by Steven Goodliff 8 months ago
Hi,
Sorry for hijacking this, but is there a workaround to trim the sync error log in the meantime?
thanks
Updated by Matthew Darwin 8 months ago
I was going to ask the same question. ;-)
Steven Goodliff wrote:
Hi,
Sorry for hijacking this, but is there a workaround to trim the sync error log in the meantime?
thanks
Updated by Casey Bodley 7 months ago
Hi Steven/Matt,
In the past I remember using `--marker=9` for trim commands like this, as a way to trim all entries. Can you give that a try and let us know if it helps?
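Combining that with the realm flag from the original report, the workaround would look something like this (a hedged sketch; substitute your own realm name):

```
# Workaround: trim all sync error log entries by passing a marker that
# sorts after every existing entry (realm name from the original report)
radosgw-admin --rgw-realm=ik-s3-nvme sync error trim --marker=9
```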
Updated by Matthew Darwin 7 months ago
Casey Bodley wrote:
Hi Steven/Matt,
In the past I remember using `--marker=9` for trim commands like this, as a way to trim all entries. Can you give that a try and let us know if it helps?
Yes, this works for me. A deep scrub has run and the large omap error is now gone.
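For anyone following along, the sequence that cleared the warning in this thread was roughly as follows (all commands are taken from earlier comments; the realm and PG are from the original report):

```
# Trim the sync error log with an explicit marker (Casey's workaround)
radosgw-admin --rgw-realm=ik-s3-nvme sync error trim --marker=9

# Re-run a deep scrub on the PG that reported the large omap objects
ceph pg deep-scrub 6.3

# Confirm the warning has cleared
ceph health detail
```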
Updated by Casey Bodley 7 months ago
- Status changed from Fix Under Review to Pending Backport
- Backport changed from quincy reef pacific to quincy reef
Updated by Backport Bot 7 months ago
- Copied to Backport #63403: reef: large omap objects in log pool 'XXX.rgw.log' : sync.error-log.X added
Updated by Backport Bot 7 months ago
- Copied to Backport #63404: quincy: large omap objects in log pool 'XXX.rgw.log' : sync.error-log.X added
Updated by Backport Bot 7 months ago
- Tags changed from multisite to multisite backport_processed