Bug #62845
large omap objects in log pool 'XXX.rgw.log' : sync.error-log.X
Description
We have a problem with sync error logs.
For the past week we had problems with replication between two zones, but that is now fixed.
Ceph has gone to HEALTH_WARN because of LARGE_OMAP_OBJECTS. The objects involved are the sync.error-log.* objects:
```
[rook@rook-ceph-tools-566d8c6b5f-ldtvw /]$ ceph -s
  cluster:
    id:     4a7de165-eb94-4313-af36-c4c64f926874
    health: HEALTH_OK (muted: LARGE_OMAP_OBJECTS(2h))

[rook@rook-ceph-tools-566d8c6b5f-ldtvw /]$ ceph health detail
HEALTH_OK (muted: LARGE_OMAP_OBJECTS(2h))
(MUTED ttl 2h, STICKY) [WRN] LARGE_OMAP_OBJECTS: 31 large omap objects
    31 large objects found in pool 'ch-gva-nvme-d3.rgw.log'
    Search the cluster log for 'Large omap object found' for more details.
```
In the logs:
```
Time                         message
Sep 14, 2023 @ 04:43:03.724  cluster 2023-09-14T02:43:01.907081+0000 osd.18 (osd.18) 3622 : cluster [WRN] Large omap object found. Object: 6:d6ec6838:::sync.error-log.26:head PG: 6.1c16376b (6.3) Key count: 250163 Size (bytes): 60838504
Sep 14, 2023 @ 04:43:03.724  cluster 2023-09-14T02:43:02.225287+0000 osd.18 (osd.18) 3623 : cluster [WRN] Large omap object found. Object: 6:d95f86ac:::sync.error-log.22:head PG: 6.3561fa9b (6.3) Key count: 250167 Size (bytes): 60845593
```
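The warnings identify the exact objects. To double-check the key count on one of them directly, something like the following should work (a hedged sketch using the pool and object name from the warnings above):

```
# Count omap keys on one of the reported sync.error-log objects
# (pool and object taken from the warnings above; adjust for your deployment)
rados -p ch-gva-nvme-d3.rgw.log listomapkeys sync.error-log.26 | wc -l
```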
On a side note, we could only see the content of these logs with --max-entries, because --start-date is not allowed:
```
radosgw-admin --rgw-realm=ik-s3-nvme sync error list --max-entries 10000
```
We want to clear the logs using the dedicated command, but it has no effect:
```
radosgw-admin --rgw-realm=ik-s3-nvme sync error trim
```
Even if we deep scrub the PG, the log objects don't get trimmed:
```
ceph pg deep-scrub 6.3
```
Is there something we're missing?
Updated by Casey Bodley 8 months ago
- Assignee set to Shilpa MJ
- Tags set to multisite
Updated by Matthew Darwin 8 months ago
Same problem here. `radosgw-admin sync error trim` seems to do nothing.
The suggestion from the ceph-users mailing list was to pass an end date, but that doesn't work either:
```
$ radosgw-admin sync error trim --end-date="2023-08-20 23:00:00"
end-date not allowed.
```
I'm using ceph version 17.2.6 (quincy)
Updated by Shilpa MJ 8 months ago
Since the addition of https://github.com/ceph/ceph/commit/26f5b2f58efc9e489fe314b41d8ec8bc00a9f0c7, we take a start-marker for the list commands and an end-marker for the trim commands. The cls function timelog.trim() expects a marker to be passed in, without which it returns ENODATA, which is treated as success, so the command does nothing. The fix here ensures that we accept an --end-marker and fail if one is not provided. This changes the original behaviour in that we no longer accept the trim command without the marker option.
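On a build with this fix, a trim invocation would then look roughly like the sketch below (hedged: the realm flag is taken from the original report, and the marker value is only illustrative; pick one that covers the entries you want removed):

```
# With the fix, trim requires an explicit end marker and fails without one.
# Realm name from the original report; marker value is illustrative only.
radosgw-admin --rgw-realm=ik-s3-nvme sync error trim --end-marker=9
```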
Updated by Steven Goodliff 8 months ago
Hi,
Sorry for hijacking this, but is there a workaround to trim the sync error log in the meantime?
thanks
Updated by Matthew Darwin 8 months ago
I was going to ask the same question. ;-)
Steven Goodliff wrote:
Hi,
Sorry for hijacking this, but is there a workaround to trim the sync error log in the meantime?
thanks
Updated by Casey Bodley 7 months ago
Hi Steven/Matt,
In the past I remember using `--marker=9` for trim commands like this, as a way to trim all entries. Can you give that a try and let us know if it helps?
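Combining that with the realm flag from the original report, the workaround would look something like this (a hedged sketch; substitute your own realm name):

```
# Workaround: trim all sync error log entries by passing a marker that
# sorts after every existing entry (realm name from the original report)
radosgw-admin --rgw-realm=ik-s3-nvme sync error trim --marker=9
```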
Updated by Matthew Darwin 7 months ago
Casey Bodley wrote:
Hi Steven/Matt,
In the past I remember using `--marker=9` for trim commands like this, as a way to trim all entries. Can you give that a try and let us know if it helps?
Yes, this works for me. A deep scrub has run and the large omap error is now gone.
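For anyone following along, the sequence that cleared the warning in this thread was roughly as follows (all commands are taken from earlier comments; the realm and PG are from the original report):

```
# Trim the sync error log with an explicit marker (Casey's workaround)
radosgw-admin --rgw-realm=ik-s3-nvme sync error trim --marker=9

# Re-run a deep scrub on the PG that reported the large omap objects
ceph pg deep-scrub 6.3

# Confirm the warning has cleared
ceph health detail
```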
Updated by Casey Bodley 7 months ago
- Status changed from Fix Under Review to Pending Backport
- Backport changed from quincy reef pacific to quincy reef
Updated by Backport Bot 7 months ago
- Copied to Backport #63403: reef: large omap objects in log pool 'XXX.rgw.log' : sync.error-log.X added
Updated by Backport Bot 7 months ago
- Copied to Backport #63404: quincy: large omap objects in log pool 'XXX.rgw.log' : sync.error-log.X added
Updated by Backport Bot 7 months ago
- Tags changed from multisite to multisite backport_processed