Bug #62845: large omap objects in log pool 'XXX.rgw.log' : sync.error-log.X

Added by Thomas Mertz 8 months ago. Updated 7 months ago.

Status: Pending Backport
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags: multisite backport_processed
Backport: quincy reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have a problem with the sync error logs.
For the past week we had a replication problem between our two zones, but it is now fixed.

Ceph has gone to HEALTH_WARN because of LARGE_OMAP_OBJECTS. The objects in question are the sync.error-log objects.

[rook@rook-ceph-tools-566d8c6b5f-ldtvw /]$ ceph -s
  cluster:
    id:     4a7de165-eb94-4313-af36-c4c64f926874
    health: HEALTH_OK
            (muted: LARGE_OMAP_OBJECTS(2h))

[rook@rook-ceph-tools-566d8c6b5f-ldtvw /]$ ceph health detail 
HEALTH_OK (muted: LARGE_OMAP_OBJECTS(2h))
(MUTED ttl 2h, STICKY) [WRN] LARGE_OMAP_OBJECTS: 31 large omap objects
    31 large objects found in pool 'ch-gva-nvme-d3.rgw.log'
    Search the cluster log for 'Large omap object found' for more details.

In the logs:

    Sep 14, 2023 @ 04:43:03.724  cluster 2023-09-14T02:43:01.907081+0000 osd.18 (osd.18) 3622 : cluster [WRN] Large omap object found. Object: 6:d6ec6838:::sync.error-log.26:head PG: 6.1c16376b (6.3) Key count: 250163 Size (bytes): 60838504
    Sep 14, 2023 @ 04:43:03.724  cluster 2023-09-14T02:43:02.225287+0000 osd.18 (osd.18) 3623 : cluster [WRN] Large omap object found. Object: 6:d95f86ac:::sync.error-log.22:head PG: 6.3561fa9b (6.3) Key count: 250167 Size (bytes): 60845593
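
As a side check (not part of the original report), the omap key count of one of the flagged objects can be confirmed directly with rados; the pool and object names below are copied from the warnings above:

    # count the omap keys on one of the flagged sync.error-log objects (illustrative sketch)
    rados -p ch-gva-nvme-d3.rgw.log listomapkeys sync.error-log.26 | wc -l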

On a side note, we could only view the content of these logs with --max-entries, because --start-date is not allowed:

radosgw-admin --rgw-realm=ik-s3-nvme sync error list --max-entries 10000

We want to clear the logs using the dedicated command, but it has no effect:

radosgw-admin --rgw-realm=ik-s3-nvme sync error trim

Even if we try to deep scrub the PG, the log objects don't get trimmed:

ceph pg deep-scrub 6.3

Is there something we're missing?


Related issues (2 open, 0 closed)

Copied to rgw - Backport #63403: reef: large omap objects in log pool 'XXX.rgw.log' : sync.error-log.X (New, Shilpa MJ)
Copied to rgw - Backport #63404: quincy: large omap objects in log pool 'XXX.rgw.log' : sync.error-log.X (New, Shilpa MJ)
#1

Updated by Casey Bodley 8 months ago

  • Assignee set to Shilpa MJ
  • Tags set to multisite
#2

Updated by Matthew Darwin 8 months ago

Same problem here. `radosgw-admin sync error trim` seems to do nothing.

The suggestion from the ceph-users mailing list was to pass an end date, but that doesn't work either:

```
$ radosgw-admin sync error trim --end-date="2023-08-20 23:00:00"
end-date not allowed.
```

I'm using ceph version 17.2.6 (quincy).

#3

Updated by Shilpa MJ 8 months ago

  • Status changed from New to In Progress
  • Pull request ID set to 53828
#4

Updated by Shilpa MJ 8 months ago

Since the addition of https://github.com/ceph/ceph/commit/26f5b2f58efc9e489fe314b41d8ec8bc00a9f0c7, we take a start-marker for the list commands and an end-marker for the trim commands. The cls function timelog.trim() expects a marker to be passed in; without one it returns ENODATA, which is treated as success, so the command does nothing. The fix ensures that we accept an --end-marker and fail if none is provided. This changes the original behaviour in that the trim command is no longer accepted without the marker option.
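
For illustration only, a rough sketch of the post-fix invocation described above; the end-marker value is a placeholder, not something taken from this issue, and the --rgw-realm value is the one from the original report:

```
# after the fix, trim requires an explicit end marker and is rejected without one (sketch)
radosgw-admin --rgw-realm=ik-s3-nvme sync error trim --end-marker=<end-marker>
```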

#5

Updated by Steven Goodliff 8 months ago

Hi,

Sorry for hijacking this, but is there a workaround to trim the sync error log in the meantime?

thanks

#6

Updated by Matthew Darwin 8 months ago

I was going to ask the same question. ;-)

Steven Goodliff wrote:

> Hi,
>
> Sorry for hijacking this, but is there a workaround to trim the sync error log in the meantime?
>
> thanks

#7

Updated by Casey Bodley 7 months ago

hi Steven/Matt,

in the past i remember using --marker=9 for trim commands like this, as a way to trim all entries. can you give that a try and let us know if it helps?
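
Applied to the reporter's setup, that suggestion would look roughly like this (a sketch; the --rgw-realm value comes from the original description and --marker=9 is the value suggested above):

```
# workaround from comment #7: pass an explicit marker so the trim actually runs (sketch)
radosgw-admin --rgw-realm=ik-s3-nvme sync error trim --marker=9
```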

#8

Updated by Steven Goodliff 7 months ago

worked for me, thanks very much.

#9

Updated by Shilpa MJ 7 months ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to quincy reef pacific
#10

Updated by Matthew Darwin 7 months ago

Casey Bodley wrote:

> hi Steven/Matt,
>
> in the past i remember using --marker=9 for trim commands like this, as a way to trim all entries. can you give that a try and let us know if it helps?

Yes, this works for me. Deep scrub has run and the large omap error is now gone.

#11

Updated by Casey Bodley 7 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from quincy reef pacific to quincy reef
#12

Updated by Backport Bot 7 months ago

  • Copied to Backport #63403: reef: large omap objects in log pool 'XXX.rgw.log' : sync.error-log.X added
#13

Updated by Backport Bot 7 months ago

  • Copied to Backport #63404: quincy: large omap objects in log pool 'XXX.rgw.log' : sync.error-log.X added
#14

Updated by Backport Bot 7 months ago

  • Tags changed from multisite to multisite backport_processed