Bug #56707

closed

pglog growing unbounded on EC with copy by ref

Added by Alexandre Marangone almost 2 years ago. Updated 4 months ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 100%
Source: Community (dev)
Tags: backport_processed
Backport: pacific,quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

How to reproduce

- Create a 10 GB object in bucket1 using multipart upload
- Copy the object 200 times in parallel via s3:ObjectCopy

Tested on Pacific 16.2.9

Observation

- Replicated (3x): PG log bytes increased by 0.03 GiB across 123 active data OSDs
- EC 4:2: PG log bytes increased by 8 GiB across 96 active data OSDs
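To put the two figures on the same footing, they can be normalized per OSD. This assumes both numbers are totals across the listed OSDs; the report does not say whether they are totals or per-OSD values.

$$
\frac{8\,\text{GiB}}{96} = \frac{8192\,\text{MiB}}{96} \approx 85.3\,\text{MiB per OSD},
\qquad
\frac{0.03\,\text{GiB}}{123} = \frac{30.72\,\text{MiB}}{123} \approx 0.25\,\text{MiB per OSD},
\qquad
\frac{85.3}{0.25} \approx 340
$$

Under that reading, each EC data OSD gained roughly 340 times more PG log bytes than a replicated one, even though the EC pool spans fewer OSDs.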

Looking at the pglog itself, the log was up to 10 KB. Dumping the refcount xattr shows the following:

# ceph-dencoder type obj_refcount import /etc/ceph/out decode dump_json
{
    "refs": [
        {
            "oid": "<redacted>.5026683301901118227",
            "active": true
        }
    ],
    "retired_refs": [
        // list of 200 refs
    ]
}

Even after the copied objects are deleted and garbage-collected, the original object still carries the full list in `retired_refs`. That list is large, and on EC pools it will only keep increasing the pglog size over time because of the look-aside object.
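The following is a minimal Python model that illustrates why `retired_refs` keeps costing space after the copies are gone. It is a hypothetical sketch shaped after the `ceph-dencoder` dump above, not Ceph's `cls_refcount` code. The class `RefcountXattr`, its `get`/`put` methods, the tag names and the JSON-length size proxy are all assumptions. The model also assumes, for illustration only, that every refcount update on an EC pool appends a pglog entry carrying the previous xattr value. Under that assumption, the logged bytes grow roughly quadratically with the number of copies.

```python
import json


class RefcountXattr:
    """Hypothetical model of the refcount xattr, shaped after the
    `ceph-dencoder ... dump_json` output (not Ceph's actual code)."""

    def __init__(self, owner_tag: str) -> None:
        self.refs: dict[str, bool] = {owner_tag: True}
        self.retired_refs: set[str] = set()

    def get(self, tag: str) -> None:
        # A server-side copy takes a reference on the source object.
        self.refs[tag] = True

    def put(self, tag: str) -> None:
        # Deleting/GC'ing a copy drops its reference, but in this model the
        # tag is remembered in retired_refs and never removed from it.
        if self.refs.pop(tag, None) is not None:
            self.retired_refs.add(tag)

    def encoded_size(self) -> int:
        # JSON length is only a rough proxy for the encoded xattr size.
        doc = {
            "refs": [{"oid": t, "active": a} for t, a in self.refs.items()],
            "retired_refs": sorted(self.retired_refs),
        }
        return len(json.dumps(doc))


def simulate(copies: int) -> tuple[RefcountXattr, int]:
    """Copy the object `copies` times, then delete every copy.

    Hypothetical assumption: each refcount update logs the *previous*
    xattr value (as EC rollback info). Returns the final xattr and the
    total bytes those log entries would hold.
    """
    xattr = RefcountXattr("orig")
    logged = 0
    tags = [f"copy.{i}" for i in range(copies)]
    for tag in tags:  # parallel copies, one reference each
        logged += xattr.encoded_size()
        xattr.get(tag)
    for tag in tags:  # delete + GC of every copy
        logged += xattr.encoded_size()
        xattr.put(tag)
    return xattr, logged


if __name__ == "__main__":
    for n in (50, 100, 200):
        x, logged = simulate(n)
        print(n, len(x.refs), len(x.retired_refs), x.encoded_size(), logged)
```

In this model, the final xattr still holds every retired tag after all copies are deleted. Doubling the number of copies roughly quadruples the `logged` total.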

Symptoms
This can cause the nodes to run out of memory. It also fills the RocksDB memtables on the data OSDs very quickly, causing IO stalls and heavy compaction. Some OSDs even reached L5, which is a long-term performance issue.


Files

pglog.tar.bz2 (28.4 KB), Alexandre Marangone, 07/25/2022 11:19 PM
massif.out.bz2 (43.7 KB), Alexandre Marangone, 07/26/2022 03:06 PM
mempool_before_and_after.txt (6.48 KB), Alexandre Marangone, 07/26/2022 03:07 PM
ceph-osd.32.log.bz2 (282 KB), Alexandre Marangone, 08/15/2022 03:27 PM
1683775204857.png (39.6 KB): osd_pglog mempool, 王子敬 wang, 05/11/2023 03:20 AM
1683775808540.jpg (84.4 KB): top, 王子敬 wang, 05/11/2023 03:30 AM

Related issues: 2 (0 open, 2 closed)

Copied to RADOS - Backport #58613: pacific: pglog growing unbounded on EC with copy by ref (Resolved, Nitzan Mordechai)
Copied to RADOS - Backport #58614: quincy: pglog growing unbounded on EC with copy by ref (Resolved, Nitzan Mordechai)