Bug #56707
pglog growing unbounded on EC with copy by ref (Closed)
Added by Alexandre Marangone almost 2 years ago. Updated 4 months ago.
% Done: 100%
Description
How to reproduce
- create a 10GB object in bucket1 using multipart upload
- copy object 200x via s3:ObjectCopy in parallel
Tested on Pacific 16.2.9
Observation
- rep 3x: PG log bytes increased by 0.03 GiB over 123 active data OSDs
- ec 4:2: PG log bytes increased by 8 GiB over 96 active data OSDs
Looking at the pglog itself, entries were up to 10 KB each, and dumping the refcount xattr shows the following:
# ceph-dencoder type obj_refcount import /etc/ceph/out decode dump_json
{
    "refs": [
        {
            "oid": "<redacted>.5026683301901118227",
            "active": true
        }
    ],
    "retired_refs": [
        // list of 200 refs
    ]
}
Even after deleting and GC'ing the copied objects, the original still carries the full list in `retired_refs`, which weighs a lot and will only keep increasing the pglog size over time on EC due to the look-aside object.
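To illustrate why the xattr only ever grows, here is a minimal Python model of the refcount behavior described above (the class and method names are hypothetical; the real implementation is C++ inside RGW, this only mirrors the observed ref/retired_refs semantics):

```python
# Illustrative model of the obj_refcount xattr behavior observed above.
# Each copy-by-ref adds a ref; deleting a copy moves its tag into
# retired_refs instead of dropping it, so the attribute never shrinks.

class ObjRefcount:
    def __init__(self):
        self.refs = {}           # tag -> active flag
        self.retired_refs = []   # tags kept even after deletion

    def add_ref(self, tag):
        self.refs[tag] = True

    def put_ref(self, tag):
        # Deletion retires the tag rather than forgetting it, so
        # repeated copy + delete cycles keep inflating the xattr.
        if tag in self.refs:
            del self.refs[tag]
            self.retired_refs.append(tag)

rc = ObjRefcount()
for i in range(200):
    rc.add_ref(f"copy-{i}")
for i in range(200):
    rc.put_ref(f"copy-{i}")

print(len(rc.refs), len(rc.retired_refs))  # 0 active refs, 200 retired
```

Because the full (growing) xattr rides along in every pglog entry that touches the object, each new copy or delete produces a bigger entry than the last.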
Symptoms
This can cause the nodes to run out of memory, and it also fills the memtables very quickly on the data OSDs, causing IO stalls and a lot of compaction. Some OSDs even reached L5, which is a long-term performance issue.
Files
- pglog.tar.bz2 (28.4 KB), Alexandre Marangone, 07/25/2022 11:19 PM
- massif.out.bz2 (43.7 KB), Alexandre Marangone, 07/26/2022 03:06 PM
- mempool_before_and_after.txt (6.48 KB), Alexandre Marangone, 07/26/2022 03:07 PM
- ceph-osd.32.log.bz2 (282 KB), Alexandre Marangone, 08/15/2022 03:27 PM
- 1683775204857.png (39.6 KB), osd_pglog mempool, 王子敬 wang, 05/11/2023 03:20 AM
- 1683775808540.jpg (84.4 KB), top, 王子敬 wang, 05/11/2023 03:30 AM
Updated by Neha Ojha almost 2 years ago
- Assignee set to Nitzan Mordechai
Can you dump the pg log using the ceph-objectstore-tool when the OSD is consuming high memory and share it with us?
Updated by Alexandre Marangone almost 2 years ago
- File pglog.tar.bz2 added
Attached a pglog captured at the peak of one prod occurrence. I had to redact the object names since it's prod, but let me know if you need them; I can easily reproduce this in our staging environment.
Updated by Nitzan Mordechai almost 2 years ago
Alexandre, can you please send us the dump_mempools output, and if possible, also run valgrind massif?
Updated by Alexandre Marangone almost 2 years ago
I don't have one handy; everything is in Prometheus, and sharing a screenshot of all the mempools isn't very legible. Valgrind is going to be a bit tricky since we're using containers, so I'll have to work around that. I'll get you the data later today or tomorrow.
Updated by Alexandre Marangone almost 2 years ago
- File massif.out.bz2 added
- File mempool_before_and_after.txt added
That was faster than I thought. Attached the massif outfile (let me know if that's what you expected; I'm not super familiar with valgrind) and the mempools before/after.
Note that valgrind slowed the OSD down to a crawl so s3cmd objectcopy timed out/failed in most instances.
Workload sent:
- s3 objectcopy of 1x 10GB file from bucket1 to bucket2 - 500 times in parallel
- osd.32 is the OSD I monitored. It appears to only have one part.
Edit: Even though s3cmd timed out, the workload actually finished in the background. Updated "after" mempool below:
# ceph daemon osd.32 dump_mempools
{
    "mempool": {
        "by_pool": {
            "bloom_filter": { "items": 0, "bytes": 0 },
            "bluestore_alloc": { "items": 15523110, "bytes": 182819808 },
            "bluestore_cache_data": { "items": 6509, "bytes": 1391216 },
            "bluestore_cache_onode": { "items": 6706, "bytes": 4130896 },
            "bluestore_cache_meta": { "items": 1742472, "bytes": 11319642 },
            "bluestore_cache_other": { "items": 23988, "bytes": 581688 },
            "bluestore_Buffer": { "items": 133, "bytes": 12768 },
            "bluestore_Extent": { "items": 9858, "bytes": 473184 },
            "bluestore_Blob": { "items": 9858, "bytes": 1025232 },
            "bluestore_SharedBlob": { "items": 9858, "bytes": 1104096 },
            "bluestore_inline_bl": { "items": 75, "bytes": 13588 },
            "bluestore_fsck": { "items": 0, "bytes": 0 },
            "bluestore_txc": { "items": 1, "bytes": 784 },
            "bluestore_writing_deferred": { "items": 7, "bytes": 28288 },
            "bluestore_writing": { "items": 64, "bytes": 927056 },
            "bluefs": { "items": 191935, "bytes": 2578592 },
            "bluefs_file_reader": { "items": 1038, "bytes": 132864 },
            "bluefs_file_writer": { "items": 3, "bytes": 576 },
            "buffer_anon": { "items": 2204, "bytes": 4591510 },
            "buffer_meta": { "items": 39234, "bytes": 3452592 },
            "osd": { "items": 256, "bytes": 2895872 },
            "osd_mapbl": { "items": 0, "bytes": 0 },
            "osd_pglog": { "items": 1111350, "bytes": 1193066303 },
            "osdmap": { "items": 47045, "bytes": 1842072 },
            "osdmap_mapping": { "items": 0, "bytes": 0 },
            "pgmap": { "items": 0, "bytes": 0 },
            "mds_co": { "items": 0, "bytes": 0 },
            "unittest_1": { "items": 0, "bytes": 0 },
            "unittest_2": { "items": 0, "bytes": 0 }
        },
        "total": { "items": 18725704, "bytes": 1412388627 }
    }
}
Updated by Nitzan Mordechai almost 2 years ago
Alex,
A few more questions, so I'll be able to recreate the scenario as you got it:
1. "dumping the refcount" - how did you dump the refcount?
2. "ec 4:2: PG log bytes increased by 8 GiB over 96 active data OSDs" - how did you measure the 8 GB?
From the massif output I can see that the pg log is using some of the memory, but rocksdb is as well. I was not able to recreate that situation yet; if you can answer those questions, I'll retry it.
Updated by Alexandre Marangone almost 2 years ago
1. "dumping the refcount" - how did you dump the refcount?
I extracted it with rados getxattr refcount and decoded it with ceph-dencoder using type obj_refcount.
2. "ec 4:2: PG log byte increase. 8GiB over 96 OSDs active data OSDs" - how did you measure the 8gb ?
We have a Prometheus exporter that queries the OSDs' asok every 2 minutes, including mempools. I summed the data from all OSDs.
For the repro, I used MPU in order to make sure that I'd have objects on different OSDs, so that I wouldn't have to map the objects to specific OSDs.
On a side note, since I mentioned Pacific: these OSDs were upgraded from Nautilus, but we haven't resharded rocksdb, although I don't believe that will help. I could try on a resharded rocksdb, but that would take me a bit of time to set up.
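The cluster-wide summing described in this thread can be sketched roughly as follows (a minimal sketch; the helper names and the fake input are assumptions, not the actual exporter pipeline — only the `dump_mempools` JSON shape is taken from the output pasted above):

```python
# Sketch: sum osd_pglog mempool usage across per-OSD dump_mempools
# outputs (the JSON shape matches `ceph daemon osd.N dump_mempools`).
# Helper names are hypothetical, not part of any Ceph tooling.

def osd_pglog_bytes(dump: dict) -> int:
    """Extract the osd_pglog byte count from one dump_mempools output."""
    return dump["mempool"]["by_pool"]["osd_pglog"]["bytes"]

def total_pglog_gib(dumps: list) -> float:
    """Sum osd_pglog usage across all OSDs, in GiB."""
    return sum(osd_pglog_bytes(d) for d in dumps) / 2**30

# Example with two fake dumps using the ~1.19 GB figure from osd.32:
fake = {"mempool": {"by_pool": {"osd_pglog": {"bytes": 1193066303}}}}
print(f"{total_pglog_gib([fake, fake]):.2f} GiB across 2 OSDs")
```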
Updated by Nitzan Mordechai almost 2 years ago
Alex, thanks for the information. Unfortunately, I couldn't recreate the issue, but I did find an issue with refcount and EC pools, and I created a PR with the fix. Is there any chance you can help us verify it?
Updated by Nitzan Mordechai almost 2 years ago
- Status changed from New to Fix Under Review
Updated by Alexandre Marangone almost 2 years ago
That is very strange. I've been able to reproduce 100% of the time with this:
$ s3cmd mb s3://alex-mp1
$ s3cmd mb s3://alex-mp2
$ s3cmd put 10gbfile s3://alex-mp1/file
$ for i in {0..199}; do s3cmd cp s3://alex-mp1/file s3://alex-mp2/file$i & done
Thanks for the patch, I'll give it a try. I'm going to be fairly busy for the next couple of days, but I'll report back by Monday evening at the latest.
Updated by Alexandre Marangone almost 2 years ago
I was able to try the patch on Pacific this morning. Running one OSD with the patch, I got 500s from RGW when I presumably hit that OSD. I only tried MPUs and object copies. Once I reverted the patch, no more 500s.
Updated by Nitzan Mordechai almost 2 years ago
Alex, can you share logs of the OSD that caused the 500s? My theory is a peering mismatch, since the one OSD with the patch has a pglog that doesn't match the other OSDs'.
Updated by Alexandre Marangone almost 2 years ago
I won't be able to rerun the patched branch until Monday. Haven't you been able to reproduce it? It feels trivial to, so I'm worried that I have something else going on if reproducing is impossible for you. I'll try a vstart cluster from that branch directly on Monday to remove that variable.
Updated by Alexandre Marangone almost 2 years ago
- File ceph-osd.32.log.bz2 added
Attached the debug_osd 20 logs for one of the OSDs. I turned off (deep)scrub because the logs were spammed with scrub errors, which I'm guessing is related to the patch as well.
Updated by Nitzan Mordechai over 1 year ago
Alexandre Marangone wrote:
Attached the debug_osd 20 logs for one of the OSDs. I turned off (deep)scrub because the logs were spammed with scrub errors, which I'm guessing is related to the patch as well.
We need the patch on all OSDs; otherwise the pglog differs between them, so peering and scrub kick in.
Updated by Yuri Weinstein over 1 year ago
Updated by Radoslaw Zarzynski over 1 year ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to pacific,quincy
Updated by Backport Bot over 1 year ago
- Copied to Backport #58613: pacific: pglog growing unbounded on EC with copy by ref added
Updated by Backport Bot over 1 year ago
- Copied to Backport #58614: quincy: pglog growing unbounded on EC with copy by ref added
Updated by 王子敬 wang about 1 year ago
- File 1683775204857.png added
- File 1683775808540.jpg added
I have also experienced this situation here:
- Create 30 objects in bucket1 using put
- Using the 30 objects as sources, append-write 1K to each of them and copy them to bucket1; perform hundreds of cycles like this
- The append writes use a boto URL; the copies use boto3 copy_object
- A single pglog entry size reaches 5 KB
This causes the nodes to run out of memory.
Tested on 15.2.13
Updated by 王子敬 wang about 1 year ago
王子敬 wang wrote:
I have also experienced this situation here:
- Create 30 objects in bucket1 using put
- Using the 30 objects as sources, append-write 1K to each of them and copy them to bucket1; perform hundreds of cycles like this
- The append writes use a boto URL; the copies use boto3 copy_object
- A single pglog entry size reaches 5 KB
This causes the nodes to run out of memory. Tested on 15.2.13
I suspect it is caused by an overly large attr: the attr <attr_name, attr_value> pairs are also present in the pglog entries, causing the OSD to run out of memory.
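A rough back-of-the-envelope sketch of that suspicion (all the constants below are assumptions for illustration, not Ceph's actual pg_log_entry_t encoding or defaults):

```python
# Rough, illustrative estimate of per-OSD pglog memory when each log
# entry carries the object's attrs. All sizes are assumptions, not
# Ceph's real encoding.

BASE_ENTRY_BYTES = 300       # assumed baseline size of a pglog entry
ATTR_BYTES = 5 * 1024        # ~5 KB attr payload per entry (observed)
ENTRIES_PER_PG = 3000        # assumed retained log length per PG
PGS_PER_OSD = 100            # assumed PG count per OSD

def pglog_bytes(attr_bytes: int) -> int:
    """Total retained pglog bytes on one OSD for a given attr size."""
    return (BASE_ENTRY_BYTES + attr_bytes) * ENTRIES_PER_PG * PGS_PER_OSD

print(f"without attrs: {pglog_bytes(0) / 2**30:.2f} GiB")
print(f"with 5 KB attrs: {pglog_bytes(ATTR_BYTES) / 2**30:.2f} GiB")
```

Under these assumptions, embedding a ~5 KB attr in every entry inflates the per-OSD pglog footprint by more than an order of magnitude, which matches the shape of the osd_pglog mempool numbers reported earlier in this thread.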
Updated by Nitzan Mordechai about 1 year ago
王子敬 wang wrote:
王子敬 wang wrote:
I have also experienced this situation here
- Create 30 objects in bucket1 using put
- Using 30 objects as the source, append write 1K for the 30 objects and copy to the bucket1,Perform hundreds of cycles like this
- append write use boto url, copy use boto3:copy_object- one pglog entry size reaches 5kB
cause the nodes to run ouf of memoryTested on 15.2.13
I suspect it is caused by an overly large attr, as the attr <attr_name, attr_value> also be present in the pglog,cause the osd to run ouf of memory.
Are you still getting that behavior with the fix suggested in the PR?
Updated by Alexandre Marangone 11 months ago
Sorry, this fell off my radar... We tested the Pacific backport, and the pglog byte growth is similar.
Updated by Konstantin Shalygin 4 months ago
- Status changed from Pending Backport to Resolved
- % Done changed from 0 to 100
- Source set to Community (dev)