Bug #48298

open

hitting mon_max_pg_per_osd right after creating OSD, then decreases slowly

Added by Jonas Jelten over 3 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I just added OSDs to my cluster running 14.2.13.

mon_max_pg_per_osd = 300
osd_max_pg_per_osd_hard_ratio = 3
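
(For reference, with those settings the effective limits work out as follows, as I understand the two options: the soft limit only triggers a health warning, the hard limit is where an OSD stops accepting new PGs.)

soft limit:  mon_max_pg_per_osd                                  = 300 PGs per OSD
hard limit:  mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio  = 300 * 3 = 900 PGs per OSD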

OSDs of comparable size have maybe 200 PGs on them.
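
(One way to watch the per-OSD PG counts is the PGS column of ceph osd df; the awk filter for a single OSD is just my own shorthand, not part of the bug:)

ceph osd df                      # PGS column shows the current PG count per OSD
ceph osd df | awk '$1 == 422'    # only the row for osd.422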

One of the newly added OSDs, osd.422, now somehow has 907 PGs, which is more than 300 * 3 = 900:

ceph daemon osd.422 status
{
    "cluster_fsid": "xxx",
    "osd_fsid": "yyy",
    "whoami": 422,
    "state": "booting",
    "oldest_map": 454592,
    "newest_map": 455185,
    "num_pgs": 907
}

As a result, PGs become stuck in activating+remapped and large parts of the cluster die.

The interesting thing is this: after I increased the limit, the OSD does of course boot and its PGs become active.
num_pgs even increased further, to 969, but then it started to decrease until the OSD had the expected number of PGs!
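
(For reference, raising the two limits at runtime can be done with something along these lines via the centralized config store; the values 400 and 5 are just placeholders, not a recommendation:)

ceph config set global mon_max_pg_per_osd 400
ceph config set osd osd_max_pg_per_osd_hard_ratio 5

# check what an individual OSD actually sees (I assume the change is picked up
# without a restart, but I haven't verified that):
ceph config get osd.422 osd_max_pg_per_osd_hard_ratio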

Another problem: there's absolutely no hint that the osd_max_pg_per_osd_hard_ratio limit has been hit. You only get a warning when you exceed the soft limit.
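
(If I'm reading this right, the soft-limit warning in question is the TOO_MANY_PGS health check, which shows up in the general health output:)

ceph health detail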

tl;dr:

  • More PGs are allocated on an OSD than it will actually hold once the remapping is done.
  • There's no cluster error or warning when an OSD hits the hard limit.

Files

2020-11-20-133944_1906x1477_scrot.png (393 KB): graph of decreasing num_pgs (Jonas Jelten, 11/20/2020 12:44 PM)

Related issues (1 open, 0 closed)

Related to RADOS - Bug #23117: PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has been exceeded once (status: Fix Under Review, assignee: Prashant D)
