Bug #38931

closed

osd does not proactively remove leftover PGs

Added by Dan van der Ster about 5 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific,octopus,nautilus
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

(Context: cephfs cluster running v12.2.11)

We had an osd go nearfull this weekend. I reweighted it to move out some PGs, but when looking today it's still holding much more data than it should.

The osd currently has 34 PGs mapped to it:

  74   hdd 5.45609  1.00000 5.46TiB 3.86TiB 1.60TiB 70.77 1.37  34

But the OSD itself reports 20 more:

{
    "whoami": 74,
    "state": "active",
    "oldest_map": 46992,
    "newest_map": 47738,
    "num_pgs": 54
}
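The discrepancy can be checked mechanically. A minimal sketch, assuming the status JSON above is available as text and the mapped PG count (34) is taken from the usage line above; the variable names are illustrative only:

```python
import json

# OSD status as reported by the daemon, copied from the output above.
status_json = """
{
    "whoami": 74,
    "state": "active",
    "oldest_map": 46992,
    "newest_map": 47738,
    "num_pgs": 54
}
"""

# PG count from the cluster's point of view (last column of the usage line).
mapped_pgs = 34

status = json.loads(status_json)
# PGs the OSD still holds beyond those currently mapped to it.
leftover = status["num_pgs"] - mapped_pgs
print(f"osd.{status['whoami']}: {leftover} PGs held but not mapped")
```

With the numbers from this report, the difference is 20.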

When I restart the OSD, it reloads those 20. For example, here is a PG that it loads even though the PG is mapped to [22,129,14] and is currently active+clean:

2019-03-25 11:09:26.655955 7fe3fdb23d80 10 osd.74 47719 load_pgs loaded pg[2.6d( v 47685'27177090 (47637'27175587,47685'27177090] lb MIN (bitwise) local-lis/les=47680/47681 n=0 ec=371/371 lis/c 47683/47497 les/c/f 47684/47498/0 47686/47688/43553) [22,129,14] r=-1 lpr=47689 pi=[47497,47688)/1 crt=47685'27177090 lcod 0'0 unknown NOTIFY mbc={}] log((47637'27175587,47685'27177090], crt=47685'27177090)
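To spot such PGs across a whole OSD log, the load_pgs lines can be parsed. This is only a sketch: the regular expression is my own and assumes the debug line format shown above, where the bracketed list after the PG info is the mapped OSD set and r=-1 means this OSD has no role in the PG. The names LOAD_PGS_RE and parse_load_pgs are hypothetical.

```python
import re

# Assumed layout: "osd.<id> <epoch> load_pgs loaded pg[<pgid>( ... ) [<osds>] r=<role>"
LOAD_PGS_RE = re.compile(
    r"osd\.(?P<osd>\d+) \d+ load_pgs loaded "
    r"pg\[(?P<pgid>[0-9a-f]+\.[0-9a-f]+)\(.*?\) "
    r"\[(?P<osds>[\d,]+)\] r=(?P<role>-?\d+)"
)

def parse_load_pgs(line):
    """Return the OSD id, PG id, mapped OSDs and role from a load_pgs line, or None."""
    m = LOAD_PGS_RE.search(line)
    if m is None:
        return None
    return {
        "osd": int(m["osd"]),
        "pgid": m["pgid"],
        "osds": [int(x) for x in m["osds"].split(",")],
        "role": int(m["role"]),
    }

# The log line from this report, split only for readability.
line = (
    "2019-03-25 11:09:26.655955 7fe3fdb23d80 10 osd.74 47719 load_pgs loaded "
    "pg[2.6d( v 47685'27177090 (47637'27175587,47685'27177090] lb MIN (bitwise) "
    "local-lis/les=47680/47681 n=0 ec=371/371 lis/c 47683/47497 "
    "les/c/f 47684/47498/0 47686/47688/43553) [22,129,14] r=-1 lpr=47689 "
    "pi=[47497,47688)/1 crt=47685'27177090 lcod 0'0 unknown NOTIFY mbc={}] "
    "log((47637'27175587,47685'27177090], crt=47685'27177090)"
)

info = parse_load_pgs(line)
# A PG is a leftover candidate when the loading OSD is not in the mapped set.
is_leftover = info["osd"] not in info["osds"]
```

For the line above, this yields PG 2.6d loaded by osd.74 while mapped to 22, 129 and 14.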

I found a way to remove those leftover PGs without using ceph-objectstore-tool: if the PG re-peers, osd.74 notices that it is not in the up/acting set and starts deleting the PG.
So at the moment I'm restarting those former peers to trim this OSD.
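The membership check that seems to happen on re-peering can be illustrated as follows. This is not Ceph's actual code, just a sketch of the decision as I understand it; leftover_pgs and pg_mappings are hypothetical names, and the second PG entry ("2.10") is invented purely for contrast.

```python
def leftover_pgs(osd_id, loaded_pgs, pg_mappings):
    """Return loaded PGs for which osd_id is in neither the up nor the acting set."""
    leftovers = []
    for pgid in loaded_pgs:
        up, acting = pg_mappings[pgid]
        if osd_id not in up and osd_id not in acting:
            leftovers.append(pgid)
    return leftovers

# pgid -> (up set, acting set)
pg_mappings = {
    "2.6d": ([22, 129, 14], [22, 129, 14]),  # from the log line in this report
    "2.10": ([74, 3, 9], [74, 3, 9]),        # invented example still mapped to osd.74
}

candidates = leftover_pgs(74, ["2.6d", "2.10"], pg_mappings)
print(candidates)
```

A real implementation would presumably also need to confirm that the PG is clean on its current OSDs before deleting the local copy, which is the case for 2.6d here.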

Is this all expected behaviour?
Shouldn't the OSD start removing leftover PGs at boot time?


Related issues: 3 (0 open, 3 closed)

Copied to RADOS - Backport #51582: octopus: osd does not proactively remove leftover PGs (Resolved, Mykola Golub)
Copied to RADOS - Backport #51583: nautilus: osd does not proactively remove leftover PGs (Resolved, Mykola Golub)
Copied to RADOS - Backport #51584: pacific: osd does not proactively remove leftover PGs (Resolved, Mykola Golub)