Bug #43311: asynchronous recovery + backfill might spin pg undersized for a long time - RADOS - Ceph

Bug #43311

closed

asynchronous recovery + backfill might spin pg undersized for a long time

Added by xie xingguo over 4 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee: xie xingguo
Category: Backfill/Recovery
Target version: Ceph - v15.0.0
Source: Community (dev)
Backport: mimic,nautilus
Severity: 3 - minor
Pull request ID: 32202

Description

When an osd that is part of the current up set is chosen as an
async_recovery_target, it is removed from the acting set.
Since we don't allow any want set that is larger than the pool size,
the pg must transition into UNDERSIZED once asynchronous recovery
takes place.
However, if that pg also has one or more backfill targets, it might
stay UNDERSIZED for a long time, during which the mon keeps issuing
"PG_AVAILABILITY" warnings until all backfill targets finally complete.

Below is an example:

 [root@host-192-168-9-13 ~]# ceph pg 1.1fc query
        {
            "state": "active+undersized+remapped+backfilling",
            "snap_trimq": "[]",
            "snap_trimq_len": 0,
            "epoch": 16777,
            "up": [
                29,
                15,
                0
            ],
            "acting": [
                29,
                54
            ],
            "backfill_targets": [
                "15" 
            ],
            "async_recovery_targets": [
                "0" 
            ],
            "acting_recovery_backfill": [
                "0",
                "15",
                "29",
                "54" 
            ],

Related issues 2 (0 open — 2 closed)

Updated by xie xingguo over 4 years ago

Description updated (diff)

Updated by Neha Ojha over 4 years ago

Status changed from New to Pending Backport
Backport changed from luminous,mimic,nautilus to mimic,nautilus

Updated by Nathan Cutler over 4 years ago

Copied to Backport #43469: nautilus: asynchronous recovery + backfill might spin pg undersized for a long time added

Updated by Nathan Cutler over 4 years ago

Copied to Backport #43470: mimic: asynchronous recovery + backfill might spin pg undersized for a long time added

Updated by Nathan Cutler about 3 years ago

Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
