Bug #24373

osd: eternal stuck PG in 'unfound_recovery'

Added by Kouya Shimura almost 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Backfill/Recovery
Target version: -
% Done: 0%
Source: -
Tags: -
Backport: mimic, luminous
Regression: No
Severity: 2 - major
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(RADOS): OSD
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

A PG might be eternally stuck in 'unfound_recovery' after some OSDs are marked down.

For example, the following steps reproduce it; a consolidated script of these commands follows the list.

1) Create an EC 2+1 pool. Assume a PG has up/acting set [1,0,2].
2) Execute "ceph osd out osd.0 osd.2". The PG now has up/acting set [1,3,5].
3) Put some objects into the PG.
4) Execute "ceph osd in osd.0 osd.2". Recovery to [1,0,2] starts.
5) Execute "ceph osd down osd.3 osd.5". These downs are momentary; osd.3 and osd.5 boot again instantly.
This leads the PG to enter 'unfound_recovery' and stay there forever, even though all OSDs are up.
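
A minimal sketch of these five steps as commands, assuming a small test cluster; the erasure-code profile name "ec21" and pool name "ecpool" are hypothetical, and the OSD ids follow the example above:

    # Step 1: create an EC 2+1 pool.
    ceph osd erasure-code-profile set ec21 k=2 m=1
    ceph osd pool create ecpool 8 8 erasure ec21
    # Step 2: remap the PG away from osd.0 and osd.2 (e.g. to [1,3,5]).
    ceph osd out osd.0 osd.2
    # Step 3: write some objects into the remapped PG.
    rados -p ecpool bench 10 write --no-cleanup
    # Step 4: recovery back to [1,0,2] starts.
    ceph osd in osd.0 osd.2
    # Step 5: momentary downs; osd.3 and osd.5 boot again instantly,
    # yet the PG ends up stuck in 'unfound_recovery'.
    ceph osd down osd.3 osd.5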

This bad situation can be resolved by marking down an OSD in the acting set.

6) Execute "ceph osd down osd.0"; the unfound objects are then resolved and the PG resumes recovery.
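
For illustration, a hedged sequence for observing and clearing the stuck state; the PG id 2.7 below is hypothetical:

    ceph health detail     # reports the unfound objects (OBJECT_UNFOUND)
    ceph pg 2.7 query      # recovery shown as blocked by unfound objects
    ceph osd down osd.0    # mark down a member of the acting set
    ceph health detail     # unfound objects resolve; recovery resumes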

From my investigation: if the downed OSD is not a member of the current up/acting set, its PG may stay in 'ReplicaActive' and discard peering requests from the primary. Thus the primary OSD cannot exit the unfound state. PGs on the downed OSD should transition to the 'Reset' state and start peering.
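
As a hedged cross-check against the attached ceph-osd.3.log.gz (PG state-machine transitions are only logged at a raised debug_osd level, and the exact message wording varies by release), the replica can be seen remaining in ReplicaActive instead of transitioning to Reset:

    zgrep 'ReplicaActive' ceph-osd.3.log.gz | tail -20
    zgrep 'Reset' ceph-osd.3.log.gz | tail -20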


Files

ceph-osd.3.log.gz (48.8 KB), Kouya Shimura, 06/06/2018 02:17 AM

Related issues: 2 (0 open, 2 closed)

Copied to RADOS - Backport #24500: mimic: osd: eternal stuck PG in 'unfound_recovery' (Resolved, Nathan Cutler)
Copied to RADOS - Backport #24501: luminous: osd: eternal stuck PG in 'unfound_recovery' (Resolved, Nathan Cutler)