Bug #41190
osd: pg stuck in waitactingchange when new acting set doesn't change
Description
In the PG GetLog state, choose_acting is run; if want is not equal to acting, the PG requests a pg_temp from the mon and then posts a NeedActingChange event.
Before transitioning to WaitActingChange, the PG exits GetLog -> Peering -> Primary and then enters Started/Primary/Peering/WaitActingChange.
However, pg->want_acting is cleared on exit from the Primary state, so when WaitActingChange handles an AdvMap event, the empty want_acting leaves the PG looping in this state.
As a result, the PG does not leave this state until the new acting set differs from the current one and peering restarts.
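The hand-off described above can be sketched as a toy model. This is not Ceph code: the class PG, its methods, and the trace list are hypothetical names used only for illustration, and the only behavior taken from this report is that leaving Primary clears want_acting before WaitActingChange is entered.

```python
class PG:
    """Toy model of the state hand-off described in this report (not Ceph code)."""

    def __init__(self, acting):
        self.acting = list(acting)
        self.want_acting = []
        self.state = "Started/Primary/Peering/GetLog"
        self.trace = []

    def choose_acting(self, want):
        # In GetLog: if the wanted set differs from acting, remember it.
        # The real OSD would request a pg_temp from the mon at this point.
        if list(want) != self.acting:
            self.want_acting = list(want)
            return True  # caller should post NeedActingChange
        return False

    def exit_primary(self):
        # Behavior taken from the report: exiting Primary clears want_acting.
        self.trace.append("exit Primary")
        self.want_acting.clear()

    def on_need_acting_change(self):
        # Exit order as described in the report: GetLog -> Peering -> Primary.
        self.trace.append("exit GetLog")
        self.trace.append("exit Peering")
        self.exit_primary()
        self.state = "Started/Primary/Peering/WaitActingChange"
        self.trace.append("enter WaitActingChange")


pg = PG(acting=[9, 8])
if pg.choose_acting([9, 8, 3]):
    pg.on_need_acting_change()
print(pg.state, pg.want_acting)
```

In this model the PG arrives in WaitActingChange with an empty want_acting, which is the precondition for the hang.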
Problem Description:
epoch 90: want_acting is [9,8,3] and the PG acting set is [9,8], so the PG requests pg_temp [9,8,3] from the mon and posts a NeedActingChange event.
epoch 91: osd.3 is marked down in the osdmap, so the new acting set is still [9,8]. But when the WaitActingChange state processes ActMap, the log shows that pg->want_acting is empty,
so the PG stays stuck in this state until its acting set changes.
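The two epochs can be replayed in a small sketch. The function on_map_update and its return values are hypothetical. The decision order (reset if a wanted OSD is down, otherwise restart peering only when the acting set changes) is an assumption based on this report's description, not a copy of the OSD code.

```python
def on_map_update(want_acting, up_osds, old_acting, new_acting):
    """Hypothetical model of a PG in WaitActingChange seeing a new osdmap."""
    # Assumption: a wanted OSD that is no longer up makes the pending
    # pg_temp request stale, so the PG should reset.
    for osd in want_acting:
        if osd not in up_osds:
            return "reset"
    # Otherwise only a change of the acting set restarts peering.
    if new_acting != old_acting:
        return "restart_peering"
    return "stay"


# epoch 91 from the report: osd.3 is down, acting is still [9, 8].
up_osds = {8, 9}
stuck = on_map_update([], up_osds, [9, 8], [9, 8])        # want_acting was cleared
kept = on_map_update([9, 8, 3], up_osds, [9, 8], [9, 8])  # want_acting preserved
print(stuck, kept)
```

With the cleared want_acting the model returns "stay", matching the hang. The second call only shows what a preserved want_acting would trigger in this model; it is not a description of the fix in the pull request.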
Files
Updated by qiuzhang chen over 4 years ago
Pull request: OSD/PG: Fix pg stuck in waitactingchange #29580
Updated by qiuzhang chen over 4 years ago
PR: https://tracker.ceph.com/issues/40117 may have the same problem
Updated by qiuzhang chen over 4 years ago
- File problem_log.tar.gz added
The original log was too large, so I filtered it down to the problem PG. Please check whether it is enough to analyze the problem.
Updated by Patrick Donnelly over 4 years ago
- Tracker changed from Fix to Bug
- Project changed from Ceph to RADOS
- Subject changed from pg stuck in waitactingchange when new acting set doesn't change to osd: pg stuck in waitactingchange when new acting set doesn't change
- Category deleted (OSD)
- Status changed from New to Fix Under Review
- Target version changed from v12.2.13 to v15.0.0
- Start date deleted (08/09/2019)
- Backport set to nautilus,mimic,luminous
- Regression set to No
- Severity set to 3 - minor
- Pull request ID set to 29669
- Affected Versions deleted (v12.2.1)
- ceph-qa-suite deleted (rgw)
- Component(RADOS) OSD added
Updated by Neha Ojha over 4 years ago
- Related to Bug #40117: PG stuck in WaitActingChange added
Updated by Neha Ojha almost 4 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #45890: nautilus: osd: pg stuck in waitactingchange when new acting set doesn't change added
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #45891: luminous: osd: pg stuck in waitactingchange when new acting set doesn't change added
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #45892: mimic: osd: pg stuck in waitactingchange when new acting set doesn't change added
Updated by Nathan Cutler about 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".