Bug #41190 (closed)

osd: pg stuck in waitactingchange when new acting set doesn't change

Added by qiuzhang chen over 4 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: nautilus,mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In the GetLog state, when choose_acting() finds that the wanted acting set (want) differs from the current acting set, the PG requests a pg_temp from the monitor and then posts a NeedActingChange event.

Before transitioning to WaitActingChange, the PG exits GetLog, Peering, and Primary in turn, and then enters Started/Primary/Peering/WaitActingChange.

However, exiting the Primary state clears pg->want_acting. So when WaitActingChange later waits for an AdvMap event, want_acting is empty and the PG keeps cycling in this state.

As a result, the PG only leaves this state once a new acting set differs from the current one and peering restarts.
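The transition described above can be illustrated with a toy model. This is a hypothetical Python sketch, not Ceph code: the `PG` class and its `exit_state()` and `transit()` methods are invented for illustration, and the model simply encodes the exit path as this report describes it (GetLog, Peering, and Primary are exited, and Primary's exit clears `want_acting`).

```python
# Hypothetical toy model of the reported transition; not Ceph code.
# It encodes the report's description: leaving GetLog for WaitActingChange
# exits GetLog, Peering and Primary, and Primary's exit clears want_acting.

class PG:
    def __init__(self):
        self.want_acting = []
        # Innermost state is last.
        self.state = ["Started", "Primary", "Peering", "GetLog"]

    def exit_state(self):
        """Leave the innermost state and run its exit action."""
        name = self.state.pop()
        if name == "Primary":
            self.want_acting.clear()  # the clear this report points at
        return name

    def transit(self, keep_depth, new_path):
        """Exit states until `keep_depth` remain, then enter `new_path`."""
        while len(self.state) > keep_depth:
            self.exit_state()
        self.state.extend(new_path)


pg = PG()
pg.want_acting = [9, 8, 3]  # chosen by choose_acting; pg_temp requested
# Exit GetLog -> Peering -> Primary, then enter the new states.
pg.transit(1, ["Primary", "Peering", "WaitActingChange"])
print("/".join(pg.state))  # Started/Primary/Peering/WaitActingChange
print(pg.want_acting)      # []
```

Re-entering Primary does not restore the cleared value, so the PG arrives in WaitActingChange with nothing to check.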

Problem Description:
epoch 90: want_acting is [9,8,3] and the PG's acting set is [9,8], so the PG requests pg_temp [9,8,3] from the monitor and posts a NeedActingChange event.
epoch 91: osd.3 is down in the osdmap, so the new acting set is [9,8], unchanged from epoch 90. When WaitActingChange handles the ActMap event, the log shows that pg->want_acting is empty, so the PG stays stuck in this state until its acting set changes.
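The epoch 90 to 91 scenario can be traced with a minimal sketch. It assumes, as a simplification, that the PG leaves WaitActingChange only if a `want_acting` target is no longer up or the acting set changes; the function `leaves_wait_acting_change()` is hypothetical, and real peering also weighs other state (up set, primary, and so on).

```python
# Hypothetical trace of the epoch 90 -> 91 scenario; not Ceph code.
# Models only the two exits from WaitActingChange discussed in this report.

def leaves_wait_acting_change(want_acting, up_osds, prev_acting, new_acting):
    """Return True if the PG would leave WaitActingChange on this map."""
    # Exit 1: an OSD the PG asked for has gone down, so it resets.
    target_down = any(osd not in up_osds for osd in want_acting)
    # Exit 2: the acting set changed, so peering restarts.
    new_interval = list(prev_acting) != list(new_acting)
    return target_down or new_interval


prev_acting = [9, 8]  # acting set at epoch 90
new_acting = [9, 8]   # acting set at epoch 91, after osd.3 goes down
up_osds = {9, 8}

# want_acting kept as [9, 8, 3]: the down osd.3 triggers a reset.
print(leaves_wait_acting_change([9, 8, 3], up_osds, prev_acting, new_acting))  # True
# want_acting cleared to []: no reset and no new interval, so the PG is stuck.
print(leaves_wait_acting_change([], up_osds, prev_acting, new_acting))  # False
```

With want_acting preserved, the down osd.3 is noticed; once it is cleared, neither condition fires until the acting set itself changes.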


Files

1.PNG (571 KB): request pg_temp, then enter WaitActingChange (qiuzhang chen, 08/09/2019 03:40 PM)
2.PNG (553 KB): cycle in WaitActingChange because want_acting is empty (qiuzhang chen, 08/09/2019 03:40 PM)
problem_log.tar.gz (815 KB): the problem PG's log (qiuzhang chen, 08/16/2019 04:51 AM)

Related issues 4 (0 open, 4 closed)

Related to RADOS - Bug #40117: PG stuck in WaitActingChange (Duplicate)

Copied to RADOS - Backport #45890: nautilus: osd: pg stuck in waitactingchange when new acting set doesn't change (Resolved, Nathan Cutler)
Copied to RADOS - Backport #45891: luminous: osd: pg stuck in waitactingchange when new acting set doesn't change (Rejected, Nathan Cutler)
Copied to RADOS - Backport #45892: mimic: osd: pg stuck in waitactingchange when new acting set doesn't change (Rejected, Nathan Cutler)

#1

Updated by qiuzhang chen over 4 years ago

Pull request: OSD/PG: Fix pg stuck in waitactingchange #29580

#2

Updated by qiuzhang chen over 4 years ago

Related: https://tracker.ceph.com/issues/40117 may have the same problem.

#4

Updated by qiuzhang chen over 4 years ago

The original log was too large, so I filtered it down to the problem PG. Please check whether it is enough to analyze the problem.

#5

Updated by Patrick Donnelly over 4 years ago

  • Tracker changed from Fix to Bug
  • Project changed from Ceph to RADOS
  • Subject changed from pg stuck in waitactingchange when new acting set doesn't change to osd: pg stuck in waitactingchange when new acting set doesn't change
  • Category deleted (OSD)
  • Status changed from New to Fix Under Review
  • Target version changed from v12.2.13 to v15.0.0
  • Start date deleted (08/09/2019)
  • Backport set to nautilus,mimic,luminous
  • Regression set to No
  • Severity set to 3 - minor
  • Pull request ID set to 29669
  • Affected Versions deleted (v12.2.1)
  • ceph-qa-suite deleted (rgw)
  • Component(RADOS) OSD added

#6

Updated by Neha Ojha over 4 years ago

  • Related to Bug #40117: PG stuck in WaitActingChange added

#7

Updated by Neha Ojha almost 4 years ago

  • Status changed from Fix Under Review to Pending Backport

#8

Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #45890: nautilus: osd: pg stuck in waitactingchange when new acting set doesn't change added

#9

Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #45891: luminous: osd: pg stuck in waitactingchange when new acting set doesn't change added

#10

Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #45892: mimic: osd: pg stuck in waitactingchange when new acting set doesn't change added

#11

Updated by Nathan Cutler about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
