Bug #41190

osd: pg stuck in waitactingchange when new acting set doesn't change

Added by qiuzhang chen 16 days ago. Updated 6 days ago.

Status: Need Review
Priority: Normal
Assignee: -
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport: nautilus,mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:

Description

In the PG GetLog state, when choose_acting is processed and want is not equal to acting, the PG requests a pg_temp from the mon and then posts a NeedActingChange event.

Before transitioning to WaitActingChange, the PG exits GetLog -> Peering -> Primary and then enters Started/Primary/Peering/WaitActingChange.

However, on exit from the Primary state, pg->want_acting is cleared. So when WaitActingChange waits for an AdvMap event, the empty want_acting causes the PG to cycle in this state.

Thus, the PG will not leave this state until the new acting set differs from the current one and peering restarts.

Problem Description:
epoch 90: want_acting is [9, 8, 3] and the PG's acting set is [9, 8], so it requests a pg_temp of [9, 8, 3] from the mon and posts a NeedActingChange event.
epoch 91: osd.3 is down in the osdmap, so the new acting set will be [9, 8]. But when the WaitActingChange state handles ActMap, the log shows that pg->want_acting is empty,
so the PG stays stuck in this state until its acting set differs from the current one.
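
The sequence above can be illustrated with a small toy model. This is only a sketch in Python, not Ceph code: the class ToyPG, the flag clear_on_primary_exit, and the function run_scenario are hypothetical names made up for illustration. It also assumes that the WaitActingChange handler leaves the state either when a member of want_acting is down in the new map or when the acting set actually changes, which is my reading of the report rather than something confirmed here. The flag only contrasts the two behaviours and is not meant to represent the fix in the pull request.

```python
from dataclasses import dataclass, field

WAIT = "Started/Primary/Peering/WaitActingChange"
RESET = "Reset"


@dataclass
class ToyPG:
    """Toy stand-in for a PG; all names here are illustrative only."""
    acting: list
    clear_on_primary_exit: bool = True  # hypothetical switch for the demo
    want_acting: list = field(default_factory=list)
    state: str = "Started/Primary/Peering/GetLog"

    def choose_acting(self, want):
        """GetLog: if want differs from acting, ask for pg_temp and wait."""
        if want == self.acting:
            return
        self.want_acting = list(want)
        # The real OSD would request pg_temp from the mon and post
        # NeedActingChange here. Leaving Primary is modelled next.
        if self.clear_on_primary_exit:
            self.want_acting = []  # the clearing described in the report
        self.state = WAIT

    def on_adv_map(self, up_osds, new_acting):
        """WaitActingChange: react to a new osdmap epoch (assumed logic)."""
        if self.state != WAIT:
            return
        if any(osd not in up_osds for osd in self.want_acting):
            # A wanted OSD went down, so give up waiting and re-peer.
            self.state = RESET
        elif new_acting != self.acting:
            # The acting set really changed, so peering restarts.
            self.acting = list(new_acting)
            self.state = RESET
        # Otherwise nothing happens and the PG keeps waiting.


def run_scenario(clear_on_primary_exit):
    """Replay epochs 90 and 91 from the problem description."""
    pg = ToyPG(acting=[9, 8], clear_on_primary_exit=clear_on_primary_exit)
    pg.choose_acting([9, 8, 3])                       # epoch 90
    pg.on_adv_map(up_osds={9, 8}, new_acting=[9, 8])  # epoch 91, osd.3 down
    return pg.state
```

In this model, run_scenario(True) leaves the PG in WaitActingChange because the emptied want_acting hides the fact that osd.3 went down, while run_scenario(False) moves it to Reset.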

1.PNG - request pg_temp, and then enter waitactingchange (571 KB) qiuzhang chen, 08/09/2019 03:40 PM

2.PNG - cycle in waitactingchange, because want_acting is empty (553 KB) qiuzhang chen, 08/09/2019 03:40 PM

problem_log.tar.gz - problem pg's log (815 KB) qiuzhang chen, 08/16/2019 04:51 AM


Related issues

Related to RADOS - Bug #40117: PG stuck in WaitActingChange Need Review 06/03/2019

History

#1 Updated by qiuzhang chen 16 days ago

Pull request: OSD/PG: Fix pg stuck in waitactingchange #29580

#2 Updated by qiuzhang chen 16 days ago

Issue https://tracker.ceph.com/issues/40117 may have the same problem.

#4 Updated by qiuzhang chen 9 days ago

The original log was too large, so I filtered it down to the problem PG. Please check whether it is enough to analyze the problem.

#5 Updated by Patrick Donnelly 6 days ago

  • Tracker changed from Fix to Bug
  • Project changed from Ceph to RADOS
  • Subject changed from pg stuck in waitactingchange when new acting set doesn't change to osd: pg stuck in waitactingchange when new acting set doesn't change
  • Category deleted (OSD)
  • Status changed from New to Need Review
  • Target version changed from v12.2.13 to v15.0.0
  • Start date deleted (08/09/2019)
  • Backport set to nautilus,mimic,luminous
  • Regression set to No
  • Severity set to 3 - minor
  • Pull request ID set to 29669
  • Affected Versions deleted (v12.2.1)
  • ceph-qa-suite deleted (rgw)
  • Component(RADOS) OSD added

#6 Updated by Neha Ojha 5 days ago

  • Related to Bug #40117: PG stuck in WaitActingChange added
