Bug #6565

closed

stuck in recovery_wait

Added by Samuel Just over 10 years ago. Updated almost 10 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-10-15_11:06:15-rados-next-testing-basic-plana/53130

Actions #1

Updated by Samuel Just over 10 years ago

2013-10-15 15:14:21.293917 7f2dbbe7f700 5 osd.1 pg_epoch: 753 pg[1.30( v 6'4 (0'0,6'4] local-les=753 n=4 ec=1 les/c 749/749 751/751/391) [1,0] r=0 lpr=751 pi=747-750/1 lcod 0'0 mlcod 0'0 active+recovery_wait] exit Started/Primary/Active/WaitLocalRecoveryReserved 0.000132 1 0.000042
...
2013-10-15 15:14:21.293951 7f2dbbe7f700 5 osd.1 pg_epoch: 753 pg[1.30( v 6'4 (0'0,6'4] local-les=753 n=4 ec=1 les/c 749/749 751/751/391) [1,0] r=0 lpr=751 pi=747-750/1 lcod 0'0 mlcod 0'0 active+recovery_wait] enter Started/Primary/Active/WaitRemoteRecoveryReserved
...
2013-10-15 15:14:21.293971 7f2dbbe7f700 1 -- 10.214.132.4:6806/27533 --> 10.214.132.4:6815/13725 -- MRecoveryReserve REQUEST pgid: 1.30, query_epoch: 753 v1 -- ?+0 0x3157380 con 0x3efeb00
...
2013-10-15 15:14:21.294733 7fd0b05b4700 5 osd.0 pg_epoch: 747 pg[1.30(unlocked)] enter Initial
...
2013-10-15 15:14:21.295817 7fd0b5dbf700 1 -- 10.214.132.4:6815/13725 <== osd.1 10.214.132.4:6806/27533 398 ==== MRecoveryReserve REQUEST pgid: 1.30, query_epoch: 753 v1 ==== 25+0+0 (1153642428 0 0) 0x2423000 con 0x1eb2000
...
2013-10-15 15:14:23.297955 7fd0b9ce7700 5 osd.0 pg_epoch: 747 pg[1.30( empty local-les=0 n=0 ec=1 les/c 734/738 744/747/391) [1,5] r=-1 lpr=0 pi=727-746/3 inactive NOTIFY] exit Initial 2.003222 0 0.000000
...
2013-10-15 15:14:23.297980 7fd0b9ce7700 5 osd.0 pg_epoch: 747 pg[1.30( empty local-les=0 n=0 ec=1 les/c 734/738 744/747/391) [1,5] r=-1 lpr=0 pi=727-746/3 inactive NOTIFY] enter Reset
...
2013-10-15 15:14:24.684991 7fd0afdb3700 5 osd.0 pg_epoch: 755 pg[1.30( empty local-les=0 n=0 ec=1 les/c 734/738 751/751/391) [1,0] r=1 lpr=751 pi=727-750/4 inactive NOTIFY] exit Reset 1.387011 9 0.000080
2013-10-15 15:14:24.685015 7fd0afdb3700 5 osd.0 pg_epoch: 755 pg[1.30( empty local-les=0 n=0 ec=1 les/c 734/738 751/751/391) [1,0] r=1 lpr=751 pi=727-750/4 inactive NOTIFY] enter Started
2013-10-15 15:14:24.685026 7fd0afdb3700 5 osd.0 pg_epoch: 755 pg[1.30( empty local-les=0 n=0 ec=1 les/c 734/738 751/751/391) [1,0] r=1 lpr=751 pi=727-750/4 inactive NOTIFY] enter Start
...
2013-10-15 15:14:24.685035 7fd0afdb3700 1 osd.0 pg_epoch: 755 pg[1.30( empty local-les=0 n=0 ec=1 les/c 734/738 751/751/391) [1,0] r=1 lpr=751 pi=727-750/4 inactive NOTIFY] state<Start>: transitioning to Stray
...
2013-10-15 15:14:24.685045 7fd0afdb3700 5 osd.0 pg_epoch: 755 pg[1.30( empty local-les=0 n=0 ec=1 les/c 734/738 751/751/391) [1,0] r=1 lpr=751 pi=727-750/4 inactive NOTIFY] exit Start 0.000019 0 0.000000
...
2013-10-15 15:14:24.685057 7fd0afdb3700 5 osd.0 pg_epoch: 755 pg[1.30( empty local-les=0 n=0 ec=1 les/c 734/738 751/751/391) [1,0] r=1 lpr=751 pi=727-750/4 inactive NOTIFY] enter Started/Stray
...
2013-10-15 15:14:24.822810 7fd0afdb3700 5 osd.0 pg_epoch: 755 pg[1.30( v 6'4 lc 0'0 (0'0,6'4] lb 0//0//-1 local-les=0 n=0 ec=1 les/c 734/738 751/751/391) [1,0] r=1 lpr=751 pi=727-750/4 inactive NOTIFY] exit Started/Stray 0.137753 2 0.000085
2013-10-15 15:14:24.822851 7fd0afdb3700 5 osd.0 pg_epoch: 755 pg[1.30( v 6'4 lc 0'0 (0'0,6'4] lb 0//0//-1 local-les=0 n=0 ec=1 les/c 734/738 751/751/391) [1,0] r=1 lpr=751 pi=727-750/4 inactive NOTIFY] enter Started/ReplicaActive
2013-10-15 15:14:24.822876 7fd0afdb3700 5 osd.0 pg_epoch: 755 pg[1.30( v 6'4 lc 0'0 (0'0,6'4] lb 0//0//-1 local-les=0 n=0 ec=1 les/c 734/738 751/751/391) [1,0] r=1 lpr=751 pi=727-750/4 inactive NOTIFY] enter Started/ReplicaActive/RepNotRecovering
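
To line up the two OSDs' timelines from an excerpt like the one above, the state-machine lines can be pulled out with a small script. This is only a sketch: the regular expression is inferred from the log lines quoted in this ticket, and `parse_transitions` is a hypothetical helper name, not part of Ceph or teuthology.

```python
import re

# Matches PG state-machine lines of the form quoted in this ticket:
#   <date> <time> <thread> <level> osd.N pg_epoch: E pg[<pgid>( ... ] enter|exit <state>
# The pattern is an assumption based on those lines only.
TRANSITION = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+ \d+ "
    r"(?P<osd>osd\.\d+) pg_epoch: (?P<epoch>\d+) "
    r"pg\[(?P<pgid>[0-9a-f]+\.[0-9a-f]+)\(.*\] "
    r"(?P<action>enter|exit) (?P<state>\S+)"
)


def parse_transitions(lines):
    """Return (timestamp, osd, epoch, pgid, action, state) for each transition line."""
    transitions = []
    for line in lines:
        m = TRANSITION.match(line)
        if m:  # messenger lines and "..." separators are skipped
            transitions.append(
                (m["ts"], m["osd"], int(m["epoch"]), m["pgid"], m["action"], m["state"])
            )
    return transitions


if __name__ == "__main__":
    sample = [
        "2013-10-15 15:14:21.294733 7fd0b05b4700 5 osd.0 pg_epoch: 747 "
        "pg[1.30(unlocked)] enter Initial",
    ]
    for row in parse_transitions(sample):
        print(row)
```

Sorting the result by timestamp makes it easier to see that the MRecoveryReserve REQUEST (query_epoch 753) reached osd.0 while its copy of pg 1.30 was still at epoch 747 in Initial.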

Actions #2

Updated by Samuel Just over 10 years ago

For Recovery and Backfill reservations, we don't check whether the target pg is currently splitting. Working on patch now.
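
A toy model of the race described here, as a rough sketch only. It assumes that a reservation request arriving while the target PG is splitting is simply lost, and that the remedy is to park the request until the split finishes. `ReplicaOSD`, `check_splitting`, `start_split`, `finish_split` and `primary_state` are hypothetical names for illustration; this is not the Ceph code and not necessarily what the actual patch does.

```python
from collections import defaultdict


class ReplicaOSD:
    """Toy replica OSD that receives reservation requests (hypothetical model)."""

    def __init__(self, check_splitting):
        self.check_splitting = check_splitting  # True models the missing check
        self.splitting = set()                  # pgids with a split in progress
        self.waiting = defaultdict(list)        # pgid -> requests parked during a split
        self.granted = []                       # (pgid, query_epoch) that reached the PG

    def start_split(self, pgid):
        self.splitting.add(pgid)

    def handle_reserve_request(self, pgid, query_epoch):
        if pgid in self.splitting:
            if self.check_splitting:
                # Park the request until the split completes.
                self.waiting[pgid].append(query_epoch)
            # Assumption: without the check the request is lost, because the
            # PG cannot process it yet and the primary does not retry.
            return
        self.granted.append((pgid, query_epoch))

    def finish_split(self, pgid):
        self.splitting.discard(pgid)
        # Replay anything that was parked while the PG was splitting.
        for query_epoch in self.waiting.pop(pgid, []):
            self.handle_reserve_request(pgid, query_epoch)


def primary_state(replica, pgid, query_epoch):
    """Run one request-during-split sequence and report the primary's state."""
    replica.start_split(pgid)
    replica.handle_reserve_request(pgid, query_epoch)
    replica.finish_split(pgid)
    granted = (pgid, query_epoch) in replica.granted
    return "recovering" if granted else "recovery_wait"


if __name__ == "__main__":
    print(primary_state(ReplicaOSD(check_splitting=False), "1.30", 753))
    print(primary_state(ReplicaOSD(check_splitting=True), "1.30", 753))
```

In this model the primary stays in recovery_wait when the replica does not check for a split in progress, which matches the symptom in the ticket title.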

Actions #3

Updated by Samuel Just over 10 years ago

  • Status changed from New to Resolved

15ec5332ba4154930a0447e2bcf1acec02691e97

Actions #4

Updated by Samuel Just almost 10 years ago

  • Status changed from Resolved to Pending Backport
  • Priority changed from Urgent to High

Saw this in dumpling; not sure whether it is worth backporting.

Actions #5

Updated by Sage Weil almost 10 years ago

  • Status changed from Pending Backport to Resolved