Bug #1079 (closed): pgs stuck peering or degraded

Added by Josh Durgin almost 13 years ago. Updated almost 13 years ago.

Status: Closed
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%

Description

Using the same setup as in #1073, but with 10 osds, the cluster recovered most pgs; however, a few were stuck degraded and some remained in the peering state:

joshd@vit:~/ceph/src [master]$ ./ceph -s
2011-05-10 14:43:45.852391   mds e7: 1/1/1 up {0=a=up:active}
2011-05-10 14:43:45.852582   osd e172: 10 osds: 10 up, 10 in
2011-05-10 14:43:45.852953   log 2011-05-10 14:28:52.564312 osd3 10.0.1.202:6809/16087 27 : [INF] 2.1p3 scrub ok
2011-05-10 14:43:45.853437   mon e1: 3 mons at {a=10.0.1.202:6789/0,b=10.0.1.202:6790/0,c=10.0.1.202:6791/0}
2011-05-10 14:43:45.979007    pg v773: 180 pgs: 174 active+clean, 4 peering, 2 active+degraded; 3928 MB data, 108 GB used, 8594 GB / 9168 GB avail; 45/2006 degraded (2.243%)

Logs, pg dump, and osd dump are in vit:/home/joshd/osd_bugs/stuck_degraded

#1 - Updated by Josh Durgin almost 13 years ago

It looks like the degraded ones are staying that way because they need backlogs, but we didn't populate peer_backlog_requested anywhere.
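
For illustration only, here is a minimal C++ sketch of the kind of bookkeeping described above: a peering pg records which peers it has asked for a backlog so it can tell when all of them have replied. The names (PeeringState, peers_needing_backlog, request_backlogs) are hypothetical and do not reflect the actual Ceph OSD code; the point is that if the "requested" set is never populated, the pg keeps waiting and stays degraded.

// Hypothetical sketch, not the actual Ceph OSD code.
#include <iostream>
#include <set>
#include <vector>

struct PeeringState {
  std::vector<int> acting;               // OSDs in the acting set
  std::set<int> peers_needing_backlog;   // replicas whose log is too short
  std::set<int> peer_backlog_requested;  // replicas we have asked for a backlog

  // Ask every peer that still needs to generate a backlog, and remember that
  // we asked. If this bookkeeping never happens, the pg never recognizes the
  // replies it is waiting for and stays degraded.
  void request_backlogs() {
    for (int osd : peers_needing_backlog) {
      if (peer_backlog_requested.count(osd))
        continue;  // already asked this peer
      std::cout << "requesting backlog from osd" << osd << "\n";
      peer_backlog_requested.insert(osd);
    }
  }

  bool all_backlogs_requested() const {
    return peer_backlog_requested.size() == peers_needing_backlog.size();
  }
};

int main() {
  PeeringState pg;
  pg.acting = {0, 3, 7};
  pg.peers_needing_backlog = {3, 7};
  pg.request_backlogs();
  std::cout << "all requested: " << std::boolalpha
            << pg.all_backlogs_requested() << "\n";
  return 0;
}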

#2 - Updated by Sage Weil almost 13 years ago

  • Priority changed from Normal to High
  • Target version changed from v0.29 to v0.28

#3 - Updated by Samuel Just almost 13 years ago

The pgs stuck in the degraded state were likely a result of the bug fixed in f1af92fb3d3bdab5a74ef40744028001d1943203.

#4 - Updated by Samuel Just almost 13 years ago

  • Status changed from New to Closed