Bug #10411

closed

PG stuck incomplete after failed node

Added by Brian Rak over 9 years ago. Updated about 7 years ago.

Status: Can't reproduce
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Yesterday, I was in the process of expanding the number of PGs in one of our pools. While I was doing this, one of the disks in an OSD failed (probably due to the high load on the cluster at that point). I removed this OSD from the pool and let it rebuild; however, I ended up with 2 PGs stuck down and peering.

This is the relevant 'ceph health detail' output:

pg 3.44c is stuck inactive since forever, current state down+peering, last acting [51,85]
pg 14.441 is stuck inactive since forever, current state down+peering, last acting [51,85]
pg 3.44c is stuck unclean since forever, current state down+peering, last acting [51,85]
pg 14.441 is stuck unclean since forever, current state down+peering, last acting [51,85]
pg 14.441 is down+peering, acting [51,85]
pg 3.44c is down+peering, acting [51,85]
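
For anyone triaging similar output, here is a minimal Python sketch that groups the "stuck" lines above by PG and pulls out the last acting OSD set. The regular expression and the `parse_stuck_pgs` helper are illustrative assumptions based only on the line format shown in this report. They are not part of Ceph and may not match other releases' output.

```python
import re

# Sample lines copied from the 'ceph health detail' output in this report.
HEALTH_DETAIL = """\
pg 3.44c is stuck inactive since forever, current state down+peering, last acting [51,85]
pg 14.441 is stuck inactive since forever, current state down+peering, last acting [51,85]
pg 3.44c is stuck unclean since forever, current state down+peering, last acting [51,85]
pg 14.441 is stuck unclean since forever, current state down+peering, last acting [51,85]
pg 14.441 is down+peering, acting [51,85]
pg 3.44c is down+peering, acting [51,85]
"""

# Hypothetical pattern for the "stuck" lines only, derived from the sample above.
STUCK_RE = re.compile(
    r"^pg (?P<pgid>\S+) is stuck (?P<kind>\w+) .*?"
    r"current state (?P<state>[\w+]+), last acting \[(?P<osds>[\d,]*)\]$"
)


def parse_stuck_pgs(text):
    """Return {pgid: {"kinds": set, "state": str, "acting": [int, ...]}}."""
    pgs = {}
    for line in text.splitlines():
        match = STUCK_RE.match(line.strip())
        if not match:
            continue  # skips the plain "pg X is down+peering" summary lines
        entry = pgs.setdefault(
            match["pgid"],
            {"kinds": set(), "state": match["state"], "acting": []},
        )
        entry["kinds"].add(match["kind"])
        entry["acting"] = [int(o) for o in match["osds"].split(",") if o]
    return pgs


if __name__ == "__main__":
    for pgid, info in sorted(parse_stuck_pgs(HEALTH_DETAIL).items()):
        print(pgid, sorted(info["kinds"]), info["state"], info["acting"])
```

Grouping by PG makes it easier to see that both stuck PGs share the same acting set, OSDs 51 and 85.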

I can't seem to figure out how to correct this. I've tried:

  • 'ceph osd out' both active OSDs, then putting them back in
  • ceph pg repair 3.44c
  • Restarting both OSDs (51, 85)
  • Restarting every OSD in the cluster
  • The patch from #10250 (I only installed this on the two relevant OSDs, did this need to be deployed cluster-wide?)
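
For reference, the first two items in the list above probably correspond to commands along these lines. This is a sketch in Python that only builds and prints the command lines and does not run them. The `attempted_commands` helper is hypothetical, and the exact invocations used at the time were not recorded in this report. The restart steps are left out because they depend on the init system.

```python
def attempted_commands(pgid, osd_ids):
    """Build argv lists for the 'osd out', 'osd in' and 'pg repair' steps."""
    cmds = []
    # Mark each acting OSD out ...
    for osd in osd_ids:
        cmds.append(["ceph", "osd", "out", str(osd)])
    # ... then bring each one back in.
    for osd in osd_ids:
        cmds.append(["ceph", "osd", "in", str(osd)])
    # Ask the primary to repair the placement group.
    cmds.append(["ceph", "pg", "repair", pgid])
    return cmds


if __name__ == "__main__":
    # PG id and OSD ids taken from the health output in this report.
    for cmd in attempted_commands("3.44c", [51, 85]):
        print(" ".join(cmd))
```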

I've attached the debug log from one of the OSDs, passed through | grep 3.44c

Aside from the two nodes I upgraded, the cluster is running v0.87.

I can provide additional information if necessary; however, I do not really want to post any information about the IP addresses of our nodes on a public bug tracker.

I'm on IRC as 'devicenull' if that would be of any help in debugging this.


Files

3.44c (3.12 MB), Brian Rak, 12/22/2014 07:52 AM
query (5.45 MB), Brian Rak, 12/23/2014 08:20 AM