Bug #10250

PG stuck incomplete after interrupted backfill.

Added by Aaron Bassett over 9 years ago. Updated over 9 years ago.

Status: Closed
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ceph version: 0.87
OS: Ubuntu 14.04
Cluster: 3x osd nodes with ~24 osds each

Issue: I had a pool accidentally set to size 2. The SATA-DOM on one of my OSD nodes started to fail, but I could still read from it. I sacrificed one of my OSD disks (osd.0), thinking I would be fine since my pools were at size 3, and moved the system onto it. Once I got the system back online and the cluster finished backfilling, I was left with several incomplete PGs. All but one of them are in a pool with test data that I will probably just remove once I get this sorted. There is one in my rbd image pool, however, that is causing heartache (pg.19.6e). Any requests for its data cause blocked requests on its primary OSD (osd.70). The other OSD it was on when it went down (osd.4) currently has about 4GB of data (when exported with ceph_objectstore_tool). I tried using ceph_objectstore_tool to move the data from osd.4 to osd.11, and now the PG is down+incomplete.

The pg query shows log_tail lagging behind last_complete so I think that may be the problem, but I don't know enough internals to be sure.
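
For anyone wanting to check the same thing on their own query output, here is a minimal sketch of the comparison described above. It assumes the query JSON has a top-level "info" object whose "log_tail" and "last_complete" fields are eversion strings of the form epoch'version. The function names and the sample values are hypothetical and are not taken from the attached query file. The sketch only compares the two values; it does not diagnose why the PG is incomplete.

```python
import json


def parse_eversion(text):
    """Split an eversion string such as "100'500" into (epoch, version)."""
    epoch, version = text.split("'")
    return int(epoch), int(version)


def log_tail_vs_last_complete(info):
    """Return -1, 0 or 1 when log_tail is older than, equal to,
    or newer than last_complete."""
    tail = parse_eversion(info["log_tail"])
    complete = parse_eversion(info["last_complete"])
    # Tuples compare epoch first, then version.
    return (tail > complete) - (tail < complete)


def load_info(path):
    """Read a saved pg query (JSON) and return its "info" section.
    Assumes the file holds the raw JSON printed by the query command."""
    with open(path) as handle:
        return json.load(handle)["info"]


# Hypothetical values for illustration only.
sample_info = {"log_tail": "100'500", "last_complete": "120'900"}
result = log_tail_vs_last_complete(sample_info)  # -1: log_tail is older
```

With a saved query file, `log_tail_vs_last_complete(load_info("query"))` would give the same three-way answer for the real PG.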

Also note, I've been flailing around quite a bit on this, so that's why the query has so many intervals.


Files

query (83.3 KB): pg query from the afflicted pg. Aaron Bassett, 12/05/2014 05:57 AM
osdlog (5.59 KB): log from current primary osd grepped for pg. Aaron Bassett, 12/05/2014 05:57 AM
health (2.91 KB): health detail. Aaron Bassett, 12/05/2014 05:57 AM
osdlog_kicked.txt (3 KB). Aaron Bassett, 12/08/2014 06:48 AM
osd.grepped.log (1.16 MB). Aaron Bassett, 12/08/2014 10:21 AM