Bug #3905

closed

incomplete & stale (lost?) PGs

Added by Faidon Liambotis over 11 years ago. Updated over 11 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I added a bunch of new OSDs to my Ceph cluster (0.56.1 on Ubuntu 12.04 LTS) about 72 hours ago. At the same time, I marked most of the old OSDs as "out", since I want to completely replace the hardware of my Ceph cluster.

Until today, the recovery process was running well. Then at some point, random OSDs started being marked down and then up again; this may or may not be related to #3904, which was observed at the same time. Some of them were complaining about the op_tp heartbeat, which was set to 7200 and then increased to 28800. After a few hours (4-5) the cluster stabilized again, with all OSDs marked up.

However, I now see 1 incomplete and 22 stale PGs:

2013-01-24 02:16:39.900827 mon.0 [INF] pgmap v1780945: 16952 pgs: 13 active, 7729 active+clean, 7050 active+remapped+wait_backfill, 79 active+degraded+wait_backfill, 3 peering, 866 active+remapped, 300 active+remapped+backfilling, 288 active+degraded, 6 active+degraded+backfilling, 517 active+degraded+remapped+wait_backfill, 39 stale+active+remapped, 7 active+recovery_wait+remapped, 4 remapped+peering, 1 incomplete, 27 active+degraded+remapped+backfilling, 7 stale+remapped+peering, 16 stale+active+degraded+remapped; 25005 GB data, 54874 GB used, 184 TB / 238 TB avail; 28083195/149909025 degraded (18.733%)
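As a rough cross-check of the status line above, the per-state counts can be tallied with a short script. This is only a sketch: the helper names `parse_pgmap`, `count_with_flag` and `degraded_percent` are hypothetical and not part of Ceph, and the parsing assumes the exact line layout shown in this report.

```python
import re

# Status line copied verbatim from the report above.
PGMAP_LINE = (
    "2013-01-24 02:16:39.900827 mon.0 [INF] pgmap v1780945: 16952 pgs: "
    "13 active, 7729 active+clean, 7050 active+remapped+wait_backfill, "
    "79 active+degraded+wait_backfill, 3 peering, 866 active+remapped, "
    "300 active+remapped+backfilling, 288 active+degraded, "
    "6 active+degraded+backfilling, 517 active+degraded+remapped+wait_backfill, "
    "39 stale+active+remapped, 7 active+recovery_wait+remapped, "
    "4 remapped+peering, 1 incomplete, 27 active+degraded+remapped+backfilling, "
    "7 stale+remapped+peering, 16 stale+active+degraded+remapped; "
    "25005 GB data, 54874 GB used, 184 TB / 238 TB avail; "
    "28083195/149909025 degraded (18.733%)"
)


def parse_pgmap(line):
    """Return (total_pgs, {state: count}) parsed from a pgmap summary line."""
    header, _, rest = line.partition(" pgs: ")
    total = int(header.split()[-1])          # number just before " pgs:"
    states_part = rest.split(";", 1)[0]      # drop the usage/degraded tail
    counts = {}
    for item in states_part.split(", "):
        count, state = item.split(" ", 1)
        counts[state] = int(count)
    return total, counts


def count_with_flag(counts, flag):
    """Sum the PGs whose combined state contains the given flag."""
    return sum(n for state, n in counts.items() if flag in state.split("+"))


def degraded_percent(line):
    """Recompute the degraded percentage from the 'x/y degraded' field."""
    match = re.search(r"(\d+)/(\d+) degraded", line)
    degraded, total_objects = int(match.group(1)), int(match.group(2))
    return 100.0 * degraded / total_objects


if __name__ == "__main__":
    total, counts = parse_pgmap(PGMAP_LINE)
    print("total pgs:", total, "sum of states:", sum(counts.values()))
    print("stale:", count_with_flag(counts, "stale"))
    print("incomplete:", count_with_flag(counts, "incomplete"))
    print("degraded %:", round(degraded_percent(PGMAP_LINE), 3))
```

For this particular line the per-state counts add up to the reported 16952 PGs, with 62 PGs whose state includes `stale` and 1 `incomplete`. The line may have been captured at a different moment than the count of 22 stale PGs mentioned above.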

Attached are the pg dump, osd dump, osd map, crushmap and pg query for the incomplete PG. Querying the stale PGs returns "pgid currently maps to no osd", which is a bit worrying...

Note there was no read or write traffic to the cluster during recovery and there is none now -- we've left it alone to quietly recover to the new hardware, but it seems it wasn't enough :)


Files

pgquery-3.27d9 (33.9 KB), Faidon Liambotis, 01/23/2013 06:34 PM
pgdump (3.38 MB), Faidon Liambotis, 01/23/2013 06:34 PM
osddump (252 KB), Faidon Liambotis, 01/23/2013 06:34 PM
osdmap (354 KB), Faidon Liambotis, 01/23/2013 06:34 PM
crushmap (8.55 KB), Faidon Liambotis, 01/23/2013 06:34 PM
osdtree (3.79 KB), Faidon Liambotis, 01/24/2013 11:18 AM