Feature #8141
Nice if we had a state for when a pg can't recover because all missing objects are unfound and we can't make progress
Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
other
Tags:
Backport:
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:
Description
I put a pg into the following state by taking down 2 OSDs at just the right time: after peering but before recovery completed. The pg has 30 missing objects, all of which are also unfound. It is stuck in active+recovering even though it isn't actively doing anything. Maybe it could be marked degraded.
cluster 01520da2-e482-45ec-b6c5-3778658674d1
health HEALTH_WARN 24 pgs degraded; 1 pgs recovering; 25 pgs stuck unclean; recovery 152/770 objects degraded (19.740%); 30/359 unfound (8.357%)
monmap e1: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:6791/0}, election epoch 36, quorum 0,1,2 a,b,c
mdsmap e44: 3/3/3 up {0=c=up:active,1=b=up:active,2=a=up:active}
osdmap e78: 4 osds: 2 up, 2 in
pgmap v690: 32 pgs, 4 pools, 1220 MB data, 359 objects
29860 MB used, 8921 MB / 40829 MB avail
152/770 objects degraded (19.740%); 30/359 unfound (8.357%)
24 active+degraded
7 active+clean
1 active+recovering

3.5 40 30 100 30 163577868 40 40 active+recovering 2014-04-17 14:49:56.903242 40'40 78:322 [1,0] 1 [1,0] 1 0'0 2014-04-17 14:13:16.740899 0'0 2014-04-17 14:13:16.740899

2014-04-17 14:50:06.377415 7f8ac5603700 10 osd.1 pg_epoch: 78 pg[3.5( v 40'40 lc 40'10 (0'0,40'40] local-les=78 n=40 ec=37 les/c 78/72 76/76/76) [1,0] r=0 lpr=78 pi=63-75/4 crt=40'40 mlcod 0'0 active+recovering m=30 u=30] start_recovery_ops missing_loc: {}
2014-04-17 14:50:06.377451 7f8ac5603700 10 osd.1 pg_epoch: 78 pg[3.5( v 40'40 lc 40'10 (0'0,40'40] local-les=78 n=40 ec=37 les/c 78/72 76/76/76) [1,0] r=0 lpr=78 pi=63-75/4 crt=40'40 mlcod 0'0 active+recovering m=30 u=30] still have 30 unfound
2014-04-17 14:50:06.377476 7f8ac5603700 10 osd.1 78 do_recovery started 0/5 on pg[3.5( v 40'40 lc 40'10 (0'0,40'40] local-les=78 n=40 ec=37 les/c 78/72 76/76/76) [1,0] r=0 lpr=78 pi=63-75/4 crt=40'40 mlcod 0'0 active+recovering m=30 u=30]
2014-04-17 14:50:06.377521 7f8ac5603700 10 osd.1 pg_epoch: 78 pg[3.5( v 40'40 lc 40'10 (0'0,40'40] local-les=78 n=40 ec=37 les/c 78/72 76/76/76) [1,0] r=0 lpr=78 pi=63-75/4 crt=40'40 mlcod 0'0 active+recovering m=30 u=30] discover_all_missing 30 missing, 30 unfound
2014-04-17 14:50:06.377559 7f8ac5603700 20 osd.1 pg_epoch: 78 pg[3.5( v 40'40 lc 40'10 (0'0,40'40] local-les=78 n=40 ec=37 les/c 78/72 76/76/76) [1,0] r=0 lpr=78 pi=63-75/4 crt=40'40 mlcod 0'0 active+recovering m=30 u=30] discover_all_missing: osd.0: we already have pg_missing_t
2014-04-17 14:50:06.377584 7f8ac5603700 20 osd.1 pg_epoch: 78 pg[3.5( v 40'40 lc 40'10 (0'0,40'40] local-les=78 n=40 ec=37 les/c 78/72 76/76/76) [1,0] r=0 lpr=78 pi=63-75/4 crt=40'40 mlcod 0'0 active+recovering m=30 u=30] discover_all_missing skipping down osd.2
2014-04-17 14:50:06.377608 7f8ac5603700 20 osd.1 pg_epoch: 78 pg[3.5( v 40'40 lc 40'10 (0'0,40'40] local-les=78 n=40 ec=37 les/c 78/72 76/76/76) [1,0] r=0 lpr=78 pi=63-75/4 crt=40'40 mlcod 0'0 active+recovering m=30 u=30] discover_all_missing skipping down osd.3
2014-04-17 14:50:06.377632 7f8ac5603700 10 osd.1 78 do_recovery no luck, giving up on this pg for now
2014-04-17 14:50:06.377636 7f8ac5603700 10 log is not dirty
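
Until such a state exists, the condition can be spotted from the OSD log. The following is only a rough sketch, not anything Ceph ships: the function name `stalled_pgs` and the criterion (every missing object is also unfound while the pg reports active+recovering) are assumptions made for illustration, and it relies on the `discover_all_missing N missing, M unfound` line format seen in the excerpt above.

```python
import re

# Summary line as it appears in the log excerpt above.
SUMMARY_RE = re.compile(r"discover_all_missing (\d+) missing, (\d+) unfound")
# Extracts the pgid from the "pg[3.5( ..." prefix of a log line.
PG_RE = re.compile(r"pg\[(\d+\.[0-9a-f]+)\(")


def stalled_pgs(log_lines):
    """Return {pgid: unfound_count} for pgs whose missing objects are all unfound.

    Hypothetical helper: the latest summary line seen for a pg wins.
    """
    stalled = {}
    for line in log_lines:
        summary = SUMMARY_RE.search(line)
        pg = PG_RE.search(line)
        if not (summary and pg):
            continue
        pgid = pg.group(1)
        missing = int(summary.group(1))
        unfound = int(summary.group(2))
        # Assumed criterion: nothing can be recovered because every
        # missing object is unfound, yet the pg still says "recovering".
        if missing > 0 and missing == unfound and "active+recovering" in line:
            stalled[pgid] = unfound
        else:
            # A later line showing recoverable objects clears the flag.
            stalled.pop(pgid, None)
    return stalled
```

Fed the `discover_all_missing 30 missing, 30 unfound` line for pg 3.5 from the log above, this would report that pg with 30 unfound objects. It does not check whether other OSDs could still be probed, which a real implementation of the requested state would need to do.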
Updated by Samuel Just almost 10 years ago
- Tracker changed from Bug to Feature
- Target version set to 0.82
Updated by Patrick Donnelly almost 5 years ago
- Project changed from Ceph to RADOS