Bug #39402

open

Can't remove ghost PGs

Added by David Galloway about 5 years ago. Updated almost 5 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is on the downstream long-running cluster. I can grant SSH access to whoever needs it.

This bug is similar to http://tracker.ceph.com/issues/10411, except that I can't perform the troubleshooting steps from that ticket because most of the OSDs are bluestore.

All the OSDs on reesi004 were lost because their journals were reused without being zapped before the OSDs were re-added. Alfredo created a ticket to fix that.

[root@reesi001 ~]# ceph health detail
HEALTH_ERR noout flag(s) set; 158255/11903128 objects misplaced (1.330%); Reduced data availability: 8 pgs inactive; 2 slow requests are blocked > 32 sec. Implicated osds ; 5 stuck requests are blocked > 4096 sec. Implicated osds 1,3,25,72,85
OSDMAP_FLAGS noout flag(s) set
OBJECT_MISPLACED 158255/11903128 objects misplaced (1.330%)
PG_AVAILABILITY Reduced data availability: 8 pgs inactive
    pg 1.83 is stuck inactive for 4019.820958, current state unknown, last acting []
    pg 1.d2 is stuck inactive for 4019.820958, current state unknown, last acting []
    pg 1.11b is stuck inactive for 4019.820958, current state unknown, last acting []
    pg 1.15e is stuck inactive for 4019.820958, current state unknown, last acting []
    pg 1.169 is stuck inactive for 4019.820958, current state unknown, last acting []
    pg 1.173 is stuck inactive for 4019.820958, current state unknown, last acting []
    pg 1.1a0 is stuck inactive for 4019.820958, current state unknown, last acting []
    pg 1.1ed is stuck inactive for 4019.820958, current state unknown, last acting []
REQUEST_SLOW 2 slow requests are blocked > 32 sec. Implicated osds 
    2 ops are blocked > 2097.15 sec
REQUEST_STUCK 5 stuck requests are blocked > 4096 sec. Implicated osds 1,3,25,72,85
    5 ops are blocked > 4194.3 sec
    osds 1,3,25,72,85 have stuck requests > 4194.3 sec
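
For anyone scripting against this, the ghost PGs can be picked out of the `ceph health detail` text by looking for stuck PGs in state `unknown` with an empty acting set. The following is only a rough sketch: it assumes the line format shown in the output above, and the names `PG_LINE`, `stuck_pgs` and `ghost_pgs` are made up for illustration, not part of any Ceph tooling.

```python
import re

# Assumed line format, copied from the health output above:
#   pg 1.83 is stuck inactive for 4019.820958, current state unknown, last acting []
PG_LINE = re.compile(
    r"pg (?P<pgid>\d+\.[0-9a-f]+) is stuck (?P<stuck>\w+) for (?P<secs>[\d.]+), "
    r"current state (?P<state>[^,\s]+), last acting \[(?P<acting>[\d,]*)\]"
)


def stuck_pgs(health_detail):
    """Return (pgid, state, acting_osds) for every stuck-PG line in the text."""
    result = []
    for m in PG_LINE.finditer(health_detail):
        # An empty acting set "[]" yields an empty list.
        acting = [int(osd) for osd in m.group("acting").split(",") if osd]
        result.append((m.group("pgid"), m.group("state"), acting))
    return result


def ghost_pgs(health_detail):
    """Return the IDs of PGs that are in state 'unknown' with no acting OSDs."""
    return [
        pgid
        for pgid, state, acting in stuck_pgs(health_detail)
        if state == "unknown" and not acting
    ]


# Two lines taken verbatim from the report.
sample = """\
    pg 1.83 is stuck inactive for 4019.820958, current state unknown, last acting []
    pg 1.d2 is stuck inactive for 4019.820958, current state unknown, last acting []
"""

print(ghost_pgs(sample))  # ['1.83', '1.d2']
```

Matching on both the `unknown` state and the empty acting set keeps PGs that are merely degraded or remapped out of the list.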

[root@reesi001 ~]# ceph pg 1.83 query
Error ENOENT: i don't have pgid 1.83

[root@reesi001 ~]# ceph pg 1.83 mark_unfound_lost delete
Error ENOENT: i don't have pgid 1.83

[root@reesi001 ~]# ceph pg force_create_pg 1.83
Error ENOTSUP: this command is obsolete

I'm hoping the blocked requests are related to these ghost PGs.

#1

Updated by Greg Farnum almost 5 years ago

  • Project changed from Ceph to RADOS