Bug #10713

Unable to access some pgs

Added by Sébastien CARRIERE about 9 years ago. Updated about 9 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have been facing some network instability on my Ceph cluster (0.87).

# ceph -s
    cluster b518ffdd-8cfa-44f7-bd3a-ab8a4dc23cfa
    health HEALTH_WARN 1 pgs degraded; 1 pgs recovering; 1 pgs stuck degraded; 1 pgs stuck unclean; recovery 6/20684238 objects degraded (0.000%); 12630/20684238 objects misplaced (0.061%); 2/6892641 unfound (0.000%); 2 near full osd(s); pool vms has too few pgs
    monmap e17: 5 mons at {storage1=10.80.5.1:6789/0,storage2=10.80.5.2:6789/0,storage3=10.80.5.3:6789/0,storage4=10.80.5.4:6789/0,storage5=10.80.5.5:6789/0}, election epoch 1342, quorum 0,1,2,3,4 storage1,storage2,storage3,storage4,storage5
    osdmap e16563: 63 osds: 63 up, 61 in
    pgmap v14168648: 53248 pgs, 14 pools, 13617 GB data, 6731 kobjects
    40532 GB used, 15531 GB / 56064 GB avail
    6/20684238 objects degraded (0.000%); 12630/20684238 objects misplaced (0.061%); 2/6892641 unfound (0.000%)
    53247 active+clean
    1 active+recovering+degraded+remapped
    client io 123 MB/s rd, 2164 kB/s wr, 2132 op/s

The unfound objects are located in a pool that contains some virtual machine images.
Doing some investigation, I found the problematic pg (5.412).
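
For reference, a minimal sketch of how such a pg can be located with the standard status commands (the pg id itself comes from the health output above):

# ceph health detail            # expands each HEALTH_WARN item, listing the pgs with unfound objects
# ceph pg dump_stuck unclean    # lists pgs stuck in a state other than active+clean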

The command below is successful:

# ceph pg map 5.412
    osdmap e16563 pg 5.412 (5.412) -> up [19,14,25] acting [31,39,52]
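
Note that ceph pg map only asks the monitors for the mapping; the up set [19,14,25] differs from the acting set [31,39,52] because the pg is remapped. A sketch of the per-pg queries that could show the recovery state and the unfound objects (unlike pg map, these are forwarded to the pg's acting primary, so they may hang the same way mark_unfound_lost does below):

# ceph pg 5.412 query          # full peering/recovery state of the pg, as JSON
# ceph pg 5.412 list_missing   # lists the unfound objects in this pg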
And instructing a scrub also seems to work:

# ceph pg scrub 5.412
    instructing pg 5.412 on osd.31 to scrub

BUT, the command below freezes (I must hit CTRL+C to cancel it), and it seems I cannot act on this pg:

# ceph pg 5.412 mark_unfound_lost revert
    ^CError EINTR: problem getting command descriptions from pg.5.412
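
Unlike pg map and pg scrub, which are answered by the monitors, ceph pg <pgid> mark_unfound_lost fetches its command descriptions from the pg's acting primary, so a hang here suggests the client cannot get an answer from that OSD. A sketch of how osd.31 (the acting primary per the scrub output above) could be checked; the admin-socket path is the default one:

# ceph tell osd.31 version                                               # verifies the OSD answers over the network
# ceph --admin-daemon /var/run/ceph/ceph-osd.31.asok dump_ops_in_flight  # run on the OSD's host; shows requests stuck inside the OSD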

So I'm stuck with this pool that I cannot recover.

At the end of the week, I plan to simply delete this pool so that my cluster can return to a healthy state.
But could anyone confirm that this is going to work?
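
For reference, a sketch of the pool deletion, assuming the affected pool is the vms pool named in the health output (the pool name is given twice as a safety check):

# ceph osd pool delete vms vms --yes-i-really-really-mean-it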

History

#1 Updated by Samuel Just about 9 years ago

  • Status changed from New to Closed

Yep, deleting the pool will fix it.
