Bug #2874

apparent CRUSH mapping failure

Added by Greg Farnum over 11 years ago. Updated almost 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While doing crowbar tests, I created a 3-OSD cluster (on separate VMs) that ended up with 6 degraded PGs.

crowbar@d52-54-00-59-e5-54:~$ sudo ceph -s
   health HEALTH_WARN 6 pgs degraded; 6 pgs stuck unclean
   monmap e1: 1 mons at {d52-54-00-59-e5-54=192.168.124.81:6789/0}, election epoch 0, quorum 0 d52-54-00-59-e5-54
   osdmap e20: 3 osds: 3 up, 3 in
    pgmap v120: 192 pgs: 186 active+clean, 6 active+degraded; 0 bytes data, 5657 MB used, 136 GB / 149 GB avail
   mdsmap e1: 0/0/1 up

Apparently, CRUSH is mapping each of those 6 PGs to only one OSD:

crowbar@d52-54-00-59-e5-54:~$ osdmaptool osd.map --test-map-pg 0.7
osdmaptool: osdmap file 'osd.map'
 parsed '0.7' -> 0.7
0.7 raw [0] up [0] acting [0]

OSD map is attached.
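For a cluster-wide view, osdmaptool can also exercise the mapping for every PG in one pass rather than one PG at a time. A diagnostic sketch against the attached map (flag names assumed from osdmaptool's usage of this era; the pool number is illustrative):

```shell
# Map all PGs in the osdmap and print per-OSD placement statistics;
# undersized mappings like "0.7 raw [0]" show up in the summary.
osdmaptool osd.map --test-map-pgs

# Or re-test a single suspect PG, as in the description:
osdmaptool osd.map --test-map-pg 0.7
```

With a replication size of 2 or 3, any PG whose raw/up/acting sets contain a single OSD indicates CRUSH failed to find enough distinct devices.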
Ceph version:

crowbar@d52-54-00-59-e5-54:~$ ceph --version
ceph version 0.49-296-g4803800 (commit:480380025bcc7f9e6e6bd7c1b90815d6fb6ed9ce)

osd.map (2.31 KB) Greg Farnum, 07/31/2012 11:40 AM

History

#1 Updated by Sage Weil over 11 years ago

Check whether setting the tunables all to 0 makes it go away.

#2 Updated by Alex Moore over 11 years ago

I'd like to report that I was seeing what I believe to be the same issue; at least the symptoms were the same: a 3-OSD cluster that ended up with 6 degraded PGs (initially running 0.48, and no change with 0.49 or 0.50). Running the equivalent osdmaptool command from Greg's problem description, but for my degraded PGs, similarly showed them mapping to only a single OSD. One slight difference was that I additionally had 9 remapped PGs.

Anyway, I then set the tunables to the values suggested on this page: http://ceph.com/docs/master/ops/manage/crush/?highlight=tunables, based on Sage's comment, after which the problematic PGs all recovered (i.e. both the degraded and the remapped ones).
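For reference, the general procedure for changing tunables in this era was to extract, decompile, edit, recompile, and re-inject the CRUSH map. A sketch, assuming the ceph/crushtool CLI of the 0.48–0.50 releases; the tunable values shown are the "optimal" ones from the linked docs page, not values confirmed in this ticket:

```shell
# Pull the current CRUSH map out of the cluster and decompile it.
ceph osd getcrushmap -o /tmp/crush.bin
crushtool -d /tmp/crush.bin -o /tmp/crush.txt

# Edit /tmp/crush.txt and add tunable lines at the top, e.g.:
#   tunable choose_local_tries 0
#   tunable choose_local_fallback_tries 0
#   tunable choose_total_tries 50

# Recompile and inject the modified map.
crushtool -c /tmp/crush.txt -o /tmp/crush.new
ceph osd setcrushmap -i /tmp/crush.new
```

Raising choose_total_tries gives CRUSH more attempts to find a distinct OSD for each replica, which is why it can resolve PGs stuck mapping to a single device on small clusters.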

#3 Updated by Sage Weil over 11 years ago

  • Status changed from New to Resolved

Glad to hear the tunables resolved this for you, Alex!

#4 Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (10)
  • Target version deleted (v0.50)
