Bug #2874
apparent CRUSH mapping failure (Closed)
Description
While doing crowbar tests, I created a 3-OSD cluster (on separate VMs) that ended up with 6 degraded PGs.
crowbar@d52-54-00-59-e5-54:~$ sudo ceph -s
   health HEALTH_WARN 6 pgs degraded; 6 pgs stuck unclean
   monmap e1: 1 mons at {d52-54-00-59-e5-54=192.168.124.81:6789/0}, election epoch 0, quorum 0 d52-54-00-59-e5-54
   osdmap e20: 3 osds: 3 up, 3 in
   pgmap v120: 192 pgs: 186 active+clean, 6 active+degraded; 0 bytes data, 5657 MB used, 136 GB / 149 GB avail
   mdsmap e1: 0/0/1 up
Apparently, CRUSH is mapping each of those 6 PGs to only one OSD:
crowbar@d52-54-00-59-e5-54:~$ osdmaptool osd.map --test-map-pg 0.7
osdmaptool: osdmap file 'osd.map'
 parsed '0.7' -> 0.7
0.7 raw [0] up [0] acting [0]
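For anyone reproducing this, the same check can be looped over a whole pool rather than testing PGs one at a time. A rough sketch against the attached osdmap (the PG count of 64 for pool 0 is an assumption, not from this report; PG ids are hex):

```shell
# Hypothetical helper: map every PG in pool 0 through the attached
# osd.map and print the ones whose acting set contains a single OSD.
for i in $(seq 0 63); do
    osdmaptool osd.map --test-map-pg 0.$(printf '%x' "$i")
done 2>/dev/null | grep 'acting \[[0-9]*\]$'
```

With three OSDs and the default replication of 2, any line whose acting set is a single OSD corresponds to a degraded PG.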
OSD map is attached.
Ceph version:
crowbar@d52-54-00-59-e5-54:~$ ceph --version
ceph version 0.49-296-g4803800 (commit:480380025bcc7f9e6e6bd7c1b90815d6fb6ed9ce)
Files
Updated by Sage Weil almost 12 years ago
check if setting the tunables all to 0 makes it go away
Updated by Alex Moore over 11 years ago
I'd like to report what I believe to be the same issue; at least the symptoms were identical: a 3-OSD cluster that ended up with 6 degraded PGs (initially running 0.48, with no change on 0.49 or 0.50). Running the equivalent osdmaptool command from Greg's problem description, but for my degraded PGs, similarly showed them mapping to only a single OSD. One slight difference was that I additionally had 9 remapped PGs.
Anyway, based on Sage's comment I then set the tunables to the values suggested on this page: http://ceph.com/docs/master/ops/manage/crush/?highlight=tunables, after which all the problematic PGs recovered (i.e. both the degraded and the remapped ones).
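For reference, the usual procedure for changing the tunables on a release of this vintage is to decompile, edit, and re-inject the CRUSH map. A sketch (the tunable names and values are the ones from the linked docs page, not something confirmed in this thread):

```shell
# Export and decompile the current CRUSH map.
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt

# Then add these lines at the top of crush.txt (values per the docs page):
#   tunable choose_local_tries 0
#   tunable choose_local_fallback_tries 0
#   tunable choose_total_tries 50

# Recompile and inject the edited map.
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new
```

Note that older clients and kernels may not understand the new tunables, so this is only safe once everything talking to the cluster is recent enough.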
Updated by Sage Weil over 11 years ago
- Status changed from New to Resolved
Glad to hear the tunables resolved this for you, Alex!
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to RADOS
- Category deleted (10)
- Target version deleted (v0.50)