Bug #2874
apparent CRUSH mapping failure (Closed)
Description
While doing crowbar tests, I created a 3-OSD cluster (on separate VMs) that ended up with 6 degraded PGs.
crowbar@d52-54-00-59-e5-54:~$ sudo ceph -s
   health HEALTH_WARN 6 pgs degraded; 6 pgs stuck unclean
   monmap e1: 1 mons at {d52-54-00-59-e5-54=192.168.124.81:6789/0}, election epoch 0, quorum 0 d52-54-00-59-e5-54
   osdmap e20: 3 osds: 3 up, 3 in
   pgmap v120: 192 pgs: 186 active+clean, 6 active+degraded; 0 bytes data, 5657 MB used, 136 GB / 149 GB avail
   mdsmap e1: 0/0/1 up
Apparently, CRUSH is mapping each of those 6 PGs to only one OSD:
crowbar@d52-54-00-59-e5-54:~$ osdmaptool osd.map --test-map-pg 0.7
osdmaptool: osdmap file 'osd.map'
 parsed '0.7' -> 0.7
0.7 raw [0] up [0] acting [0]
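For anyone reproducing this, the same check can be looped over a whole pool rather than testing PGs one at a time. A rough sketch against the attached osdmap (the PG count of 64 for pool 0 is an assumption, not from this report; PG ids are hex):

```shell
# Hypothetical helper: map every PG in pool 0 through the attached
# osd.map and print the ones whose acting set contains a single OSD.
for i in $(seq 0 63); do
    osdmaptool osd.map --test-map-pg 0.$(printf '%x' "$i")
done 2>/dev/null | grep 'acting \[[0-9]*\]$'
```

With three OSDs and the default replication of 2, any line whose acting set is a single OSD corresponds to a degraded PG.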
OSD map is attached.
Ceph version:
crowbar@d52-54-00-59-e5-54:~$ ceph --version
ceph version 0.49-296-g4803800 (commit:480380025bcc7f9e6e6bd7c1b90815d6fb6ed9ce)
Files
Updated by Sage Weil almost 12 years ago
check if setting the tunables all to 0 makes it go away
Updated by Alex Moore over 11 years ago
I'd like to report what I believe to be the same issue; at least the symptoms were identical: a 3-OSD cluster that ended up with 6 degraded PGs (initially running 0.48, with no change on 0.49 or 0.50). Running the equivalent osdmaptool command from Greg's problem description, but for my degraded PGs, similarly showed them mapping to only a single OSD. One slight difference was that I additionally had 9 remapped PGs.
Anyway, based on Sage's comment I then set the tunables to the values suggested on this page: http://ceph.com/docs/master/ops/manage/crush/?highlight=tunables, after which all the problematic PGs recovered (i.e. both the degraded and the remapped ones).
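For reference, the usual procedure for changing the tunables on a release of this vintage is to decompile, edit, and re-inject the CRUSH map. A sketch (the tunable names and values are the ones from the linked docs page, not something confirmed in this thread):

```shell
# Export and decompile the current CRUSH map.
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt

# Then add these lines at the top of crush.txt (values per the docs page):
#   tunable choose_local_tries 0
#   tunable choose_local_fallback_tries 0
#   tunable choose_total_tries 50

# Recompile and inject the edited map.
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new
```

Note that older clients and kernels may not understand the new tunables, so this is only safe once everything talking to the cluster is recent enough.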
Updated by Sage Weil over 11 years ago
- Status changed from New to Resolved
Glad to hear the tunables resolved this for you, Alex!
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to RADOS
- Category deleted (10)
- Target version deleted (v0.50)