Bug #3806
OSDs stuck in active+degraded after changing replication from 2 to 3
Status: closed
Description
Small 3-node cluster running 0.56.1-1~bpo60+1 on Debian/Squeeze, with the CRUSH "tunables" enabled.
I recently changed the replication level from 2 to 3. The change appeared to go smoothly, but several days later I noticed 3 PGs lingering in the "active+degraded" state. At that point the output of "ceph health detail" showed those PGs mapped to only two OSDs rather than three.
I then tried shutting down the OSD listed first in the output of "ceph health detail" and let the cluster rebuild. After that the 3 PGs changed state to "active+remapped".
Per request from joshd on IRC I'm attaching my osdmap and pg dump.
Files
Updated by Greg Farnum over 11 years ago
Haven't looked into this, but my guess is a couple PGs are getting unlucky with their replica selection. I assume you checked this, Josh?
Updated by Josh Durgin over 11 years ago
Yes, the question is why they're 'getting unlucky'.
Updated by Ben Poliakoff over 11 years ago
OK, it looks like I may simply have given CRUSH a challenging assignment relative to the resources of the cluster.
I had changed the replication level from 2 to 3, but my cluster is a very small testing setup -- three servers. Two of the three servers had three OSDs each and the third had only one (I had resisted adding more OSDs to the third server because it's a little slower and has less RAM than the other two). This config worked reasonably well when replication was set to 2.
At any rate, when I added two more OSDs to the third server (so that all three servers hosted the same number of OSDs) the three stuck PGs cleared up.
Perhaps this was more a case of a pathological config than a software bug.
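For reference, a quick back-of-the-envelope check of why this layout was hard, assuming every OSD carried equal CRUSH weight (so a host's weight is just its OSD count): the single-OSD host held only 1/7 of the total weight, so a weighted host pick misses it most of the time, and even a generous retry budget leaves a noticeable chance of never picking it at all. The retry count of 19 below is illustrative only, not the actual tunable value.

```python
osds_per_host = [3, 3, 1]           # the original layout described above
total = sum(osds_per_host)          # 7

# Probability that a single weighted host pick lands on the small host.
p_small = osds_per_host[2] / total  # 1/7, about 0.14

# Probability the small host is still unpicked after n independent
# weighted draws (n stands in for a retry budget).
n = 19
p_never = (1 - p_small) ** n        # about 0.05: a few PGs per hundred
```

A miss rate on that order is consistent with a handful of PGs out of a few hundred never finding a third host.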
Updated by Greg Farnum over 11 years ago
@Josh Durgin: Even with the new CRUSH tunables it's still a matter of probability, so if you give it a particularly challenging assignment it will fail some portion of the time.
However, I'm surprised this was fixed by adding OSDs without adding hosts. Ben, can you attach your decoded CRUSH map as well?
Updated by Ben Poliakoff over 11 years ago
Sure, it's attached...
Updated by Greg Farnum over 11 years ago
- Status changed from New to Won't Fix
Thanks. I was trying to figure out where the conflict could come from, and it actually does make sense: the single-OSD host previously had a much lower weight than the other two, so when trying to select 3 hosts a small number of PGs simply never selected the smaller host, even after a very large number of retries.
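This explanation can be illustrated with a toy model (not CRUSH itself): select replica hosts by repeated weighted draws under a bounded retry budget, and count how often a PG fails to find three distinct hosts. The weights [3, 3, 1] mirror the original layout; the draw budget of 19 is an illustrative stand-in for CRUSH's retry limit, not the real tunable.

```python
import random

def select_hosts(weights, want=3, max_draws=19, rng=random):
    """Toy stand-in for CRUSH: repeat weighted draws over the hosts
    until `want` distinct ones are found or the draw budget runs out."""
    hosts = list(range(len(weights)))
    chosen = set()
    for _ in range(max_draws):
        chosen.add(rng.choices(hosts, weights=weights)[0])
        if len(chosen) == want:
            break
    return chosen

def degraded_fraction(weights, pgs=20000, seed=1):
    """Fraction of simulated PGs that never map to 3 distinct hosts."""
    rng = random.Random(seed)
    short = sum(1 for _ in range(pgs)
                if len(select_hosts(weights, rng=rng)) < 3)
    return short / pgs

# Two 3-OSD hosts plus one 1-OSD host: a few percent of PGs come up short.
skewed = degraded_fraction([3, 3, 1])
# Equal hosts: effectively none do.
balanced = degraded_fraction([3, 3, 3])
```

With the skewed weights a few percent of simulated PGs never map to a third host, while with balanced weights almost none do -- matching both the original symptom and the fix of evening out the OSD counts per host.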
Given that, I'm marking this as Won't Fix unless somebody thinks we need to examine it better.