Bug #3806 (closed): OSDs stuck in active+degraded after changing replication from 2 to 3

Added by Ben Poliakoff over 11 years ago. Updated over 11 years ago.

Status: Won't Fix
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Severity: 3 - minor

Description

Small 3-node cluster running 0.56.1-1~bpo60+1 on Debian/Squeeze, with CRUSH "tunables" enabled
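
For reference, on a bobtail-era cluster the enabled tunables appear at the top of the decompiled CRUSH map; a sketch with the "optimal" values of that release (not the actual map from this cluster):

    tunable choose_local_tries 0
    tunable choose_local_fallback_tries 0
    tunable choose_total_tries 50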

I recently changed the replication level from 2 to 3. The change appeared to go smoothly, but several days later I noticed 3 PGs lingering in the "active+degraded" state. At that point the output of "ceph health detail" showed those PGs mapped to only two OSDs rather than three.
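
The change and the later check look roughly like this; the pool name "data" and the PG id are illustrative, not taken from this cluster:

    ceph osd pool set data size 3   # raise the replication level to 3
    ceph health detail              # list degraded PGs and their acting sets
    ceph pg map 2.33                # show the mapping for one stuck PG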

I then tried shutting down the OSD listed first in the output of "ceph health detail" and letting the cluster rebuild. After that the 3 PGs changed state to "active+remapped".
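
On a sysvinit-based Debian install of that era this experiment would look something like the following; the OSD id is illustrative:

    service ceph stop osd.3   # stop the first OSD listed for a stuck PG
    ceph -w                   # watch recovery until the PG states settle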

Per a request from joshd on IRC, I'm attaching my osdmap and pg dump.
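
Attachments like these can be generated with standard commands; the output filenames are arbitrary:

    ceph osd getmap -o osdmap    # binary OSD map
    ceph pg dump > pg_dump.txt   # plain-text dump of all PG states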


Files

osdmap (4.45 KB) Ben Poliakoff, 01/15/2013 02:17 PM
pg_dump.txt (236 KB) Ben Poliakoff, 01/15/2013 02:17 PM
crush.txt (1.59 KB) Ben Poliakoff, 01/18/2013 03:45 PM
#1

Updated by Greg Farnum over 11 years ago

Haven't looked into this, but my guess is that a couple of PGs are getting unlucky with their replica selection. I assume you checked this, Josh?

#2

Updated by Josh Durgin over 11 years ago

Yes, the question is why they're 'getting unlucky'.

#3

Updated by Ben Poliakoff over 11 years ago

OK, it looks like I may simply have given CRUSH a challenging assignment relative to the resources of the cluster.

I had changed the replication level from 2 to 3, but my cluster is a very small testing setup -- three servers. Two of the three servers had three OSDs and the third had only one (I had resisted adding more OSDs to the third server as it's a little slower and has less RAM than the other two). This config worked reasonably well when replication was set to two.

At any rate, when I added two more OSDs to the third server (so that all three servers hosted the same number of OSDs), the three stuck PGs cleared up.

Perhaps this was more a case of a pathological config than a software bug.
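
The imbalance shows up directly in the CRUSH hierarchy, since a host's weight is the sum of its OSD weights; before the fix, the one-OSD host would have carried roughly a third of the weight of the others:

    ceph osd tree   # lists hosts and OSDs with their CRUSH weights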

#4

Updated by Greg Farnum over 11 years ago

@Josh Durgin: Even with the new CRUSH tunables, it's still a matter of probability, so if you give it a particularly challenging assignment it will fail some portion of the time.

However, I'm surprised this was fixed by adding OSDs without adding hosts. Ben, can you attach your decoded CRUSH map as well?
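
A decoded map can be produced as follows (filenames are arbitrary):

    ceph osd getcrushmap -o crush.bin    # extract the compiled CRUSH map
    crushtool -d crush.bin -o crush.txt  # decompile it to editable text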

#5

Updated by Ben Poliakoff over 11 years ago

Sure, it's attached...

#6

Updated by Greg Farnum over 11 years ago

  • Status changed from New to Won't Fix

Thanks. I was trying to figure out where the conflict could come from, and actually it does make sense: the single-OSD host would previously have had a much lower weight than the other two, so when trying to select 3 hosts a small number of PGs simply never selected the smaller host, even after a very large number of retries.

Given that, I'm marking this as Won't Fix unless somebody thinks we need to examine it further.
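
For what it's worth, this kind of selection failure can be reproduced offline by replaying the map through crushtool, which reports any inputs for which the rule cannot fill all three slots (assuming a compiled map extracted as above):

    crushtool -i crush.bin --test --num-rep 3 --show-bad-mappings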
