Bug #3806
OSDs stuck in active+degraded after changing replication from 2 to 3
Status: closed
Description
Small 3-node cluster running 0.56.1-1~bpo60+1 on Debian/Squeeze, with the CRUSH "tunables" enabled.
I recently changed the replication level from 2 to 3. The change appeared to go smoothly, but several days later I noticed 3 PGs lingering in the "active+degraded" state. At that point the output of "ceph health detail" showed those PGs mapped to only two OSDs rather than three.
I then tried shutting down the OSD listed first in the output of "ceph health detail" and let the cluster rebuild. After that the 3 PGs changed state to "active+remapped".
Per request from joshd on IRC I'm attaching my osdmap and pg dump.
Files
Updated by Greg Farnum over 11 years ago
Haven't looked into this, but my guess is a couple PGs are getting unlucky with their replica selection. I assume you checked this, Josh?
Updated by Josh Durgin over 11 years ago
Yes, the question is why they're 'getting unlucky'.
Updated by Ben Poliakoff over 11 years ago
OK, it looks like I may simply have given CRUSH a challenging assignment relative to the resources of the cluster.
I had changed the replication level from 2 to 3, but my cluster is a very small testing setup -- three servers. Two of the three servers had three OSDs each and the third had only one (I had resisted adding more OSDs to the third server because it's a little slower and has less RAM than the other two). This config worked reasonably well when replication was set to 2.
At any rate, when I added two more OSDs to the third server (so that all three servers hosted the same number of OSDs) the three stuck PGs cleared up.
Perhaps this was more a case of a pathological config than a software bug.
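For reference, a quick back-of-the-envelope check of why this layout was hard, assuming every OSD carried equal CRUSH weight (so a host's weight is just its OSD count): the single-OSD host held only 1/7 of the total weight, so a weighted host pick misses it most of the time, and even a generous retry budget leaves a noticeable chance of never picking it at all. The retry count of 19 below is illustrative only, not the actual tunable value.

```python
osds_per_host = [3, 3, 1]           # the original layout described above
total = sum(osds_per_host)          # 7

# Probability that a single weighted host pick lands on the small host.
p_small = osds_per_host[2] / total  # 1/7, about 0.14

# Probability the small host is still unpicked after n independent
# weighted draws (n stands in for a retry budget).
n = 19
p_never = (1 - p_small) ** n        # about 0.05: a few PGs per hundred
```

A miss rate on that order is consistent with a handful of PGs out of a few hundred never finding a third host.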
Updated by Greg Farnum over 11 years ago
@Josh Durgin: Even with the new CRUSH tunables it's still a matter of probability, so if you give it a particularly challenging assignment it will fail some portion of the time.
However, I'm surprised this was fixed by adding OSDs without adding hosts. Ben, can you attach your decoded CRUSH map as well?
Updated by Ben Poliakoff over 11 years ago
Sure, it's attached...
Updated by Greg Farnum over 11 years ago
- Status changed from New to Won't Fix
Thanks. I was trying to figure out where the conflict could come from, and it actually does make sense: the single-OSD host previously had a much lower weight than the other two, so when trying to select 3 hosts a small number of PGs simply never selected the smaller host, even after a very large number of retries.
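This explanation can be illustrated with a toy model (not CRUSH itself): select replica hosts by repeated weighted draws under a bounded retry budget, and count how often a PG fails to find three distinct hosts. The weights [3, 3, 1] mirror the original layout; the draw budget of 19 is an illustrative stand-in for CRUSH's retry limit, not the real tunable.

```python
import random

def select_hosts(weights, want=3, max_draws=19, rng=random):
    """Toy stand-in for CRUSH: repeat weighted draws over the hosts
    until `want` distinct ones are found or the draw budget runs out."""
    hosts = list(range(len(weights)))
    chosen = set()
    for _ in range(max_draws):
        chosen.add(rng.choices(hosts, weights=weights)[0])
        if len(chosen) == want:
            break
    return chosen

def degraded_fraction(weights, pgs=20000, seed=1):
    """Fraction of simulated PGs that never map to 3 distinct hosts."""
    rng = random.Random(seed)
    short = sum(1 for _ in range(pgs)
                if len(select_hosts(weights, rng=rng)) < 3)
    return short / pgs

# Two 3-OSD hosts plus one 1-OSD host: a few percent of PGs come up short.
skewed = degraded_fraction([3, 3, 1])
# Equal hosts: effectively none do.
balanced = degraded_fraction([3, 3, 3])
```

With the skewed weights a few percent of simulated PGs never map to a third host, while with balanced weights almost none do -- matching both the original symptom and the fix of evening out the OSD counts per host.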
Given that, I'm marking this as Won't Fix unless somebody thinks we need to examine it better.