Bug #3785
closed
ceph: default crush rule does not suit multi-OSD deployments
Added by Ian Colle over 11 years ago.
Updated over 11 years ago.
Description
Version: 0.48.2-0ubuntu2~cloud0
Our Ceph deployments typically involve multiple OSDs per host with no disk redundancy. However, the default CRUSH rules appear to distribute by OSD, not by host, which I believe does not prevent replicas from landing on the same host.
I've been working around this by updating the crush rules as follows and installing the resulting crushmap in the cluster, but since we aim for fully automated deployment (using Juju and MaaS) this is suboptimal.
--- crushmap.txt	2013-01-10 20:33:21.265809301 +0000
+++ crushmap.new	2013-01-10 20:32:49.496745778 +0000
@@ -104,7 +104,7 @@
 	min_size 1
 	max_size 10
 	step take default
-	step choose firstn 0 type osd
+	step chooseleaf firstn 0 type host
 	step emit
 }
 rule metadata {
@@ -113,7 +113,7 @@
 	min_size 1
 	max_size 10
 	step take default
-	step choose firstn 0 type osd
+	step chooseleaf firstn 0 type host
 	step emit
 }
 rule rbd {
@@ -122,7 +122,7 @@
 	min_size 1
 	max_size 10
 	step take default
-	step choose firstn 0 type osd
+	step chooseleaf firstn 0 type host
 	step emit
 }
https://bugs.launchpad.net/cloud-archive/+bug/1098320
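For reference, the manual workaround described above can be scripted with the standard `ceph` and `crushtool` commands. This is a sketch only: it assumes a running cluster with an admin keyring, and the file names are illustrative.

```shell
#!/bin/sh
# Sketch of the workaround: export, decompile, edit, recompile, and
# inject the crush map. Assumes admin access to a running cluster;
# file names are illustrative.
set -e

ceph osd getcrushmap -o crushmap.bin        # export the compiled map
crushtool -d crushmap.bin -o crushmap.txt   # decompile to editable text

# Switch per-OSD placement to per-host placement in every rule
sed -i 's/step choose firstn 0 type osd/step chooseleaf firstn 0 type host/' crushmap.txt

crushtool -c crushmap.txt -o crushmap.new   # recompile the edited map
ceph osd setcrushmap -i crushmap.new        # inject it into the cluster
```

Automating this from Juju/MaaS is still awkward, which is the point of the bug: the default map should not need this edit.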
- Assignee set to Sage Weil
- Priority changed from Normal to High
The issue here is that CRUSH maps which behave well on multi-host deployments behave quite poorly on one or two host deployments. The mkcephfs build path actually does handle this fairly politely, though, and I think (perhaps erroneously) that ceph-deploy is optimized for larger clusters.
Which deployment mechanism are you using?
I agree with Ian; I have seen very bad things happen when CRUSH chooses two OSDs on one host rather than distributing replicas to different hosts.
It is nice to know that mkcephfs has a mechanism to balance the load so this won't happen, but this is a scalable product. Customers are supposed to use 'ceph osd add' to add more OSDs to the cluster.
Does 'ceph osd add' take crush host balancing into consideration when doing an add? Do we have instructions for handling that manually?
I think there should be a default rule that says data replicas cannot be written to the same host as the original, no matter how the OSD has been added.
just my 2 cents... :-)
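The failure mode described in the comments can be shown with a toy simulation. This is plain Python, not CRUSH itself; the host/OSD layout and function names are made up for illustration. Sampling leaves (OSDs) directly, as `step choose firstn 0 type osd` does, can place both replicas on one host, while sampling distinct hosts first, as `step chooseleaf firstn 0 type host` does, cannot.

```python
import random

# Toy layout: 3 hosts, 2 OSDs each (illustrative, not a real cluster)
HOSTS = {"host-a": ["osd.0", "osd.1"],
         "host-b": ["osd.2", "osd.3"],
         "host-c": ["osd.4", "osd.5"]}

def host_of(osd):
    return next(h for h, osds in HOSTS.items() if osd in osds)

def pick_by_osd(n, rng):
    """Roughly 'step choose firstn 0 type osd': sample OSDs directly."""
    all_osds = [o for osds in HOSTS.values() for o in osds]
    return rng.sample(all_osds, n)

def pick_by_host(n, rng):
    """Roughly 'step chooseleaf firstn 0 type host': distinct hosts first."""
    hosts = rng.sample(list(HOSTS), n)
    return [rng.choice(HOSTS[h]) for h in hosts]

rng = random.Random(42)

# Count trials where both replicas of a 2-way placement share a host
same_host = sum(len({host_of(o) for o in pick_by_osd(2, rng)}) == 1
                for _ in range(10_000))
print(same_host)  # nonzero: per-OSD placement can co-locate replicas

# Per-host placement never co-locates replicas
assert all(len({host_of(o) for o in pick_by_host(2, rng)}) == 2
           for _ in range(10_000))
```

With this layout, 3 of the 15 possible OSD pairs share a host, so roughly a fifth of per-OSD placements lose both replicas if that host dies.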
This comment should have been in bug 3789
Upping the memory on these VMs from 512M to 2G.
Since it appears this was a resource problem, I will close this bug.
Do we have any mechanism that I am missing that notifies the end user when crashes like this occur, so they can go in and fix their cluster before a critical number of resources have failed?
- Status changed from New to Won't Fix
This comment should have been in bug 3789
Caused by a lack of resources on the system.
I have increased the memory from 512M to 2G and will retest.
I think maybe Deb's comments and closure were meant for another bug (perhaps 3789?)
- Status changed from Won't Fix to New
dang! wrong bug. opening this one back up.
sorry all!
- Status changed from New to Fix Under Review
- Assignee changed from Sage Weil to Greg Farnum
Looks good to me. Which branches do we want to cherry-pick it onto?
good question. let's start with bobtail.
- Status changed from Fix Under Review to Resolved
- Status changed from Resolved to Fix Under Review
der, broke vstart. can you review wip-3785?
sigh
This also looks good to me, and I like it better (should have suggested this the first time around). But now I've gotten scared again; have you run this outside of vstart? :)
Nope... which leads me to realize that the setting needs to go into teuthology's ceph.conf. Doing that now, and then I'll run it through the suite.
- Status changed from Fix Under Review to Resolved
commit:f358cb1d2b0a3a78bf59c4fd085906fcb5541bbe
I presume we're planning to backport this to bobtail after it passes some nights of testing? Maybe we should leave the bug in "testing" until then (or we get our "Needs Backport" status!).