Bug #3785 (closed)

ceph: default crush rule does not suit multi-OSD deployments

Added by Ian Colle over 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)

Description

Version: 0.48.2-0ubuntu2~cloud0

Our Ceph deployments typically involve multiple OSDs per host with no disk redundancy. However, the default crush rule appears to distribute by OSD, not by host, which I believe will not prevent replicas from landing on the same host.

I've been working around this by updating the crush rules as follows and installing the resulting crushmap in the cluster, but since we aim for fully automated deployment (using Juju and MaaS) this is suboptimal.

--- crushmap.txt	2013-01-10 20:33:21.265809301 +0000
+++ crushmap.new	2013-01-10 20:32:49.496745778 +0000
@@ -104,7 +104,7 @@
 	min_size 1
 	max_size 10
 	step take default
-	step choose firstn 0 type osd
+	step chooseleaf firstn 0 type host
 	step emit
 }
 rule metadata {
@@ -113,7 +113,7 @@
 	min_size 1
 	max_size 10
 	step take default
-	step choose firstn 0 type osd
+	step chooseleaf firstn 0 type host
 	step emit
 }
 rule rbd {
@@ -122,7 +122,7 @@
 	min_size 1
 	max_size 10
 	step take default
-	step choose firstn 0 type osd
+	step chooseleaf firstn 0 type host
 	step emit
 }

https://bugs.launchpad.net/cloud-archive/+bug/1098320
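
A minimal sketch of the manual workaround described above, assuming the standard ceph and crushtool CLIs; the file names are placeholders:

ceph osd getcrushmap -o crushmap.bin           # dump the compiled CRUSH map
crushtool -d crushmap.bin -o crushmap.txt      # decompile it to editable text
# edit crushmap.txt as in the diff above, replacing
#   step choose firstn 0 type osd
# with
#   step chooseleaf firstn 0 type host
# and save the result as crushmap.new
crushtool -c crushmap.new -o crushmap-new.bin  # recompile the edited map
ceph osd setcrushmap -i crushmap-new.bin       # inject it into the cluster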

Actions #1

Updated by Ian Colle over 11 years ago

  • Assignee set to Sage Weil
  • Priority changed from Normal to High
Actions #2

Updated by Greg Farnum over 11 years ago

The issue here is that CRUSH maps which behave well on multi-host deployments behave quite poorly on one- or two-host deployments. The mkcephfs build path actually does handle this fairly politely, though, and I think (perhaps erroneously) that ceph-deploy is optimized for larger clusters.
Which deployment mechanism are you using?

Actions #3

Updated by Anonymous over 11 years ago

I agree with Ian; I have seen very bad things happen when CRUSH chooses two OSDs on one host rather than distributing to different hosts.

It is nice to know that mkcephfs has a mechanism to balance the load so this won't happen. But this is a scalable product; customers are supposed to use 'ceph osd add' to add more OSDs to the cluster.

Does 'ceph osd add' take CRUSH host balancing into consideration when doing an add? Do we have instructions for handling that manually?

I think there should be a default rule that says data replicas cannot be written to the same host as the original, no matter how the OSD was added.

just my 2cents... :-)
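
A rough sketch of what manual host placement can look like when an OSD is added by hand, so that a chooseleaf-by-host rule like the one in the description keeps replicas on separate hosts; the OSD id, weight, and bucket names below are illustrative placeholders, not values from this bug:

ceph osd crush add osd.12 1.0 host=ceph-node2 root=default   # register osd.12 under its host bucket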

Actions #4

Updated by Anonymous over 11 years ago

This comment should have been in bug 3789

upping the memory on these VMs from 512M to 2G

Since it appears it was a resource problem, I will close this bug.

Do we have any mechanism that I am missing that notifies the end user when crashes like this occur, so they can go in and fix their cluster before a critical number of resources have failed?

Actions #5

Updated by Anonymous over 11 years ago

  • Status changed from New to Won't Fix

This comment should have been in bug 3789

Caused by a lack of resources on the system.
I have increased the memory from 512M to 2G and will retest.

Actions #6

Updated by Dan Mick over 11 years ago

I think maybe Deb's comments and closure were meant for another bug (perhaps 3789?)

Actions #7

Updated by Anonymous over 11 years ago

  • Status changed from Won't Fix to New

dang! wrong bug. opening this one back up.
sorry all!

Actions #8

Updated by Sage Weil over 11 years ago

  • Status changed from New to Fix Under Review
  • Assignee changed from Sage Weil to Greg Farnum

wip-3785

Actions #9

Updated by Greg Farnum over 11 years ago

Looks good to me. What branches do we want to cherry-pick it onto?

Actions #10

Updated by Sage Weil over 11 years ago

good question. let's start with bobtail.

Actions #11

Updated by Greg Farnum over 11 years ago

  • Status changed from Fix Under Review to Resolved

Merged to master in 7ea5d84fa3d0ed3db61eea7eb9fa8dbee53244b6 and cherry-picked to bobtail in commit:503917f0049d297218b1247dc0793980c39195b3.

Actions #12

Updated by Sage Weil over 11 years ago

  • Status changed from Resolved to Fix Under Review

der, broke vstart. can you review wip-3785?

Actions #13

Updated by Greg Farnum over 11 years ago

sigh

This also looks good to me, and I like it better (should have suggested this the first time around). But now I've gotten scared again; have you run this outside of vstart? :)

Actions #14

Updated by Sage Weil over 11 years ago

Nope.. which leads me to realize that that setting needs to go in teuthology's ceph.conf. Doing that now, and then I'll run it through the suite.
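
The setting is not named in this comment; assuming it is the option controlling which bucket type the default rule's chooseleaf step uses, a ceph.conf entry for single-host test clusters might look like this (the option name and values are an assumption, not quoted from the bug):

[global]
        # assumed option: 0 = spread replicas across OSDs (single-host test
        # clusters such as vstart), 1 = spread replicas across hosts
        osd crush chooseleaf type = 0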

Actions #15

Updated by Sage Weil about 11 years ago

  • Status changed from Fix Under Review to Resolved

commit:f358cb1d2b0a3a78bf59c4fd085906fcb5541bbe

Actions #16

Updated by Greg Farnum about 11 years ago

I presume we're planning to backport this to bobtail after it passes some nights of testing? Maybe we should leave the bug in "testing" until then (or we get our "Needs Backport" status!).
