Bug #8601

erasure-code: default profile does not exist after upgrade

Added by Loïc Dachary almost 10 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Category: Monitor
% Done: 100%
Source: other
Backport: firefly
Regression: No
Severity: 3 - minor

Description

Workaround

After upgrading to firefly, create the default profile with

$ ceph osd erasure-code-profile set default

Then verify that it matches the expected profile with
$ ceph osd erasure-code-profile get default
directory=.libs
k=2
m=1
plugin=jerasure
ruleset-failure-domain=osd
technique=reed_sol_van
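The verification step can also be scripted. Below is a minimal C++ sketch, assuming the command prints the `key=value` listing shown above; parse_profile and matches_firefly_default are hypothetical helpers, not part of Ceph. The directory entry (.libs here, which looks like a source-tree build) is deliberately skipped because it depends on where the plugins are installed.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Parse the "key=value" lines printed by `ceph osd erasure-code-profile get`.
std::map<std::string, std::string> parse_profile(const std::string& text) {
  std::map<std::string, std::string> profile;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    std::string::size_type eq = line.find('=');
    if (eq == std::string::npos) continue;  // skip blank or malformed lines
    profile[line.substr(0, eq)] = line.substr(eq + 1);
  }
  return profile;
}

// Expected firefly defaults, copied from the output above.
// "directory" is left out on purpose: it is install dependent.
bool matches_firefly_default(const std::map<std::string, std::string>& p) {
  const std::map<std::string, std::string> expected = {
      {"k", "2"}, {"m", "1"}, {"plugin", "jerasure"},
      {"ruleset-failure-domain", "osd"}, {"technique", "reed_sol_van"}};
  for (const auto& kv : expected) {
    auto it = p.find(kv.first);
    if (it == p.end() || it->second != kv.second) return false;
  }
  return true;
}
```

Feeding the captured output of the get command to parse_profile and then to matches_firefly_default gives a yes/no answer; extra keys in the profile are tolerated, missing or different ones are not.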

Description

When a firefly cluster is created, the default erasure code profile is created along with it. When an existing cluster is upgraded to firefly, the default erasure code profile is not created. Possible fixes:

  • The upgrade notes could be modified to document this
  • The default erasure code profile could be created as a side effect of osd pool create if it is not found

Related issues

Related to Ceph - Bug #8599: Fix check of ruleset id on pool update Resolved 06/14/2014

Associated revisions

Revision 4e1405e7
Added by Loïc Dachary over 9 years ago

erasure-code: create default profile if necessary

After an upgrade to firefly, the existing Ceph clusters do not have the
default erasure code profile. Although it may be created with

ceph osd erasure-code-profile set default

it was not included in the release notes and is confusing for the
administrator.

The osd pool create and osd crush rule create-erasure commands are
modified to implicitly create the default erasure code profile if it is
not found.

In order to avoid code duplication, the code that creates the default
erasure code profile when a new firefly Ceph cluster is created is
encapsulated in the OSDMap::get_erasure_code_profile_default method.

Conversely, handling the pending change in OSDMonitor is not
encapsulated in a function but duplicated instead. If it were a function,
the caller would need a switch to distinguish between three cases: goto
wait is needed, goto reply is needed, or proceed because nothing needs
to be done. It is unclear whether having a function would lead to
smaller or more maintainable code.

http://tracker.ceph.com/issues/8601
Fixes: #8601

Backport: firefly
Signed-off-by: Loic Dachary <>
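The change described in this commit can be modelled in a small, self-contained C++ sketch. It is not the OSDMonitor code: default_erasure_code_profile is a hypothetical stand-in for OSDMap::get_erasure_code_profile_default, two ProfileMap values stand in for the committed OSDMap and the pending increment, and Outcome, ensure_default_profile and pool_create are hypothetical names. It assumes that only the profile named default is created implicitly and treats an empty set of defaults as a failure.

```cpp
#include <cassert>
#include <map>
#include <string>

using Profile = std::map<std::string, std::string>;
using ProfileMap = std::map<std::string, Profile>;

// Hypothetical stand-in for OSDMap::get_erasure_code_profile_default: the one
// place that defines "default". Values come from the workaround output,
// minus the install-dependent "directory" entry.
Profile default_erasure_code_profile() {
  return {{"k", "2"}, {"m", "1"}, {"plugin", "jerasure"},
          {"ruleset-failure-domain", "osd"}, {"technique", "reed_sol_van"}};
}

// Caller 1: a brand new firefly cluster starts with the default profile.
ProfileMap create_initial_profiles() {
  return {{"default", default_erasure_code_profile()}};
}

// The three ways the check can end, named after the goto targets.
enum class Outcome {
  Proceed,  // "default" is already committed: run the command now
  Wait,     // "default" is pending: retry the command once it commits
  Reply     // no usable defaults: answer the client with an error
};

// Caller 2: `osd pool create` / `osd crush rule create-erasure` on a map that
// may predate firefly. `committed` models the OSDMap, `pending` the increment.
Outcome ensure_default_profile(const ProfileMap& committed, ProfileMap& pending,
                               const Profile& defaults) {
  if (committed.count("default")) return Outcome::Proceed;
  if (pending.count("default")) return Outcome::Wait;
  if (defaults.empty()) return Outcome::Reply;
  pending["default"] = defaults;  // queue the profile for the next proposal
  return Outcome::Wait;
}

// The switch every command would need if the check were factored out.
std::string pool_create(const ProfileMap& committed, ProfileMap& pending,
                        const Profile& defaults = default_erasure_code_profile()) {
  switch (ensure_default_profile(committed, pending, defaults)) {
    case Outcome::Wait:  return "wait";   // goto wait
    case Outcome::Reply: return "reply";  // goto reply
    case Outcome::Proceed: break;         // carry on creating the pool
  }
  return "created";
}
```

Factoring the check out this way still leaves each command with a three-way switch, which is the trade-off the commit message weighs against duplicating the block.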

Revision b6d8feab
Added by Loïc Dachary over 9 years ago

erasure-code: create default profile if necessary
(cherry picked from commit 4e1405e7720eda71a872c991045ac8ead6f3e7d8)

History

#1 Updated by Loïc Dachary almost 10 years ago

  • Description updated
  • Status changed from 12 to Fix Under Review

#2 Updated by Loïc Dachary almost 10 years ago

  • Description updated

#3 Updated by Loïc Dachary almost 10 years ago

  • % Done changed from 0 to 50

#4 Updated by Sage Weil almost 10 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to firefly

#5 Updated by Loïc Dachary almost 10 years ago

  • Status changed from Pending Backport to Fix Under Review
  • % Done changed from 50 to 80

#6 Updated by Loïc Dachary almost 10 years ago

  • Target version set to 0.82

#7 Updated by Loïc Dachary over 9 years ago

  • Target version changed from 0.82 to 0.83 cont.

#8 Updated by Sage Weil over 9 years ago

  • Priority changed from Normal to Urgent

#10 Updated by Loïc Dachary over 9 years ago

  • Status changed from Fix Under Review to Pending Backport

#12 Updated by Greg Farnum over 9 years ago

Apparently having an EC pool is still sufficient to prevent kernel clients from mounting, so I don't think we can backport this fix until that problem has been resolved.
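The mechanism behind this concern can be modelled in a few lines. A hypothetical C++ sketch, assuming a single made-up feature bit FEATURE_EC_RULE (not a real Ceph constant): the monitor derives the features it requires from the CRUSH rules in the map, and a client lacking any required bit cannot connect.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bit; the real Ceph bit name and value are not given here.
constexpr uint64_t FEATURE_EC_RULE = 1ULL << 3;

struct CrushRule {
  std::string name;
  bool erasure;  // true for rules created for erasure coded pools
};

// The required feature set grows as soon as one erasure coded rule exists.
uint64_t required_features(const std::vector<CrushRule>& rules) {
  uint64_t required = 0;
  for (const auto& rule : rules)
    if (rule.erasure) required |= FEATURE_EC_RULE;
  return required;
}

// A client may connect only if it supports every required feature.
bool can_connect(uint64_t client_features, uint64_t required) {
  return (client_features & required) == required;
}
```

In this model a client that supports nothing beyond the baseline connects to a replicated-only cluster but is refused once an erasure coded rule appears, which is why automatically creating rules on upgrade would be risky. Creating a profile alone adds no rule, so it leaves the required features unchanged.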

#13 Updated by Loïc Dachary over 9 years ago

<gregsfortytwo1> loicd: we've got an issue because (as a feature!), the presence of an EC rule in the osdmap will propagate out as a required feature bit for connections
<gregsfortytwo1> but kernel clients don't support it, so we're accidentally back to the place we were at where any cluster with an EC rule/pool can't be mounted by kernel clients
<loicd> gregsfortytwo: yes, I caught a glimpse of this discussion.
<gregsfortytwo1> which is…bad?
<gregsfortytwo1> so we don't want to automatically create the ec rules on upgrade until this is resolved
<gregsfortytwo1> or unwary upgraders will suddenly find they can't mount their RBD images :(
<loicd> I'm confused
<gregsfortytwo1> we have a bug right now which is preventing us from automatically creating EC rules, right? this is the one you're looking at
<loicd> I don't remember enough of this patch to be 100% sure but I think it only deals with erasure code profiles
<gregsfortytwo1> ah
<loicd> let me check once more ;-)
<gregsfortytwo1> I was under the impression it was about automatically creating the EC CRUSH rule
-*- loicd browsing https://github.com/ceph/ceph/pull/1990/files
<gregsfortytwo1> perhaps I am entirely mistaken
<gregsfortytwo1> (I mean, I thought that a profile included a crush rule)
<gregsfortytwo1> (that otherwise was not being created)
<loicd> gregsfortytwo: the profile is separate from the crush rule. The patch only creates it if it is missing and only when it is required by a command to create an erasure coded pool or ruleset.
<gregsfortytwo1> okay
<gregsfortytwo1> should probably just ignore me, then :)
<loicd> ok :-) Reading https://github.com/ceph/ceph/pull/1990/files#diff-0a5db46a44ae9900e226289a810f10e8R4367 it comes back to me ;-) The patch only addresses the absence of the default profile after an upgrade from emperor to firefly. It does not create anything spontaneously.
<gregsfortytwo1> so the creation of the profile doesn't create the corresponding crush rule?
<gregsfortytwo1> when is the rule created?
<loicd> profile management has its own set of commands https://github.com/ceph/ceph/blob/master/src/mon/MonCommands.h#L477 and its map https://github.com/ceph/ceph/blob/master/src/osd/OSDMap.h#L136
<loicd> the ruleset is created either explicitly or implicitly when an erasure coded pool is created https://github.com/ceph/ceph/blob/master/src/mon/OSDMonitor.cc#L3261 and https://github.com/ceph/ceph/blob/master/src/mon/OSDMonitor.cc#L3036
<loicd> the explicit creation of the ruleset is with https://github.com/ceph/ceph/blob/master/src/mon/MonCommands.h#L464 and implemented using the same function via https://github.com/ceph/ceph/blob/master/src/mon/OSDMonitor.cc#L4355
<loicd> gregsfortytwo: ^ does that clarify the relationship between erasure code profile and the erasure code ruleset ? 
<loicd> I should add that an erasure coded ruleset is created by providing the erasure code profile to the erasure code plugin. Because the erasure code plugin is ultimately trusted to create a sensible ruleset.
<gregsfortytwo1> okay, that helps
<gregsfortytwo1> thanks!
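The relationship between profiles and rulesets described above can be sketched as a simplified C++ model. It assumes one plugin passed in directly (in Ceph the profile's plugin key names the plugin to load); ErasureCodePluginModel, JerasureModel and MonitorModel are hypothetical, and the rule text is a toy description rather than CRUSH map syntax.

```cpp
#include <cassert>
#include <map>
#include <string>

using Profile = std::map<std::string, std::string>;

// Hypothetical plugin interface: the plugin, not the monitor, is trusted to
// turn a profile into a sensible ruleset.
struct ErasureCodePluginModel {
  virtual ~ErasureCodePluginModel() = default;
  virtual std::string create_ruleset(const std::string& name,
                                     const Profile& profile) const = 0;
};

struct JerasureModel : ErasureCodePluginModel {
  std::string create_ruleset(const std::string& name,
                             const Profile& profile) const override {
    // Assumed fallback of "host" when the profile names no failure domain.
    auto it = profile.find("ruleset-failure-domain");
    std::string domain = (it == profile.end()) ? "host" : it->second;
    return "rule " + name + ": chooseleaf indep 0 type " + domain;  // toy text
  }
};

struct MonitorModel {
  std::map<std::string, Profile> profiles;   // managed by erasure-code-profile set/get
  std::map<std::string, std::string> rules;  // CRUSH rules, a separate map

  // Shared by the explicit rule command and implicit pool creation.
  bool create_erasure_rule(const std::string& rule, const std::string& profile_name,
                           const ErasureCodePluginModel& plugin) {
    auto p = profiles.find(profile_name);
    if (p == profiles.end()) return false;  // unknown profile
    if (!rules.count(rule)) rules[rule] = plugin.create_ruleset(rule, p->second);
    return true;
  }
};
```

Storing a profile touches only the profiles map; a rule appears only when a command asks the plugin to build one from that profile.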

#14 Updated by Loïc Dachary over 9 years ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 80 to 100
