Fix #9435
prevent use of cache pools as metadata or data pools
Description
From the mailing list:

I am testing the tiering functionality with cephfs. I used a replicated cache with an EC data pool, and a replicated metadata pool, like this:

ceph osd pool create cache 1024 1024
ceph osd pool set cache size 2
ceph osd pool set cache min_size 1
ceph osd erasure-code-profile set profile11 k=8 m=3 ruleset-failure-domain=osd
ceph osd pool create ecdata 128 128 erasure profile11
ceph osd tier add ecdata cache
ceph osd tier cache-mode cache writeback
ceph osd tier set-overlay ecdata cache
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((280*1024*1024*1024))
ceph osd pool create metadata 128 128
ceph osd pool set metadata crush_ruleset 1    # SSD root in crushmap
ceph fs new ceph_fs metadata cache            <-- wrong?

I started testing with this and it worked: I could write to it with cephfs, and the cache was flushing to the ecdata pool as expected. But now I notice I created the fs on top of the cache instead of the underlying data pool. I suppose I should have done this:

ceph fs new ceph_fs metadata ecdata

So my question is: was this wrong and not doing what I thought it did, or was this somehow handled by ceph, so that it didn't matter that I specified the cache instead of the data pool?
This basically works, but gets awkward later on. I'm not sure what's involved in blocking the use of cache pools, and I'm not sure we want to fix it as an FS-specific thing rather than somehow doing it generically in the monitor command system. But we should do something.
Associated revisions
mon: prevent cache pools being used with CephFS
Fixes two things:
* EC pools are now permissible if they have a cache overlay
* Pools are not permissible if they are a cache tier.
Fixes: #9435
Signed-off-by: John Spray <john.spray@redhat.com>
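For reference, a minimal sketch of the policy the commit message describes, written as a standalone check. This is not the merged patch: the function name _check_pool_for_fs is made up, and it assumes pg_pool_t's is_tier(), is_erasure() and has_tiers() helpers.

#include <ostream>
#include <string>
#include "osd/osd_types.h"  // pg_pool_t (in-tree header; path assumed)

// Sketch only: reject a pool that is itself a cache tier, and reject an
// EC pool unless it has a cache overlay in front of it.
static int _check_pool_for_fs(const pg_pool_t &pool,
                              const std::string &pool_name,
                              std::ostream &ss)
{
  if (pool.is_tier()) {
    ss << "pool '" << pool_name << "' is a cache tier and cannot be used directly";
    return -EINVAL;
  }
  if (pool.is_erasure() && !pool.has_tiers()) {
    ss << "EC pool '" << pool_name << "' must have a cache tier overlay";
    return -EINVAL;
  }
  return 0;
}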
History
#1 Updated by Joao Eduardo Luis about 9 years ago
would checking the nature of pools during 'fs new' on the monitor and failing if any of the specified pools (data or metadata) are cache pools work? Or did you have something else in mind?
#2 Updated by Joao Eduardo Luis about 9 years ago
I mean something like this (although I'm not positive I got all the requirements right):
diff --git a/src/mon/MDSMonitor.cc b/src/mon/MDSMonitor.cc
index fea198c..14fb003 100644
--- a/src/mon/MDSMonitor.cc
+++ b/src/mon/MDSMonitor.cc
@@ -985,6 +985,15 @@ bool MDSMonitor::management_command(
     return true;
   }
+  const pg_pool_t *p = mon->osdmon()->osdmap.get_pg_pool(metadata);
+  assert(p != NULL);
+  if (p->is_tier() && p->cache_mode != pg_pool_t::CACHEMODE_NONE) {
+    r = -EINVAL;
+    ss << "pool '" << metadata_name << "' cannot be a cache pool";
+    return true;
+  }
+
+
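The hunk above only checks the metadata pool; presumably the data pool would get the same treatment, roughly like this (sketch only; 'data' and 'data_name' are assumed names for the data pool id and name parsed earlier in management_command(), mirroring 'metadata'/'metadata_name'):

// Sketch, not part of the patch above: the same cache-tier check applied
// to the data pool.  'data' and 'data_name' are assumed variable names.
const pg_pool_t *dp = mon->osdmon()->osdmap.get_pg_pool(data);
assert(dp != NULL);
if (dp->is_tier() && dp->cache_mode != pg_pool_t::CACHEMODE_NONE) {
  r = -EINVAL;
  ss << "pool '" << data_name << "' cannot be a cache pool";
  return true;
}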
#3 Updated by Greg Farnum about 9 years ago
Yeah, that's the simple solution. I was also wondering though if we wanted to do something more sophisticated trying to have the OSDMap or similar structures just transparently map from the cache pool to the base pools.
#4 Updated by Sage Weil about 9 years ago
My vote is NAK on this. This is exactly what I want to do on my cluster, and I think this is the only way EC can be used for CephFS currently. Why do we want to prevent it?
#5 Updated by Greg Farnum about 9 years ago
This conversation is getting split across several mediums, but this check shouldn't prevent specifying an EC base pool with a replicated cache pool in front of it. It would only block hooking the filesystem's metadata to a pool that is sort of "ephemeral", the way cache pools are.
#6 Updated by Greg Farnum about 9 years ago
- Status changed from New to 12
- Assignee set to Greg Farnum
- Target version set to v0.87
The user pointed out that right now we prevent assigning EC pools to CephFS. I believe that restriction is the result of a user who set an EC pool as their data pool WITHOUT a cache pool on top of it.
I'm not sure whether the better user experience here is the nasty hack we're using now, where the cache pool itself is set as the FS pool, or only accepting base pools and requiring that an EC base pool have a cache in front of it (this is what I'm leaning towards), or some other option.
The issue with choice 2 is that the user could remove the cache pool from in front... which we could prevent in the OSDMonitor by having it check whether the pool is in use by the FS?
We do already have OSDMonitor::_check_remove_pool, so we could add similar logic when users run remove-overlay...maybe...
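As an illustration of that idea, a tiering counterpart to _check_remove_pool might look roughly like this. The function name, the MDSMap accessors, and the wiring into the remove-overlay handler are all assumptions, not the code that eventually landed:

// Hypothetical counterpart to OSDMonitor::_check_remove_pool(): returns
// false (with an explanation) if detaching the cache tier from
// 'base_pool_id' would leave CephFS pointing at a bare EC pool.
// The MDSMap accessors used here are assumptions.
static bool _check_remove_tier(const OSDMap &osdmap, const MDSMap &mdsmap,
                               int64_t base_pool_id, std::ostream &ss)
{
  const pg_pool_t *base = osdmap.get_pg_pool(base_pool_id);
  if (base && base->is_erasure() &&
      (mdsmap.is_data_pool(base_pool_id) ||
       mdsmap.get_metadata_pool() == base_pool_id)) {
    ss << "pool " << base_pool_id
       << " is in use by CephFS through its cache tier; cannot remove overlay";
    return false;
  }
  return true;
}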
#7 Updated by Greg Farnum about 9 years ago
- Assignee deleted (Greg Farnum)
#8 Updated by Sage Weil about 9 years ago
I lean toward setting the data pool to the base pool too. I worry about having to stand up so many guard rails, though. Hopefully we can consolidate the checks into "is_safe_to_remove" and "is_safe_to_remove_tier" type functions?
#9 Updated by Greg Farnum about 9 years ago
Yes, that's what I'm hoping as well. That's what _check_remove_pool() is; we'd need to add an equivalent for tiering. And, maybe, have it check when changing the tier type as well... :/
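To make the consolidation concrete, the two guards might end up declared something like this (purely illustrative; the names come from the comments above, the signatures do not):

// Sketch of the consolidated guards discussed above; illustrative only.

// Guards 'osd pool delete' (essentially today's _check_remove_pool()).
bool is_safe_to_remove(int64_t pool_id, std::ostream *ss);

// Guards 'osd tier remove' and 'osd tier remove-overlay', and would also
// be consulted when changing 'osd tier cache-mode', for pools whose base
// tier is in use by CephFS.
bool is_safe_to_remove_tier(int64_t base_pool_id, std::ostream *ss);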
#10 Updated by John Spray about 9 years ago
- Status changed from 12 to In Progress
- Assignee set to John Spray
#11 Updated by John Spray about 9 years ago
First half here: https://github.com/ceph/ceph/tree/wip-9435 (no handling of tiering updates yet)
#12 Updated by John Spray about 9 years ago
- Status changed from In Progress to Fix Under Review
#13 Updated by Greg Farnum about 9 years ago
- Status changed from Fix Under Review to Resolved
Merged into giant branch in commit:eb1b2e0072bf605095f4104c2b6c2abfba216dbe