Fix #9435
prevent use of cache pools as metadata or data pools
Description
From the mailing list:

I am testing the tiering functionality with cephfs. I used a replicated cache with an EC data pool, and a replicated metadata pool, like this:

ceph osd pool create cache 1024 1024
ceph osd pool set cache size 2
ceph osd pool set cache min_size 1
ceph osd erasure-code-profile set profile11 k=8 m=3 ruleset-failure-domain=osd
ceph osd pool create ecdata 128 128 erasure profile11
ceph osd tier add ecdata cache
ceph osd tier cache-mode cache writeback
ceph osd tier set-overlay ecdata cache
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((280*1024*1024*1024))
ceph osd pool create metadata 128 128
ceph osd pool set metadata crush_ruleset 1    # SSD root in crushmap
ceph fs new ceph_fs metadata cache            <-- wrong?

I started testing with this and it worked: I could write to it with cephfs, and the cache was flushing to the ecdata pool as expected. But now I notice I created the fs on top of the cache instead of the underlying data pool. I suppose I should have done this:

ceph fs new ceph_fs metadata ecdata

So my question is: was this wrong and not doing what I thought it did, or was this somehow handled by ceph, so that it didn't matter that I specified the cache instead of the data pool?
This basically works, but gets awkward later on. I'm not sure what's involved in blocking the use of cache pools, and I'm not sure we want to fix it as an FS-specific thing rather than somehow doing it generically in the monitor command system. But we should do something.
Associated revisions
mon: prevent cache pools being used with CephFS
Fixes two things:
* EC pools are now permissible if they have a cache overlay
* Pools are not permissible if they are a cache tier.
Fixes: #9435
Signed-off-by: John Spray <john.spray@redhat.com>
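For reference, a minimal sketch of the policy the commit message describes, written as a standalone check. This is not the merged patch: the function name _check_pool_for_fs is made up, and it assumes pg_pool_t's is_tier(), is_erasure() and has_tiers() helpers.

#include <ostream>
#include <string>
#include "osd/osd_types.h"  // pg_pool_t (in-tree header; path assumed)

// Sketch only: reject a pool that is itself a cache tier, and reject an
// EC pool unless it has a cache overlay in front of it.
static int _check_pool_for_fs(const pg_pool_t &pool,
                              const std::string &pool_name,
                              std::ostream &ss)
{
  if (pool.is_tier()) {
    ss << "pool '" << pool_name << "' is a cache tier and cannot be used directly";
    return -EINVAL;
  }
  if (pool.is_erasure() && !pool.has_tiers()) {
    ss << "EC pool '" << pool_name << "' must have a cache tier overlay";
    return -EINVAL;
  }
  return 0;
}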
History
#1 Updated by Joao Eduardo Luis about 9 years ago
would checking the nature of pools during 'fs new' on the monitor and failing if any of the specified pools (data or metadata) are cache pools work? Or did you have something else in mind?
#2 Updated by Joao Eduardo Luis about 9 years ago
I mean something like this (although I'm not positive I got all the requirements right):
diff --git a/src/mon/MDSMonitor.cc b/src/mon/MDSMonitor.cc
index fea198c..14fb003 100644
--- a/src/mon/MDSMonitor.cc
+++ b/src/mon/MDSMonitor.cc
@@ -985,6 +985,15 @@ bool MDSMonitor::management_command(
     return true;
   }
+  const pg_pool_t *p = mon->osdmon()->osdmap.get_pg_pool(metadata);
+  assert(p != NULL);
+  if (p->is_tier() && p->cache_mode != pg_pool_t::CACHEMODE_NONE) {
+    r = -EINVAL;
+    ss << "pool '" << metadata_name << "' cannot be a cache pool";
+    return true;
+  }
+
+
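The hunk above only checks the metadata pool; presumably the data pool would get the same treatment, roughly like this (sketch only; 'data' and 'data_name' are assumed names for the data pool id and name parsed earlier in management_command(), mirroring 'metadata'/'metadata_name'):

// Sketch, not part of the patch above: the same cache-tier check applied
// to the data pool.  'data' and 'data_name' are assumed variable names.
const pg_pool_t *dp = mon->osdmon()->osdmap.get_pg_pool(data);
assert(dp != NULL);
if (dp->is_tier() && dp->cache_mode != pg_pool_t::CACHEMODE_NONE) {
  r = -EINVAL;
  ss << "pool '" << data_name << "' cannot be a cache pool";
  return true;
}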
#3 Updated by Greg Farnum about 9 years ago
Yeah, that's the simple solution. I was also wondering though if we wanted to do something more sophisticated trying to have the OSDMap or similar structures just transparently map from the cache pool to the base pools.
#4 Updated by Sage Weil about 9 years ago
My vote is NAK on this. This is exactly what I want to do on my cluster, and I think this is the only way EC can be used for CephFS currently. Why do we want to prevent it?
#5 Updated by Greg Farnum about 9 years ago
This conversation is getting split across several mediums, but this check shouldn't prevent specifying an EC base pool with a replicated cache pool in front of it. It would only block hooking the filesystem's metadata to a pool that is sort of "ephemeral", the way cache pools are.
#6 Updated by Greg Farnum about 9 years ago
- Status changed from New to 12
- Assignee set to Greg Farnum
- Target version set to v0.87
The user pointed out that right now we prevent assigning EC pools to CephFS. I believe that restriction is the result of a user who set an EC pool as their data pool WITHOUT a cache pool on top of it.
I'm not sure whether the better user experience here is the nasty hack we're using now, where the cache pool itself is set as the FS pool, or only accepting base pools and requiring that an EC base pool have a cache in front of it (this is what I'm leaning towards), or some other option.
The issue with choice 2 is that the user could remove the cache pool from in front... which we could prevent in the OSDMonitor by having it check whether the pool is in use by the FS?
We do already have OSDMonitor::_check_remove_pool, so we could add similar logic when users run remove-overlay...maybe...
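As an illustration of that idea, a tiering counterpart to _check_remove_pool might look roughly like this. The function name, the MDSMap accessors, and the wiring into the remove-overlay handler are all assumptions, not the code that eventually landed:

// Hypothetical counterpart to OSDMonitor::_check_remove_pool(): returns
// false (with an explanation) if detaching the cache tier from
// 'base_pool_id' would leave CephFS pointing at a bare EC pool.
// The MDSMap accessors used here are assumptions.
static bool _check_remove_tier(const OSDMap &osdmap, const MDSMap &mdsmap,
                               int64_t base_pool_id, std::ostream &ss)
{
  const pg_pool_t *base = osdmap.get_pg_pool(base_pool_id);
  if (base && base->is_erasure() &&
      (mdsmap.is_data_pool(base_pool_id) ||
       mdsmap.get_metadata_pool() == base_pool_id)) {
    ss << "pool " << base_pool_id
       << " is in use by CephFS through its cache tier; cannot remove overlay";
    return false;
  }
  return true;
}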
#7 Updated by Greg Farnum about 9 years ago
- Assignee deleted (Greg Farnum)
#8 Updated by Sage Weil about 9 years ago
I lean toward setting the data pool to the base pool too. I worry about having to stand up so many guard rails, though. Hopefully we can consolidate the checks into "is_safe_to_remove" and "is_safe_to_remove_tier" type functions?
#9 Updated by Greg Farnum about 9 years ago
Yes, that's what I'm hoping as well. That's what _check_remove_pool() is; we'd need to add an equivalent for tiering. And, maybe, have it check when changing the tier type as well... :/
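To make the consolidation concrete, the two guards might end up declared something like this (purely illustrative; the names come from the comments above, the signatures do not):

// Sketch of the consolidated guards discussed above; illustrative only.

// Guards 'osd pool delete' (essentially today's _check_remove_pool()).
bool is_safe_to_remove(int64_t pool_id, std::ostream *ss);

// Guards 'osd tier remove' and 'osd tier remove-overlay', and would also
// be consulted when changing 'osd tier cache-mode', for pools whose base
// tier is in use by CephFS.
bool is_safe_to_remove_tier(int64_t base_pool_id, std::ostream *ss);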
#10 Updated by John Spray about 9 years ago
- Status changed from 12 to In Progress
- Assignee set to John Spray
#11 Updated by John Spray about 9 years ago
First half here: https://github.com/ceph/ceph/tree/wip-9435 (no handling of tiering updates yet)
#12 Updated by John Spray about 9 years ago
- Status changed from In Progress to Fix Under Review
#13 Updated by Greg Farnum about 9 years ago
- Status changed from Fix Under Review to Resolved
Merged into giant branch in commit:eb1b2e0072bf605095f4104c2b6c2abfba216dbe