Feature #7812

Implement CephControl for real clusters over SSH

Added by John Spray over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Category: Backend (services)
Target version:
% Done: 100%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:

Description

This is the interface used by integration tests to deal with a running ceph cluster. Currently the EmbeddedCephControl exists for talking to a simulated ceph cluster: implement the methods on ExternalCephControl to talk to a real cluster over SSH.

History

#1 Updated by Ian Colle over 7 years ago

  • Assignee set to Christina Meno

#2 Updated by Christina Meno over 7 years ago

I think this ticket is to flesh out ExternalCephControl, specifically these methods:

def configure(self, server_count, cluster_count=1):
    # This can be a NOOP since we are doing it by hand for the first pass
def shutdown(self):
    # This can be a NOOP since we are doing it by hand for the first pass
def get_fsid(self):
def get_server_fqdns(self):
def mark_osd_in(self, fsid, osd_id, osd_in=True):
def go_dark(self, fsid, dark=True, minion_id=None):
def get_fqdns(self, fsid):
def get_service_fqdns(self, fsid, service_type):

These should all be implemented as SSH one-liners, using the output .yaml from teuthology to guide me.

Finally, I will ensure the tests run, fixing any test that mistakenly depends on the minion-sim implementation.
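
As a rough illustration of the "SSH one-liner" approach, here is a minimal sketch. It assumes the host list has already been extracted from the teuthology output (the ticket does not describe that file's layout, so none is assumed here), that passwordless SSH works for a hypothetical "ubuntu" user, and that there is a single cluster, so fsid is ignored. The class name, the run_remote helper and the runner parameter are illustrative only, not part of the real code.

```python
import subprocess


def build_ssh_command(host, remote_command, user="ubuntu"):
    """Return the argv list for running one remote command over ssh."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", "%s@%s" % (user, host), remote_command]


def run_remote(host, remote_command, user="ubuntu", runner=subprocess.check_output):
    """Run a one-liner on `host` and return its stdout as stripped text."""
    output = runner(build_ssh_command(host, remote_command, user))
    return output.decode("utf-8").strip()


class ExternalCephControlSketch(object):
    """Hypothetical sketch: each method is one remote command over SSH."""

    def __init__(self, hosts, runner=subprocess.check_output):
        self.hosts = list(hosts)  # assumed to come from the teuthology output
        self.runner = runner      # injectable so the sketch can be tested offline

    def get_server_fqdns(self):
        # Ask every host for its own fully qualified name.
        return [run_remote(h, "hostname -f", runner=self.runner) for h in self.hosts]

    def mark_osd_in(self, fsid, osd_id, osd_in=True):
        # fsid is unused here because this sketch assumes a single cluster.
        verb = "in" if osd_in else "out"
        run_remote(self.hosts[0], "ceph osd %s %d" % (verb, int(osd_id)),
                   runner=self.runner)
```

Passing a fake runner lets the command construction be checked without any SSH access, which may help when untangling tests from the minion-sim implementation.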

#3 Updated by Christina Meno over 7 years ago

Looking at the API of EmbeddedCephControl I see a few issues that aren't quite clear:

configure() takes a cluster_count param, but only half of the interface seems to care which cluster it's operating on, e.g.:

get_fsid(self): returns the attribute self.fsid which is set to the last cluster made in config()

get_server_fqdns(self): doesn't seem to distinguish what servers belong to what clusters. That seems wrong.

In our first pass ExternalCephControl.configure will only be able to deliver a cluster that is the shape we made it.

Does fixing the EmbeddedCephControl fall to a separate task?

#4 Updated by John Spray over 7 years ago

Gregory Meno wrote:

get_fsid(self): returns the attribute self.fsid which is set to the last cluster made in config()

self.fsid and get_fsid can go away; they aren't used (must be a hangover from before I made it multi-cluster).

get_server_fqdns(self): doesn't seem to distinguish what servers belong to what clusters. That seems wrong.

Remember that servers aren't uniquely associated with a cluster. A server can be involved in two clusters, or 0 clusters. Operations which are "server level" as opposed to "cluster level" can reasonably happen across servers without discriminating by which cluster(s) a server is involved in. That's the context in which get_server_fqdns is used: in the initial "Hook up all the servers" operation.
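
To make the server-level versus cluster-level distinction concrete, here is a small sketch with a made-up inventory. The FQDNs, the fsid strings and the clusters_for helper are hypothetical; only the method names get_server_fqdns and get_fqdns come from the ticket, and they are written as plain functions here for brevity.

```python
# Hypothetical inventory: every known server, plus cluster membership by fsid.
ALL_SERVERS = ["s1.example.com", "s2.example.com",
               "s3.example.com", "s4.example.com"]
CLUSTERS = {
    "fsid-a": {"s1.example.com", "s2.example.com"},
    "fsid-b": {"s2.example.com", "s3.example.com"},
}


def get_server_fqdns():
    """Server level: every server, regardless of cluster membership."""
    return sorted(ALL_SERVERS)


def get_fqdns(fsid):
    """Cluster level: only the servers involved in the given cluster."""
    return sorted(CLUSTERS[fsid])


def clusters_for(fqdn):
    """Illustrative helper: which clusters a server is involved in."""
    return sorted(fsid for fsid, members in CLUSTERS.items() if fqdn in members)
```

In this inventory s2 is involved in two clusters and s4 in none, yet both are returned by get_server_fqdns, which is what the "Hook up all the servers" step needs.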

In our first pass ExternalCephControl.configure will only be able to deliver a cluster that is the shape we made it.

Yeah, I'm not massively surprised that that's a limitation of the ceph_deploy task. Suggest raising SkipTest from .configure() if someone asks for two, and adding a ticket to the backlog for fixing up the ceph_deploy task to support multiple clusters.
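
A minimal sketch of that suggestion follows. It assumes unittest.SkipTest from the standard library; the test suite may well use a different runner's SkipTest, and the class name is hypothetical.

```python
from unittest import SkipTest


class SingleClusterControlSketch(object):
    """Hypothetical stand-in for ExternalCephControl."""

    def configure(self, server_count, cluster_count=1):
        # The externally deployed cluster has a fixed shape, so any request
        # for more than one cluster is skipped rather than failed.
        if cluster_count != 1:
            raise SkipTest(
                "external cluster supports exactly 1 cluster, %d requested"
                % cluster_count)
        # Otherwise a no-op: the cluster was set up by hand for the first pass.
```

Raising SkipTest keeps multi-cluster tests visible in the report as skipped instead of hiding them or marking them as failures.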

Does fixing the EmbeddedCephControl fall to a separate task?

If you have to make any interface changes then keep it up to date, but I'm not particularly expecting any enhancements to it from this ticket.

#5 Updated by Christina Meno over 7 years ago

For go_dark() I plan to use iptables on the master to drop packets on the port used by the minions.

For configure() I plan to just scrape /usr/bin/ceph status
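
One possible shape for the scraping step, as a sketch only: it assumes the status is requested as JSON (ceph status --format json) instead of parsing the human-readable text, which the ticket does not specify. Only the fsid key is read, and the function name is hypothetical.

```python
import json


def parse_fsid(status_json):
    """Extract the cluster fsid from `ceph status --format json` output."""
    status = json.loads(status_json)
    return status["fsid"]
```

The input would be the stdout of the remote ceph command. Other fields are deliberately left alone in this sketch because their layout is not covered by the ticket.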

#6 Updated by John Spray over 7 years ago

Gregory Meno wrote:

For go_dark() I plan to use iptables on the master to drop packets on the port used by the minions.

Did you consider applying rules on the minion side instead of the master side? That way, if running against a dev mode master, we don't need to do rootish Linux stuff anywhere but the minions.

btw, iptables stuff can get hairy; if this turns into a time sink, I expect we can fall back to killing salt minions.
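
For illustration, the minion-side variant could generate rules along these lines. This is a sketch, assuming Salt's default master ports (4505 for publish, 4506 for returns) and that the commands are run as root on the minion; the function name is hypothetical and nothing here is taken from the actual implementation.

```python
SALT_MASTER_PORTS = (4505, 4506)  # Salt defaults; assumed, not from the ticket


def go_dark_commands(dark=True, ports=SALT_MASTER_PORTS):
    """Return iptables one-liners to run on a minion (requires root).

    dark=True appends DROP rules for outbound traffic to the master's ports;
    dark=False deletes those same rules again.
    """
    action = "-A" if dark else "-D"
    return ["iptables %s OUTPUT -p tcp --dport %d -j DROP" % (action, port)
            for port in ports]
```

Note that calling the dark=True variant twice would append duplicate rules, so a real implementation would need to track state or check for an existing rule first.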

For configure() I plan to just scrape /usr/bin/ceph status

What's the thinking on where the salt bootstrap will happen in the process?

#7 Updated by Christina Meno over 7 years ago

Handling the rules / killing the minions will be easier for sure. It would be more time-consuming, but hey, N is small right now.

My current thinking is that bootstrap would happen ONCE, in configure(), the first time we get a stable cluster.

#8 Updated by Christina Meno over 7 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 80

#9 Updated by Christina Meno over 7 years ago

  • Status changed from In Progress to Fix Under Review

#10 Updated by Christina Meno over 7 years ago

  • Status changed from Fix Under Review to Resolved
  • % Done changed from 80 to 100
