Tasks #6641

Make teuthology run on an existing cluster

Added by Zack Cerza over 10 years ago. Updated almost 10 years ago.

Status: Closed
Priority: Normal
% Done: 0%
#1 - Updated by Zack Cerza over 10 years ago

  • Target version set to v0.72 Emperor
#2 - Updated by Zack Cerza over 10 years ago

  • Assignee set to Zack Cerza
#3 - Updated by Tamilarasi muthamizhan over 10 years ago

  • Status changed from 12 to In Progress
#4 - Updated by Zack Cerza over 10 years ago

Traceback (most recent call last):
  File "/Users/zack/inkdev/teuthology/teuthology/run_tasks.py", line 31, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/Users/zack/inkdev/teuthology/teuthology/run_tasks.py", line 19, in run_one_task
    return fn(**kwargs)
  File "/Users/zack/inkdev/teuthology/teuthology/task/internal.py", line 229, in check_ceph_data
    raise RuntimeError('Stale /var/lib/ceph detected, aborting.')
RuntimeError: Stale /var/lib/ceph detected, aborting.
DEBUG:teuthology.run_tasks:Exception was not quenched, exiting: RuntimeError: Stale /var/lib/ceph detected, aborting.
INFO:teuthology.run:Summary data:
{failure_reason: 'Stale /var/lib/ceph detected, aborting.', owner: zack@zmba.local,
  success: false}

teuthology.run.main() unconditionally adds the 'internal.check_ceph_data' task to every job.
teuthology.task.internal.check_ceph_data() forces the test to bail if /var/lib/ceph/ exists.
It looks like this will require some (hopefully minimal) changes to teuthology after all.
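
For context, the flow described above reduces to roughly this (a simplified sketch; build_task_list is a hypothetical name for illustration, not teuthology's actual internals):

import os

def check_ceph_data():
    # Bail whenever leftover cluster state is present -- this is the
    # failure shown in the traceback above.
    if os.path.exists('/var/lib/ceph'):
        raise RuntimeError('Stale /var/lib/ceph detected, aborting.')

def build_task_list(job_tasks):
    # main() prepends internal tasks to every job, so check_ceph_data()
    # runs no matter what the job YAML specifies.
    return [check_ceph_data] + list(job_tasks)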

#5 - Updated by Zack Cerza over 10 years ago

I'm thinking of adding a use_existing_cluster: true option (or similar) that can be used in job yaml files. That way we can ignore an existing /var/lib/ceph and potentially take (or not take) any other actions we need to when using an existing cluster.
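
A minimal sketch of how such a flag could gate the check (hypothetical names; the real change would live in teuthology.task.internal):

import os

def check_ceph_data(job_config, data_dir='/var/lib/ceph'):
    # If the job YAML contains 'use_existing_cluster: true', leftover
    # cluster state is expected, so skip the staleness check entirely.
    if job_config.get('use_existing_cluster', False):
        return
    if os.path.exists(data_dir):
        raise RuntimeError('Stale %s detected, aborting.' % data_dir)

# Parsed from a job YAML fragment like 'use_existing_cluster: true':
check_ceph_data({'use_existing_cluster': True})  # passes; cluster is reused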

#7 - Updated by Zack Cerza over 10 years ago

I had to manually install ceph-test to get the ceph-coverage binary installed. Pretty much everything teuthology does tries to use that.

Now I just need to find a test that can be run on an existing cluster.
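
To illustrate why the missing binary is fatal: teuthology prefixes the remote commands it runs with wrapper binaries, so without ceph-coverage nearly every remote call fails. A hypothetical helper mirroring the prefix visible in the failed command in the next note (the real prefixing happens inside teuthology's remote-execution code):

def wrap_command(args, coverage_dir='/home/ubuntu/cephtest/archive/coverage'):
    # adjust-ulimits ceph-coverage <coverage-dir> <actual command...>
    return ['adjust-ulimits', 'ceph-coverage', coverage_dir] + list(args)

print(' '.join(wrap_command(['ceph', 'mds', 'set_max_mds', '1'])))
# adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph mds set_max_mds 1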

#8 - Updated by Zack Cerza over 10 years ago

Another roadblock:
teuthology.task.ceph.task() runs teuthology.task.ceph.cluster(), which creates the cluster. The ceph task can't simply be dropped, because it is also what copies scripts like adjust-ulimits to the nodes. I made this change to skip cluster():
https://github.com/ceph/teuthology/commit/d04f3a6ae09224c4bf2d03fb058807b5a6cbf666

But I am seeing this now:

INFO:teuthology.task.ceph:Starting mds daemons...
INFO:teuthology.task.ceph.mds.0:Restarting
INFO:teuthology.task.ceph.osd.1.out:[10.214.136.34]: starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
INFO:teuthology.task.ceph.osd.1.err:[10.214.136.34]: 2013-11-21 11:45:35.599476 7f9097f96780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
INFO:teuthology.task.ceph.mds.0:Started
INFO:teuthology.task.ceph.mds.0.out:[10.214.136.32]: starting mds.0 at :/0
INFO:teuthology.task.ceph.mds.0.err:[10.214.136.32]: 2013-11-21 11:45:35.719808 7fe0aceb8780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted
INFO:teuthology.task.ceph.mds.0.err:[10.214.136.32]: *** Caught signal (Segmentation fault) **
INFO:teuthology.task.ceph.mds.0.err:[10.214.136.32]:  in thread 7fe0a5e44700
INFO:teuthology.orchestra.run.err:[10.214.136.32]: 2013-11-21 11:45:35.911517 7f70fe9e6700  0 librados: client.admin authentication error (1) Operation not permitted
INFO:teuthology.orchestra.run.err:[10.214.136.32]: Error connecting to cluster: PermissionError
ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/Users/zack/inkdev/teuthology/teuthology/contextutil.py", line 25, in nested
    vars.append(enter())
  File "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/Users/zack/inkdev/teuthology/teuthology/task/ceph.py", line 1098, in run_daemon
    'mds', 'set_max_mds', str(num_active)])
  File "/Users/zack/inkdev/teuthology/teuthology/orchestra/remote.py", line 47, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/Users/zack/inkdev/teuthology/teuthology/orchestra/run.py", line 271, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/Users/zack/inkdev/teuthology/teuthology/orchestra/run.py", line 267, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.136.32 with status 1: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph mds set_max_mds 1'
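
For reference, the "nested tasks" machinery behaves like ordinary chained context managers, which is why a failure while starting one daemon unwinds the whole chain. A self-contained sketch of the pattern (simplified; not the real contextutil code):

import contextlib

@contextlib.contextmanager
def run_daemon(kind, fail=False):
    print('Starting %s daemons...' % kind)
    if fail:
        # Stands in for the CommandFailedError raised by the
        # 'ceph mds set_max_mds 1' call above.
        raise RuntimeError('Command failed with status 1')
    try:
        yield
    finally:
        print('Stopping %s daemons...' % kind)

try:
    with run_daemon('osd'), run_daemon('mds', fail=True):
        pass
except RuntimeError as e:
    # The already-entered osd manager is cleaned up before this prints.
    print('Saw exception from nested tasks: %s' % e)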

#9 - Updated by Ian Colle over 10 years ago

  • Target version changed from v0.72 Emperor to v0.73
#10 - Updated by Ian Colle over 10 years ago

  • Target version changed from v0.73 to v0.74
#11 - Updated by Tamilarasi muthamizhan over 10 years ago

  • Target version changed from v0.74 to v0.75
#12 - Updated by Zack Cerza over 10 years ago

  • Target version changed from v0.75 to v0.76
#13 - Updated by Zack Cerza over 10 years ago

  • Target version deleted (v0.76)
#14 - Updated by Ian Colle about 10 years ago

  • Subject changed from Attempt to run teuthology on an existing cluster to Make teuthology run on an existing cluster
  • Target version set to sprint5
#15 - Updated by Zack Cerza almost 10 years ago

  • Status changed from In Progress to Closed

I'm going to close this because it was always too open-ended. I will continue to investigate and open bugs for issues I find.
