Tasks #6641 (Closed): Make teuthology run on an existing cluster
Updated by Tamilarasi muthamizhan over 10 years ago
- Status changed from 12 to In Progress
Updated by Zack Cerza over 10 years ago
Traceback (most recent call last):
  File "/Users/zack/inkdev/teuthology/teuthology/run_tasks.py", line 31, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/Users/zack/inkdev/teuthology/teuthology/run_tasks.py", line 19, in run_one_task
    return fn(**kwargs)
  File "/Users/zack/inkdev/teuthology/teuthology/task/internal.py", line 229, in check_ceph_data
    raise RuntimeError('Stale /var/lib/ceph detected, aborting.')
RuntimeError: Stale /var/lib/ceph detected, aborting.
DEBUG:teuthology.run_tasks:Exception was not quenched, exiting: RuntimeError: Stale /var/lib/ceph detected, aborting.
INFO:teuthology.run:Summary data: {failure_reason: 'Stale /var/lib/ceph detected, aborting.', owner: zack@zmba.local, success: false}
teuthology.run.main() unconditionally adds the 'internal.check_ceph_data' task to every job.
teuthology.task.internal.check_ceph_data() forces the test to bail if /var/lib/ceph/ exists.
It looks like this will require some (hopefully minimal) changes to teuthology after all.
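For reference, the check being tripped is simple: it looks for /var/lib/ceph on every test node and aborts the run if the directory exists. A minimal sketch of that logic, assuming the parallel ctx.cluster.run() API and the AsyncResult-style proc.exitstatus.get() from teuthology's orchestra layer (the real check_ceph_data may differ in details):

import logging

log = logging.getLogger(__name__)

def check_ceph_data(ctx, config):
    """Abort if any test node still has a /var/lib/ceph directory."""
    log.info('Checking for old /var/lib/ceph...')
    # Run 'test ! -e /var/lib/ceph' on every remote without waiting,
    # so all nodes are checked in parallel.
    processes = ctx.cluster.run(
        args=['test', '!', '-e', '/var/lib/ceph'],
        wait=False,
    )
    failed = False
    for proc in processes:
        try:
            proc.exitstatus.get()  # blocks; raises if the test command failed
        except Exception:
            failed = True
    if failed:
        raise RuntimeError('Stale /var/lib/ceph detected, aborting.')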
Updated by Zack Cerza over 10 years ago
I'm thinking of adding a use_existing_cluster: true option (or similar) that could be used in job yaml files. That way we can ignore an existing /var/lib/ceph, and take (or skip) whatever other actions are needed when using an existing cluster.
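For example, a job yaml might opt in like this (entirely hypothetical; the flag name is still tentative):

# hypothetical job yaml fragment
use_existing_cluster: true
tasks:
- ceph: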
Updated by Zack Cerza over 10 years ago
This fixes the previously-mentioned problem:
https://github.com/ceph/teuthology/commit/f8150d44d0af383510ab7c7bccde07d78a6a3fef
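I won't reproduce the commit here, but the guard amounts to an early return when the flag is set; a sketch, assuming the use_existing_cluster flag from the previous comment:

def check_ceph_data(ctx, config):
    # Hypothetical guard: trust an existing /var/lib/ceph when the job
    # yaml asks to reuse a cluster, instead of aborting on it.
    if ctx.config.get('use_existing_cluster', False):
        log.info('Skipping stale /var/lib/ceph check')
        return
    # ...otherwise fall through to the existing check sketched earlier.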
Updated by Zack Cerza over 10 years ago
I had to manually install ceph-test to get the ceph-coverage binary; pretty much every command teuthology runs tries to use it.
Now I just need to find a test that can be run on an existing cluster.
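For context on why ceph-coverage is unavoidable: teuthology prefixes essentially every ceph command it runs on a node with adjust-ulimits and ceph-coverage (the failing command in the next comment shows the exact shape). A simplified sketch of that prefixing; the real code builds it inline in the various tasks, and coverage_wrapped is a name I made up for illustration:

def coverage_wrapped(testdir, args):
    """Prefix a command the way teuthology runs it on a remote node.

    Sketch only; testdir is typically /home/ubuntu/cephtest, and
    ceph-coverage (from the ceph-test package) records coverage data
    under the archive directory.
    """
    return [
        'adjust-ulimits',
        'ceph-coverage',
        '{tdir}/archive/coverage'.format(tdir=testdir),
    ] + list(args)

# coverage_wrapped('/home/ubuntu/cephtest', ['ceph', 'mds', 'set_max_mds', '1'])
# -> adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage \
#    ceph mds set_max_mds 1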
Updated by Zack Cerza over 10 years ago
Another roadblock:
teuthology.task.ceph.task() causes teuthology.task.ceph.cluster(), which creates the cluster, to run. The ceph task itself is still required, because it's what copies helper scripts like adjust-ulimits to the nodes. I made this change to skip cluster():
https://github.com/ceph/teuthology/commit/d04f3a6ae09224c4bf2d03fb058807b5a6cbf666
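Roughly, the shape of that change: the ceph task assembles its setup steps as a list of context managers passed to contextutil.nested(), so cluster() can be included conditionally. A sketch under the same use_existing_cluster assumption (subtask names as in teuthology/task/ceph.py of that era; config plumbing simplified):

import contextlib

from teuthology import contextutil
from teuthology.task import ceph

@contextlib.contextmanager
def task(ctx, config):
    # Keep the steps an existing cluster still needs (log setup, shipping
    # adjust-ulimits and friends), but only create a cluster from scratch
    # when we are not reusing one.
    subtasks = [
        lambda: ceph.ceph_log(ctx=ctx, config=None),
        lambda: ceph.ship_utilities(ctx=ctx, config=None),
    ]
    if not ctx.config.get('use_existing_cluster', False):
        subtasks.append(lambda: ceph.cluster(ctx=ctx, config=config))
    subtasks += [
        lambda: ceph.run_daemon(ctx=ctx, config=config, type_='mon'),
        lambda: ceph.run_daemon(ctx=ctx, config=config, type_='osd'),
        lambda: ceph.run_daemon(ctx=ctx, config=config, type_='mds'),
    ]
    with contextutil.nested(*subtasks):
        yield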
But I am seeing this now:
INFO:teuthology.task.ceph:Starting mds daemons...
INFO:teuthology.task.ceph.mds.0:Restarting
INFO:teuthology.task.ceph.osd.1.out:[10.214.136.34]: starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
INFO:teuthology.task.ceph.osd.1.err:[10.214.136.34]: 2013-11-21 11:45:35.599476 7f9097f96780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
INFO:teuthology.task.ceph.mds.0:Started
INFO:teuthology.task.ceph.mds.0.out:[10.214.136.32]: starting mds.0 at :/0
INFO:teuthology.task.ceph.mds.0.err:[10.214.136.32]: 2013-11-21 11:45:35.719808 7fe0aceb8780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted
INFO:teuthology.task.ceph.mds.0.err:[10.214.136.32]: *** Caught signal (Segmentation fault) **
INFO:teuthology.task.ceph.mds.0.err:[10.214.136.32]: in thread 7fe0a5e44700
INFO:teuthology.orchestra.run.err:[10.214.136.32]: 2013-11-21 11:45:35.911517 7f70fe9e6700 0 librados: client.admin authentication error (1) Operation not permitted
INFO:teuthology.orchestra.run.err:[10.214.136.32]: Error connecting to cluster: PermissionError
ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/Users/zack/inkdev/teuthology/teuthology/contextutil.py", line 25, in nested
    vars.append(enter())
  File "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/Users/zack/inkdev/teuthology/teuthology/task/ceph.py", line 1098, in run_daemon
    'mds', 'set_max_mds', str(num_active)])
  File "/Users/zack/inkdev/teuthology/teuthology/orchestra/remote.py", line 47, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/Users/zack/inkdev/teuthology/teuthology/orchestra/run.py", line 271, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/Users/zack/inkdev/teuthology/teuthology/orchestra/run.py", line 267, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.136.32 with status 1: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph mds set_max_mds 1'
Updated by Ian Colle over 10 years ago
- Target version changed from v0.72 Emperor to v0.73
Updated by Ian Colle over 10 years ago
- Target version changed from v0.73 to v0.74
Updated by Tamilarasi muthamizhan over 10 years ago
- Target version changed from v0.74 to v0.75
Updated by Zack Cerza over 10 years ago
- Target version changed from v0.75 to v0.76
Updated by Ian Colle about 10 years ago
- Subject changed from Attempt to run teuthology on an existing cluster to Make teuthology run on an existing cluster
- Target version set to sprint5
Updated by Zack Cerza almost 10 years ago
- Status changed from In Progress to Closed
I'm going to close this because it was always too open-ended. I'll keep investigating and open specific bugs for the issues I find.