Bug #11495 (Closed): "RuntimeError: ceph-deploy: new command failed" in ceph-deploy:fs-hammer-distro-basic-vps run
Description
Run: http://pulpito.ceph.com/teuthology-2015-04-28_10:09:45-ceph-deploy:fs-hammer-distro-basic-vps/
Jobs: all
Logs for one: http://qa-proxy.ceph.com/teuthology/teuthology-2015-04-28_10:09:45-ceph-deploy:fs-hammer-distro-basic-vps/866894/
2015-04-28T10:36:32.199 INFO:tasks.ceph_deploy:Building ceph cluster using ceph-deploy...
2015-04-28T10:36:32.199 INFO:teuthology.orchestra.run.vpm076:Running: 'cd /home/ubuntu/cephtest/ceph-deploy && ./ceph-deploy new vpm076.front.sepia.ceph.com vpm185.front.sepia.ceph.com vpm157.front.sepia.ceph.com'
2015-04-28T10:36:32.206 INFO:teuthology.orchestra.run.vpm076.stderr:bash: line 0: cd: /home/ubuntu/cephtest/ceph-deploy: No such file or directory
2015-04-28T10:36:32.206 INFO:tasks.ceph_deploy:Error encountered, logging exception before tearing down ceph-deploy
2015-04-28T10:36:32.207 INFO:tasks.ceph_deploy:Traceback (most recent call last):
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/ceph_deploy.py", line 217, in build_ceph_cluster
    raise RuntimeError("ceph-deploy: new command failed")
RuntimeError: ceph-deploy: new command failed
2015-04-28T10:36:32.207 INFO:tasks.ceph_deploy:Stopping ceph...
2015-04-28T10:36:32.207 INFO:teuthology.orchestra.run.vpm076:Running: 'sudo stop ceph-all || sudo service ceph stop'
2015-04-28T10:36:32.306 INFO:teuthology.orchestra.run.vpm076.stderr:stop: Unknown job: ceph-all
2015-04-28T10:36:32.315 INFO:teuthology.orchestra.run.vpm076.stderr:ceph: unrecognized service
2015-04-28T10:36:32.316 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 28, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/ceph_deploy.py", line 379, in build_ceph_cluster
    'sudo', 'service', 'ceph', 'stop' ])
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/cluster.py", line 64, in run
    return [remote.run(**kwargs) for remote in remotes]
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 156, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 378, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 114, in wait
    label=self.label)
CommandFailedError: Command failed on vpm076 with status 1: 'sudo stop ceph-all || sudo service ceph stop'
I suspect the symlink isn't created anymore; is this intended?
Updated by Travis Rhoden about 9 years ago
I see what is happening, but haven't looked at why yet.
ceph-deploy is being installed on a different node from the one trying to run it. If you look at the same log referenced in this bug, it's trying to run ceph-deploy on vpm076, but it was installed on vpm157!
2015-04-28T10:36:23.149 INFO:tasks.ceph_deploy:Downloading ceph-deploy...
2015-04-28T10:36:23.149 INFO:teuthology.orchestra.run.vpm157:Running: 'git clone -b master git://git.ceph.com/ceph-deploy.git /home/ubuntu/cephtest/ceph-deploy'
2015-04-28T10:36:23.650 INFO:teuthology.orchestra.run.vpm157.stdout:Cloning into '/home/ubuntu/cephtest/ceph-deploy'...
2015-04-28T10:36:23.930 INFO:teuthology.orchestra.run.vpm157:Running: 'cd /home/ubuntu/cephtest/ceph-deploy && ./bootstrap'
2015-04-28T10:36:24.595 INFO:teuthology.orchestra.run.vpm157.stdout:New python executable in virtualenv/bin/python
2015-04-28T10:36:25.458 INFO:teuthology.orchestra.run.vpm157.stdout:Installing distribute....................done.
2015-04-28T10:36:25.793 INFO:teuthology.orchestra.run.vpm157.stdout:Installing pip...............done.
2015-04-28T10:36:32.029 INFO:teuthology.orchestra.run.vpm157.stdout:[vendoring] Running command: git clone git://ceph.com/remoto
2015-04-28T10:36:32.029 INFO:teuthology.orchestra.run.vpm157.stdout:[vendoring] Running command: git checkout 0.0.25
2015-04-28T10:36:32.030 INFO:teuthology.orchestra.run.vpm157.stdout:[vendoring] Running command: python vendor.py
2015-04-28T10:36:32.030 INFO:teuthology.orchestra.run.vpm157.stdout:[vendoring] Running command: mv /home/ubuntu/cephtest/ceph-deploy/remoto/remoto /home/ubuntu/cephtest/ceph-deploy/ceph_deploy/lib/vendor/remoto
2015-04-28T10:36:32.030 INFO:teuthology.orchestra.run.vpm157.stdout:running develop
2015-04-28T10:36:32.045 INFO:teuthology.orchestra.run.vpm157.stdout:running egg_info
2015-04-28T10:36:32.045 INFO:teuthology.orchestra.run.vpm157.stdout:creating ceph_deploy.egg-info
2015-04-28T10:36:32.045 INFO:teuthology.orchestra.run.vpm157.stdout:writing requirements to ceph_deploy.egg-info/requires.txt
2015-04-28T10:36:32.045 INFO:teuthology.orchestra.run.vpm157.stdout:writing ceph_deploy.egg-info/PKG-INFO
2015-04-28T10:36:32.046 INFO:teuthology.orchestra.run.vpm157.stdout:writing top-level names to ceph_deploy.egg-info/top_level.txt
2015-04-28T10:36:32.047 INFO:teuthology.orchestra.run.vpm157.stdout:writing dependency_links to ceph_deploy.egg-info/dependency_links.txt
2015-04-28T10:36:32.047 INFO:teuthology.orchestra.run.vpm157.stdout:writing entry points to ceph_deploy.egg-info/entry_points.txt
2015-04-28T10:36:32.047 INFO:teuthology.orchestra.run.vpm157.stdout:writing manifest file 'ceph_deploy.egg-info/SOURCES.txt'
2015-04-28T10:36:32.057 INFO:teuthology.orchestra.run.vpm157.stdout:reading manifest file 'ceph_deploy.egg-info/SOURCES.txt'
2015-04-28T10:36:32.058 INFO:teuthology.orchestra.run.vpm157.stdout:reading manifest template 'MANIFEST.in'
2015-04-28T10:36:32.063 INFO:teuthology.orchestra.run.vpm157.stdout:writing manifest file 'ceph_deploy.egg-info/SOURCES.txt'
2015-04-28T10:36:32.158 INFO:teuthology.orchestra.run.vpm157.stdout:running build_ext
2015-04-28T10:36:32.158 INFO:teuthology.orchestra.run.vpm157.stdout:Creating /home/ubuntu/cephtest/ceph-deploy/virtualenv/lib/python2.7/site-packages/ceph-deploy.egg-link (link to .)
2015-04-28T10:36:32.159 INFO:teuthology.orchestra.run.vpm157.stdout:Adding ceph-deploy 1.5.23 to easy-install.pth file
2015-04-28T10:36:32.160 INFO:teuthology.orchestra.run.vpm157.stdout:Installing ceph-deploy script to /home/ubuntu/cephtest/ceph-deploy/virtualenv/bin
2015-04-28T10:36:32.161 INFO:teuthology.orchestra.run.vpm157.stdout:Installed /home/ubuntu/cephtest/ceph-deploy
2015-04-28T10:36:32.161 INFO:teuthology.orchestra.run.vpm157.stdout:Processing dependencies for ceph-deploy==1.5.23
2015-04-28T10:36:32.162 INFO:teuthology.orchestra.run.vpm157.stdout:Searching for distribute==0.6.24
2015-04-28T10:36:32.163 INFO:teuthology.orchestra.run.vpm157.stdout:Best match: distribute 0.6.24
2015-04-28T10:36:32.163 INFO:teuthology.orchestra.run.vpm157.stdout:Processing distribute-0.6.24-py2.7.egg
2015-04-28T10:36:32.163 INFO:teuthology.orchestra.run.vpm157.stdout:distribute 0.6.24 is already the active version in easy-install.pth
2015-04-28T10:36:32.164 INFO:teuthology.orchestra.run.vpm157.stdout:Installing easy_install script to /home/ubuntu/cephtest/ceph-deploy/virtualenv/bin
2015-04-28T10:36:32.164 INFO:teuthology.orchestra.run.vpm157.stdout:Installing easy_install-2.7 script to /home/ubuntu/cephtest/ceph-deploy/virtualenv/bin
2015-04-28T10:36:32.165 INFO:teuthology.orchestra.run.vpm157.stdout:Using /home/ubuntu/cephtest/ceph-deploy/virtualenv/lib/python2.7/site-packages/distribute-0.6.24-py2.7.egg
2015-04-28T10:36:32.165 INFO:teuthology.orchestra.run.vpm157.stdout:Finished processing dependencies for ceph-deploy==1.5.23
2015-04-28T10:36:32.199 INFO:tasks.ceph_deploy:Building ceph cluster using ceph-deploy...
2015-04-28T10:36:32.199 INFO:teuthology.orchestra.run.vpm076:Running: 'cd /home/ubuntu/cephtest/ceph-deploy && ./ceph-deploy new vpm076.front.sepia.ceph.com vpm185.front.sepia.ceph.com vpm157.front.sepia.ceph.com'
2015-04-28T10:36:32.206 INFO:teuthology.orchestra.run.vpm076.stderr:bash: line 0: cd: /home/ubuntu/cephtest/ceph-deploy: No such file or directory
I'll have to see if I can figure out why.
Updated by Travis Rhoden about 9 years ago
- Assignee changed from Travis Rhoden to John Spray
It looks to me like this came in with https://github.com/ceph/ceph-qa-suite/commit/15b33337e803853573e906a5131307ecb9c7fc35
In this change, the function download_ceph_deploy was changed (https://github.com/ceph/ceph-qa-suite/commit/15b33337e803853573e906a5131307ecb9c7fc35#diff-f317400d67b89653d46d32a3601aad3cL29) to use ctx.cluster.only() when figuring out the first mon (and storing it as ceph_admin, aka the node to run ceph-deploy from).
It appears to me that this is returning a different node than the code here: https://github.com/ceph/ceph-qa-suite/commit/15b33337e803853573e906a5131307ecb9c7fc35#diff-f317400d67b89653d46d32a3601aad3cR166
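The mismatch can be illustrated with a small, self-contained Python sketch. Everything below (the role map, `pick_by_mon_id`, `pick_first_remote`) is a hypothetical stand-in for the two code paths, not teuthology's actual code; the point is only that two different "pick a node" heuristics can disagree:

```python
# Hypothetical model of the bug: one code path picks the host
# carrying the lowest mon ID, another picks the first remote in
# sorted order, so install and run can land on different hosts.

remotes = {'vpm076': ['mon.b', 'osd.0'], 'vpm157': ['mon.a', 'osd.1']}

def pick_by_mon_id(remotes):
    # Find the host that carries the lowest-sorted mon role.
    mon_to_host = {role: host for host, roles in remotes.items()
                   for role in roles if role.startswith('mon.')}
    return mon_to_host[sorted(mon_to_host)[0]]

def pick_first_remote(remotes):
    # Just take the first remote in sorted order.
    return sorted(remotes)[0]

install_host = pick_by_mon_id(remotes)   # host with mon.a: 'vpm157'
run_host = pick_first_remote(remotes)    # first hostname: 'vpm076'

# install_host != run_host reproduces the shape of the failure in
# the log: ceph-deploy was cloned on vpm157 but invoked on vpm076.
```

Under this toy role map the two heuristics pick vpm157 and vpm076 respectively, matching what the log above shows.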
I actually don't understand the purpose of https://github.com/ceph/ceph-qa-suite/commit/15b33337e803853573e906a5131307ecb9c7fc35#diff-f317400d67b89653d46d32a3601aad3cR168 at all. Perhaps line 168 can be removed, and line 166 can be changed to:
ceph_admin = ctx.cluster.only(teuthology.get_first_mon(ctx, config))
And then use ceph_admin.run() instead.
John, what do you think?
If that's the fix, this needs to happen on both master and hammer branches.
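For what it's worth, here is a minimal self-contained sketch of that proposal. The `Remote`, `Cluster`, and `get_first_mon` stubs below are hypothetical stand-ins for teuthology's real objects, not its actual API; the idea is just to resolve the admin node once and run every command through that one handle:

```python
# Sketch: resolve the first mon ONCE, keep the resulting one-node
# cluster handle (ceph_admin), and run all ceph-deploy commands
# through it so clone and invocation hit the same host.

class Remote(object):
    def __init__(self, shortname):
        self.shortname = shortname
        self.commands = []

    def run(self, args):
        # Record the command instead of executing it over SSH.
        self.commands.append(args)


class Cluster(object):
    def __init__(self, remotes):
        self.remotes = remotes  # {Remote: [role, ...]}

    def only(self, role):
        # Sub-cluster containing only the remote with that role.
        return Cluster({r: rs for r, rs in self.remotes.items()
                        if role in rs})

    def run(self, args):
        for remote in self.remotes:
            remote.run(args)


def get_first_mon(cluster):
    # Stand-in for teuthology.get_first_mon(ctx, config).
    mons = sorted(role for roles in cluster.remotes.values()
                  for role in roles if role.startswith('mon.'))
    return mons[0]


vpm076 = Remote('vpm076')
vpm157 = Remote('vpm157')
cluster = Cluster({vpm076: ['mon.a', 'osd.0'],
                   vpm157: ['mon.b', 'osd.1']})

# Resolve the admin node once...
ceph_admin = cluster.only(get_first_mon(cluster))

# ...then both the download and the invocation go through it,
# so they are guaranteed to hit the same host.
ceph_admin.run('git clone ... /home/ubuntu/cephtest/ceph-deploy')
ceph_admin.run('cd /home/ubuntu/cephtest/ceph-deploy && ./ceph-deploy new ...')

admin_remote = list(ceph_admin.remotes)[0]  # the one consistent node
```

With this shape, teardown would also reuse `ceph_admin` rather than recomputing the first mon, so it cannot disagree with the setup path.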
Updated by John Spray about 9 years ago
Yes, the key thing here is that the ceph_deploy task needs to pick a node and stick with it. I had made that change in download_ceph_deploy because previously it would fail on teardown: the mon ID it initially picked no longer existed at teardown time. But I guess I was testing with just one mon. This also needs updating everywhere else so that the task doesn't change its mind about which mon to use halfway through.
Here's an untested patch: https://github.com/ceph/ceph-qa-suite/pull/423
Updated by John Spray about 9 years ago
- Status changed from New to Fix Under Review
- Assignee deleted (John Spray)
Updated by Yuri Weinstein about 9 years ago
- Status changed from Fix Under Review to Pending Backport
I merged the PR in the meantime and will test it with my changes to the suite.
We will need this backported to hammer and giant.
Updated by Yuri Weinstein about 9 years ago
John, here are the runs:
http://pulpito.ceph.com/teuthology-2015-04-30_13:42:32-ceph-deploy-hammer-distro-basic-vps/
and a rerun of the failed jobs only:
http://pulpito.ceph.com/teuthology-2015-04-30_14:40:15-ceph-deploy-hammer-distro-basic-vps/
(the failures seem environment-related)
Unless you see anything else left from your end, I think it looks good!
I applied all ceph-deploy suite changes to master, hammer, and next, and will do giant too.
Can you backport all your changes so we are all in sync? Then we can test on the different branches and clean up whatever is left.
Updated by Yuri Weinstein about 9 years ago
Sage agreed not to backport to giant, John!
Updated by John Spray about 9 years ago
- Status changed from Pending Backport to Fix Under Review
- Assignee changed from John Spray to Yuri Weinstein
Cool. Here's the hammer backport: https://github.com/ceph/ceph-qa-suite/pull/430
Updated by Yuri Weinstein about 9 years ago
John, can you backport to next too, please?
Updated by Yuri Weinstein almost 9 years ago
- Status changed from Fix Under Review to Resolved
- Regression set to No