Bug #11495

closed

"RuntimeError: ceph-deploy: new command failed" in ceph-deploy:fs-hammer-distro-basic-vps run

Added by Yuri Weinstein almost 9 years ago. Updated almost 9 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-deploy
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://pulpito.ceph.com/teuthology-2015-04-28_10:09:45-ceph-deploy:fs-hammer-distro-basic-vps/
Jobs: all
Logs for one: http://qa-proxy.ceph.com/teuthology/teuthology-2015-04-28_10:09:45-ceph-deploy:fs-hammer-distro-basic-vps/866894/

2015-04-28T10:36:32.199 INFO:tasks.ceph_deploy:Building ceph cluster using ceph-deploy...
2015-04-28T10:36:32.199 INFO:teuthology.orchestra.run.vpm076:Running: 'cd /home/ubuntu/cephtest/ceph-deploy && ./ceph-deploy new vpm076.front.sepia.ceph.com vpm185.front.sepia.ceph.com vpm157.front.sepia.ceph.com'
2015-04-28T10:36:32.206 INFO:teuthology.orchestra.run.vpm076.stderr:bash: line 0: cd: /home/ubuntu/cephtest/ceph-deploy: No such file or directory
2015-04-28T10:36:32.206 INFO:tasks.ceph_deploy:Error encountered, logging exception before tearing down ceph-deploy
2015-04-28T10:36:32.207 INFO:tasks.ceph_deploy:Traceback (most recent call last):
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/ceph_deploy.py", line 217, in build_ceph_cluster
    raise RuntimeError("ceph-deploy: new command failed")
RuntimeError: ceph-deploy: new command failed

2015-04-28T10:36:32.207 INFO:tasks.ceph_deploy:Stopping ceph...
2015-04-28T10:36:32.207 INFO:teuthology.orchestra.run.vpm076:Running: 'sudo stop ceph-all || sudo service ceph stop'
2015-04-28T10:36:32.306 INFO:teuthology.orchestra.run.vpm076.stderr:stop: Unknown job: ceph-all
2015-04-28T10:36:32.315 INFO:teuthology.orchestra.run.vpm076.stderr:ceph: unrecognized service
2015-04-28T10:36:32.316 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 28, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/ceph_deploy.py", line 379, in build_ceph_cluster
    'sudo', 'service', 'ceph', 'stop' ])
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/cluster.py", line 64, in run
    return [remote.run(**kwargs) for remote in remotes]
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 156, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 378, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 114, in wait
    label=self.label)
CommandFailedError: Command failed on vpm076 with status 1: 'sudo stop ceph-all || sudo service ceph stop'

I suspect the symlink isn't created anymore. Is this intended?

Actions #1

Updated by Travis Rhoden almost 9 years ago

I see what is happening, but haven't looked at why yet.

ceph-deploy is being installed on a different node from the one trying to run it. If you look at the same log referenced in this bug, it's trying to run ceph-deploy on vpm076, but it was installed on vpm157!

2015-04-28T10:36:23.149 INFO:tasks.ceph_deploy:Downloading ceph-deploy...
2015-04-28T10:36:23.149 INFO:teuthology.orchestra.run.vpm157:Running: 'git clone -b master git://git.ceph.com/ceph-deploy.git /home/ubuntu/cephtest/ceph-deploy'
2015-04-28T10:36:23.650 INFO:teuthology.orchestra.run.vpm157.stdout:Cloning into '/home/ubuntu/cephtest/ceph-deploy'...
2015-04-28T10:36:23.930 INFO:teuthology.orchestra.run.vpm157:Running: 'cd /home/ubuntu/cephtest/ceph-deploy && ./bootstrap'
2015-04-28T10:36:24.595 INFO:teuthology.orchestra.run.vpm157.stdout:New python executable in virtualenv/bin/python
2015-04-28T10:36:25.458 INFO:teuthology.orchestra.run.vpm157.stdout:Installing distribute.............................................................................................................................................................................................done.
2015-04-28T10:36:25.793 INFO:teuthology.orchestra.run.vpm157.stdout:Installing pip...............done.
2015-04-28T10:36:32.029 INFO:teuthology.orchestra.run.vpm157.stdout:[vendoring] Running command: git clone git://ceph.com/remoto
2015-04-28T10:36:32.029 INFO:teuthology.orchestra.run.vpm157.stdout:[vendoring] Running command: git checkout 0.0.25
2015-04-28T10:36:32.030 INFO:teuthology.orchestra.run.vpm157.stdout:[vendoring] Running command: python vendor.py
2015-04-28T10:36:32.030 INFO:teuthology.orchestra.run.vpm157.stdout:[vendoring] Running command: mv /home/ubuntu/cephtest/ceph-deploy/remoto/remoto /home/ubuntu/cephtest/ceph-deploy/ceph_deploy/lib/vendor/remoto
2015-04-28T10:36:32.030 INFO:teuthology.orchestra.run.vpm157.stdout:running develop
2015-04-28T10:36:32.045 INFO:teuthology.orchestra.run.vpm157.stdout:running egg_info
2015-04-28T10:36:32.045 INFO:teuthology.orchestra.run.vpm157.stdout:creating ceph_deploy.egg-info
2015-04-28T10:36:32.045 INFO:teuthology.orchestra.run.vpm157.stdout:writing requirements to ceph_deploy.egg-info/requires.txt
2015-04-28T10:36:32.045 INFO:teuthology.orchestra.run.vpm157.stdout:writing ceph_deploy.egg-info/PKG-INFO
2015-04-28T10:36:32.046 INFO:teuthology.orchestra.run.vpm157.stdout:writing top-level names to ceph_deploy.egg-info/top_level.txt
2015-04-28T10:36:32.047 INFO:teuthology.orchestra.run.vpm157.stdout:writing dependency_links to ceph_deploy.egg-info/dependency_links.txt
2015-04-28T10:36:32.047 INFO:teuthology.orchestra.run.vpm157.stdout:writing entry points to ceph_deploy.egg-info/entry_points.txt
2015-04-28T10:36:32.047 INFO:teuthology.orchestra.run.vpm157.stdout:writing manifest file 'ceph_deploy.egg-info/SOURCES.txt'
2015-04-28T10:36:32.057 INFO:teuthology.orchestra.run.vpm157.stdout:reading manifest file 'ceph_deploy.egg-info/SOURCES.txt'
2015-04-28T10:36:32.058 INFO:teuthology.orchestra.run.vpm157.stdout:reading manifest template 'MANIFEST.in'
2015-04-28T10:36:32.063 INFO:teuthology.orchestra.run.vpm157.stdout:writing manifest file 'ceph_deploy.egg-info/SOURCES.txt'
2015-04-28T10:36:32.158 INFO:teuthology.orchestra.run.vpm157.stdout:running build_ext
2015-04-28T10:36:32.158 INFO:teuthology.orchestra.run.vpm157.stdout:Creating /home/ubuntu/cephtest/ceph-deploy/virtualenv/lib/python2.7/site-packages/ceph-deploy.egg-link (link to .)
2015-04-28T10:36:32.159 INFO:teuthology.orchestra.run.vpm157.stdout:Adding ceph-deploy 1.5.23 to easy-install.pth file
2015-04-28T10:36:32.160 INFO:teuthology.orchestra.run.vpm157.stdout:Installing ceph-deploy script to /home/ubuntu/cephtest/ceph-deploy/virtualenv/bin
2015-04-28T10:36:32.161 INFO:teuthology.orchestra.run.vpm157.stdout:
2015-04-28T10:36:32.161 INFO:teuthology.orchestra.run.vpm157.stdout:Installed /home/ubuntu/cephtest/ceph-deploy
2015-04-28T10:36:32.161 INFO:teuthology.orchestra.run.vpm157.stdout:Processing dependencies for ceph-deploy==1.5.23
2015-04-28T10:36:32.162 INFO:teuthology.orchestra.run.vpm157.stdout:Searching for distribute==0.6.24
2015-04-28T10:36:32.163 INFO:teuthology.orchestra.run.vpm157.stdout:Best match: distribute 0.6.24
2015-04-28T10:36:32.163 INFO:teuthology.orchestra.run.vpm157.stdout:Processing distribute-0.6.24-py2.7.egg
2015-04-28T10:36:32.163 INFO:teuthology.orchestra.run.vpm157.stdout:distribute 0.6.24 is already the active version in easy-install.pth
2015-04-28T10:36:32.164 INFO:teuthology.orchestra.run.vpm157.stdout:Installing easy_install script to /home/ubuntu/cephtest/ceph-deploy/virtualenv/bin
2015-04-28T10:36:32.164 INFO:teuthology.orchestra.run.vpm157.stdout:Installing easy_install-2.7 script to /home/ubuntu/cephtest/ceph-deploy/virtualenv/bin
2015-04-28T10:36:32.165 INFO:teuthology.orchestra.run.vpm157.stdout:
2015-04-28T10:36:32.165 INFO:teuthology.orchestra.run.vpm157.stdout:Using /home/ubuntu/cephtest/ceph-deploy/virtualenv/lib/python2.7/site-packages/distribute-0.6.24-py2.7.egg
2015-04-28T10:36:32.165 INFO:teuthology.orchestra.run.vpm157.stdout:Finished processing dependencies for ceph-deploy==1.5.23
2015-04-28T10:36:32.199 INFO:tasks.ceph_deploy:Building ceph cluster using ceph-deploy...
2015-04-28T10:36:32.199 INFO:teuthology.orchestra.run.vpm076:Running: 'cd /home/ubuntu/cephtest/ceph-deploy && ./ceph-deploy new vpm076.front.sepia.ceph.com vpm185.front.sepia.ceph.com vpm157.front.sepia.ceph.com'
2015-04-28T10:36:32.206 INFO:teuthology.orchestra.run.vpm076.stderr:bash: line 0: cd: /home/ubuntu/cephtest/ceph-deploy: No such file or directory

I'll have to see if I can figure out why.
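
A mismatch like this can be checked mechanically by pulling the host out of each "Running:" line and comparing the host that cloned ceph-deploy with the host that ran `ceph-deploy new`. The following is only a rough sketch and is not part of teuthology: the regular expression is inferred from the log lines quoted in this ticket, and `hosts_by_command` and `find_host` are hypothetical helper names.

```python
import re

# Assumption: "Running:" lines look like the ones quoted in this ticket,
# i.e. "INFO:teuthology.orchestra.run.<host>:Running: '<command>'".
RUNNING = re.compile(
    r"INFO:teuthology\.orchestra\.run\.(?P<host>[^:.\s]+):Running: '(?P<cmd>.*)'\s*$"
)


def hosts_by_command(log_lines):
    """Return (host, command) pairs for every 'Running:' line."""
    pairs = []
    for line in log_lines:
        match = RUNNING.search(line)
        if match:
            pairs.append((match.group("host"), match.group("cmd")))
    return pairs


def find_host(pairs, needle):
    """Return the first host whose command contains `needle`, else None."""
    for host, cmd in pairs:
        if needle in cmd:
            return host
    return None


# Three lines copied from the log in this ticket.
SAMPLE = [
    "2015-04-28T10:36:23.149 INFO:teuthology.orchestra.run.vpm157:Running: "
    "'git clone -b master git://git.ceph.com/ceph-deploy.git /home/ubuntu/cephtest/ceph-deploy'",
    "2015-04-28T10:36:32.199 INFO:teuthology.orchestra.run.vpm076:Running: "
    "'cd /home/ubuntu/cephtest/ceph-deploy && ./ceph-deploy new vpm076.front.sepia.ceph.com "
    "vpm185.front.sepia.ceph.com vpm157.front.sepia.ceph.com'",
    "2015-04-28T10:36:32.206 INFO:teuthology.orchestra.run.vpm076.stderr:bash: line 0: cd: "
    "/home/ubuntu/cephtest/ceph-deploy: No such file or directory",
]

pairs = hosts_by_command(SAMPLE)
install_host = find_host(pairs, "git clone")      # host that downloaded ceph-deploy
run_host = find_host(pairs, "./ceph-deploy new")  # host that tried to run it
mismatch = install_host != run_host
```

On the three sample lines, `install_host` is `vpm157` and `run_host` is `vpm076`, so `mismatch` is true. The stderr line is skipped because it is not a "Running:" line.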

Actions #2

Updated by Travis Rhoden almost 9 years ago

  • Assignee changed from Travis Rhoden to John Spray

It looks to me like this came in with https://github.com/ceph/ceph-qa-suite/commit/15b33337e803853573e906a5131307ecb9c7fc35

In this change, the function download_ceph_deploy was changed (https://github.com/ceph/ceph-qa-suite/commit/15b33337e803853573e906a5131307ecb9c7fc35#diff-f317400d67b89653d46d32a3601aad3cL29) to use ctx.cluster.only() when figuring out the first mon (and storing it as ceph_admin, aka the node to run ceph-deploy from).

It appears to me that this is returning a different node than the code here: https://github.com/ceph/ceph-qa-suite/commit/15b33337e803853573e906a5131307ecb9c7fc35#diff-f317400d67b89653d46d32a3601aad3cR166

I actually don't understand the purpose of https://github.com/ceph/ceph-qa-suite/commit/15b33337e803853573e906a5131307ecb9c7fc35#diff-f317400d67b89653d46d32a3601aad3cR168 at all. Perhaps line 168 can be removed, and line 166 can be changed to:

ceph_admin = ctx.cluster.only(teuthology.get_first_mon(ctx, config))

And then use ceph_admin.run() instead.

John, what do you think?

If that's the fix, this needs to happen on both master and hammer branches.

Actions #3

Updated by John Spray almost 9 years ago

Yes, the key thing here is that the ceph_deploy task needs to pick a node and stick with it. I made that change in download_ceph_deploy because it previously failed on teardown: the mon ID it initially picked no longer existed by then. I guess I was testing with just one mon, though. The same selection needs to be used everywhere else too, so that the task doesn't change its mind about which mon to use halfway through.

Here's an untested patch: https://github.com/ceph/ceph-qa-suite/pull/423
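
The "pick a node and stick with it" idea can be illustrated with a small standalone sketch. This is not the code from the pull request and does not use the teuthology API. `FakeCluster`, `DeployTask`, and the role assignments are hypothetical; only the host names are taken from the log above. The task resolves its admin node once and reuses it for download, build, and teardown, even if a later lookup would give a different answer.

```python
class FakeCluster:
    """Hypothetical stand-in for a test cluster: node name -> list of roles."""

    def __init__(self, roles_by_node):
        self.roles_by_node = dict(roles_by_node)
        self.commands = []  # (node, command) pairs, recorded instead of executed

    def first_node_with_role_prefix(self, prefix):
        # Sort node names so the choice does not depend on dict ordering.
        for node in sorted(self.roles_by_node):
            if any(role.startswith(prefix) for role in self.roles_by_node[node]):
                return node
        raise LookupError("no node has a role starting with %r" % prefix)

    def run(self, node, command):
        self.commands.append((node, command))


class DeployTask:
    """Resolves the admin node once and uses it for every step."""

    def __init__(self, cluster):
        self.cluster = cluster
        # Picked exactly once; later steps never look it up again.
        self.admin = cluster.first_node_with_role_prefix("mon.")

    def download(self):
        self.cluster.run(self.admin, "git clone ceph-deploy")

    def build(self):
        self.cluster.run(self.admin, "./ceph-deploy new")

    def teardown(self):
        self.cluster.run(self.admin, "service ceph stop")


# Assumed role layout, for illustration only.
cluster = FakeCluster({
    "vpm076": ["mon.a", "osd.0"],
    "vpm157": ["mon.b"],
    "vpm185": ["mon.c"],
})
task = DeployTask(cluster)
task.download()
task.build()

# Simulate the originally chosen mon role disappearing before teardown.
cluster.roles_by_node["vpm076"] = ["osd.0"]
task.teardown()

nodes_used = {node for node, _ in cluster.commands}
```

Here `nodes_used` contains only `vpm076`, although a fresh lookup after the role change would return `vpm157`. Storing the selection once avoids both the teardown failure described above and the install/run mismatch seen in the log.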

Actions #4

Updated by John Spray almost 9 years ago

  • Status changed from New to Fix Under Review
  • Assignee deleted (John Spray)

Actions #5

Updated by Yuri Weinstein almost 9 years ago

  • Status changed from Fix Under Review to Pending Backport

I merged the PR in the meantime and will test it with my changes to the suite.
We will need it backported to hammer and giant.

Actions #6

Updated by Yuri Weinstein almost 9 years ago

John, here are the runs:
http://pulpito.ceph.com/teuthology-2015-04-30_13:42:32-ceph-deploy-hammer-distro-basic-vps/
and a rerun of the failed jobs only:
http://pulpito.ceph.com/teuthology-2015-04-30_14:40:15-ceph-deploy-hammer-distro-basic-vps/
(the failures seem environment related)

Unless you see anything else left from your end, I think it looks good!
I applied all ceph-deploy suite changes to master, hammer and next, and will do giant too.
Can you backport all your changes so we are all in sync? Then we can test on different branches and clean up whatever is left.

Actions #7

Updated by Yuri Weinstein almost 9 years ago

  • Assignee set to John Spray

Actions #8

Updated by Yuri Weinstein almost 9 years ago

Sage agreed not to backport to giant, John!

Actions #9

Updated by John Spray almost 9 years ago

  • Status changed from Pending Backport to Fix Under Review
  • Assignee changed from John Spray to Yuri Weinstein

Cool. Here's the hammer backport: https://github.com/ceph/ceph-qa-suite/pull/430

Actions #10

Updated by Yuri Weinstein almost 9 years ago

John, can you backport to next too, pls?

Actions #11

Updated by Yuri Weinstein almost 9 years ago

  • Status changed from Fix Under Review to Resolved
  • Regression set to No