Bug #25129
Race condition in install task
Status: open
Description
When the "branch" option is given to the install task, a race condition can occur. It is reproduced here: http://pulpito.ceph.com/smithfarm-2018-07-26_19:04:20-smithfarm-mimic-distro-basic-smithi/ (minimal reproducer) and here: http://pulpito.ceph.com/yuriw-2018-07-24_22:40:04-upgrade:luminous-x-mimic-distro-basic-smithi/ (real-life example).
Somehow (not sure exactly by which mechanism) Teuthology adds the following override to every job:
overrides:
  install:
    ceph:
      sha1: dd471db8bce26d29051e8c41d2dbd8a2baf5186e
This is the sha1 of the branch given via the --ceph parameter on the teuthology-suite command line. In this case, it is the tip of branch "mimic".
The test YAML also contains:
tasks:
- install:
    branch: luminous
So when the install task starts, its config dict contains both "branch: luminous" and "sha1: dd471db8bce26d29051e8c41d2dbd8a2baf5186e" (the tip of mimic), which causes it to emit this message:
2018-07-26T19:20:51.219 WARNING:teuthology.packaging:More than one of ref, tag, branch, or sha1 supplied; using branch
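To make the merge concrete, here is a minimal sketch of how the job-wide override and the task's own config can end up in one dict, and how a precedence rule then picks a single key. This is not teuthology's code: `deep_merge` and `pick_ref` are hypothetical helpers, the per-project `ceph` override is simplified to apply directly to the task config, and the order ref, tag, branch, sha1 is an assumption inferred only from the order of names in the warning.

```python
import copy
import logging

log = logging.getLogger("install-sketch")

# Assumed precedence, inferred from the warning text: ref, then tag, then branch, then sha1.
PRECEDENCE = ("ref", "tag", "branch", "sha1")


def deep_merge(base, override):
    """Hypothetical recursive merge: values from `override` win, nested dicts are merged."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def pick_ref(config):
    """Return (key, value) for the highest-precedence ref-like key, or None if none is set."""
    supplied = [key for key in PRECEDENCE if config.get(key)]
    if not supplied:
        return None
    if len(supplied) > 1:
        log.warning("More than one of ref, tag, branch, or sha1 supplied; using %s", supplied[0])
    return supplied[0], config[supplied[0]]


overrides = {"install": {"ceph": {"sha1": "dd471db8bce26d29051e8c41d2dbd8a2baf5186e"}}}
task_config = {"branch": "luminous"}

# Simplification: the per-project override is applied directly to the task config.
merged = deep_merge(task_config, overrides["install"]["ceph"])
print(pick_ref(merged))  # ('branch', 'luminous')
```

Nothing fails at this stage: the sha1 from --ceph is carried along in the config but apparently ignored for the lookup, which is what sets up the inconsistency later on.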
Next, teuthology queries Shaman to find the repo:
2018-07-26T19:20:51.220 DEBUG:teuthology.packaging:Querying https://shaman.ceph.com/api/search?status=ready&project=ceph&flavor=default&distros=ubuntu%2F16.04%2Fx86_64&ref=luminous
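The query can be rebuilt with the standard library, which makes one detail easy to see: only the branch name is sent (as `ref`), so the sha1 from the override plays no part in which build Shaman returns. The helper name `shaman_search_url` and its defaults are illustrative assumptions; the parameters are copied from the log line above.

```python
from urllib.parse import urlencode

SHAMAN_SEARCH = "https://shaman.ceph.com/api/search"


def shaman_search_url(ref, distro="ubuntu/16.04/x86_64", project="ceph", flavor="default"):
    """Hypothetical helper: build a Shaman search URL with the parameters seen in the log."""
    params = {
        "status": "ready",
        "project": project,
        "flavor": flavor,
        "distros": distro,
        "ref": ref,  # the branch name; no sha1 is sent
    }
    # urlencode() quotes with quote_plus, so "/" in the distro becomes %2F, as in the log.
    return SHAMAN_SEARCH + "?" + urlencode(params)


print(shaman_search_url("luminous"))
```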
Shaman returns a list of repos containing builds of any SHA1 it finds on branch "luminous", and teuthology simply takes the first result (https://github.com/ceph/teuthology/blob/master/teuthology/packaging.py#L847):
def _get_base_url(self):
    self.assert_result()
    return self._result.json()[0]['url']
At this point, nothing has been installed yet. This marks the beginning of the race condition.
2018-07-26T19:20:51.811 INFO:teuthology.task.install.deb:Pulling from https://2.chacra.ceph.com/r/ceph/luminous/0ce17faf47b4165587f0e717e32d469dc8c3f285/ubuntu/xenial/flavors/default/
2018-07-26T19:20:51.814 INFO:teuthology.task.install.deb:Package version is 12.2.7-18-g0ce17fa-1xenial
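Both log lines identify the build being installed: the chacra repo URL contains the full sha1, and the package version is a git-describe style string whose `-g` part is an abbreviated sha1. The sketch below checks that the two agree. The helper names are hypothetical, and the URL layout (`/r/<project>/<branch>/<sha1>/...`) is an assumption based on this one example. A check along these lines would tie the expected version to a concrete commit rather than to whatever the branch resolves to at a given moment.

```python
import re
from urllib.parse import urlparse


def branch_and_sha1(repo_url):
    """Hypothetical parser; assumes the path layout /r/<project>/<branch>/<sha1>/<distro>/..."""
    parts = urlparse(repo_url).path.strip("/").split("/")
    return parts[2], parts[3]


def abbrev_sha1(version):
    """Extract the abbreviated commit from a version like 12.2.7-18-g0ce17fa-1xenial."""
    match = re.search(r"-g([0-9a-f]{7,40})-", version)
    return match.group(1) if match else None


url = ("https://2.chacra.ceph.com/r/ceph/luminous/"
       "0ce17faf47b4165587f0e717e32d469dc8c3f285/ubuntu/xenial/flavors/default/")
branch, sha1 = branch_and_sha1(url)
abbrev = abbrev_sha1("12.2.7-18-g0ce17fa-1xenial")

print(branch, sha1)
print(sha1.startswith(abbrev))  # True: repo URL and package version name the same commit
```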
Now the install task gets to work: it calls apt-get install, lots of dependent packages get pulled in, and so on. Time passes.
When the packages finish installing, the install task checks that the expected package version was really installed. To determine the expected version, it queries Shaman again with the same parameters. If, in the meantime, Shaman has finished building another SHA1 from branch "luminous", this check will fail:
2018-07-26T19:22:11.308 WARNING:teuthology.packaging:More than one of ref, tag, branch, or sha1 supplied; using branch
2018-07-26T19:22:11.308 INFO:teuthology.packaging:ref: None
2018-07-26T19:22:11.308 INFO:teuthology.packaging:tag: None
2018-07-26T19:22:11.308 INFO:teuthology.packaging:branch: luminous
2018-07-26T19:22:11.308 INFO:teuthology.packaging:sha1: dd471db8bce26d29051e8c41d2dbd8a2baf5186e
2018-07-26T19:22:11.309 DEBUG:teuthology.packaging:Querying https://shaman.ceph.com/api/search?status=ready&project=ceph&flavor=default&distros=ubuntu%2F16.04%2Fx86_64&ref=luminous
This again returns a list of repos, but the first one (the only one teuthology looks at) is now different: 12.2.7-19-g3a01e5d-1xenial, instead of the 12.2.7-18-g0ce17fa-1xenial that came first before apt-get was called.
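The selection rule is the crux: the same query, issued twice, can legitimately return a different first entry. A toy model with two canned responses shows this. The first URL is the one from the log; the second snapshot and its `NEWER_SHA1` URL are placeholders, and the assumption that a newly ready build is listed first is inferred only from what happened in this run.

```python
import json

OLD_URL = ("https://2.chacra.ceph.com/r/ceph/luminous/"
           "0ce17faf47b4165587f0e717e32d469dc8c3f285/ubuntu/xenial/flavors/default/")
# Placeholder: the real URL of the newer build is not shown in this report.
NEW_URL = "https://chacra.example/r/ceph/luminous/NEWER_SHA1/ubuntu/xenial/flavors/default/"

# Response before apt-get ran, and a hypothetical response after a newer build became ready.
SNAPSHOT_BEFORE = json.dumps([{"url": OLD_URL}])
SNAPSHOT_AFTER = json.dumps([{"url": NEW_URL}, {"url": OLD_URL}])


def first_url(response_text):
    """Same selection rule as _get_base_url(): take index 0 of the parsed JSON list."""
    return json.loads(response_text)[0]["url"]


before = first_url(SNAPSHOT_BEFORE)
after = first_url(SNAPSHOT_AFTER)
print(before == after)  # False: same query, different answer
```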
But the log says nothing about this anomaly. Next, the install task uses dpkg-query to see which version of the ceph packages was actually installed:
2018-07-26T19:22:11.882 INFO:teuthology.orchestra.run.smithi092:Running: "dpkg-query -W -f '${Version}' ceph"
2018-07-26T19:22:11.915 INFO:teuthology.orchestra.run.smithi092.stdout:12.2.7-18-g0ce17fa-1xenial
2018-07-26T19:22:11.915 INFO:teuthology.packaging:The installed version of ceph is 12.2.7-18-g0ce17fa-1xenial
Now the install task reports the anomaly and aborts:
2018-07-26T19:22:11.915 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 250, in install
    install_packages(ctx, package_list, config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 127, in install_packages
    verify_package_version(ctx, config, remote)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 59, in verify_package_version
    pkg=pkg_to_check
RuntimeError: ceph version 12.2.7-19-g3a01e5d-1xenial was not installed, found 12.2.7-18-g0ce17fa-1xenial.
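The whole sequence can be replayed in a few lines. `FakeShaman` and `check_version` below are hypothetical stand-ins (teuthology's real `verify_package_version` takes `ctx, config, remote`, as the traceback shows); the sketch only reproduces the timing and the error message, assuming a newly ready build shows up at the front of Shaman's list.

```python
class FakeShaman:
    """Hypothetical stand-in for Shaman: ready builds of one branch, newest first."""

    def __init__(self, versions):
        self.versions = list(versions)

    def first_result(self):
        # As in _get_base_url(), only the first entry is ever consulted.
        return self.versions[0]

    def finish_build(self, version):
        # Assumption: a build that becomes ready is listed first.
        self.versions.insert(0, version)


def check_version(expected, installed, pkg="ceph"):
    """Hypothetical check that raises the same message as in the traceback."""
    if installed != expected:
        raise RuntimeError(f"{pkg} version {expected} was not installed, found {installed}.")


shaman = FakeShaman(["12.2.7-18-g0ce17fa-1xenial"])
installed = shaman.first_result()                  # resolved before apt-get install
shaman.finish_build("12.2.7-19-g3a01e5d-1xenial")  # a new build lands while installing
expected = shaman.first_result()                   # resolved again for verification

error = None
try:
    check_version(expected, installed)
except RuntimeError as exc:
    error = str(exc)
print(error)
```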
IMO, in this case the install task should see that "branch: luminous" was given and "override the override", i.e. overwrite the sha1 from --ceph with the real tip of branch "luminous".
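A minimal sketch of the proposed fix, assuming pinning can be done by rewriting the install config once before anything is installed: resolve the branch a single time, write the result into `sha1` (overriding the value from --ceph), and drop `branch` so that later lookups cannot re-resolve it. `pin_branch` and the dict-based resolver are hypothetical; in teuthology the resolution would presumably go through the same Shaman query.

```python
def pin_branch(config, resolve_branch):
    """Hypothetical: return a copy of the install config with the branch resolved to one sha1."""
    pinned = dict(config)
    branch = pinned.pop("branch", None)
    if branch:
        # Overwrite any inherited sha1 (e.g. from --ceph) with the branch tip, resolved once.
        pinned["sha1"] = resolve_branch(branch)
    return pinned


# Hypothetical branch -> sha1 table standing in for a Shaman lookup.
tips = {"luminous": "0ce17faf47b4165587f0e717e32d469dc8c3f285"}
config = {"branch": "luminous", "sha1": "dd471db8bce26d29051e8c41d2dbd8a2baf5186e"}
pinned = pin_branch(config, lambda name: tips[name])

# A newer luminous build lands while packages install (placeholder value)...
tips["luminous"] = "NEWER_SHA1_PLACEHOLDER"

# ...but install and verification both read the pinned sha1, so they agree.
print(pinned)
```

Dropping `branch` matters because, as the warning in the log shows, branch takes precedence over sha1 when both are present. One open question is what "tip" should mean here: the git branch head, or the newest build Shaman reports as ready. Only the latter is known to have packages available.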