Bug #25129

Updated by Nathan Cutler over 5 years ago

When the "branch" option is given to the install task, a race condition can occur, as reproduced here: http://pulpito.ceph.com/smithfarm-2018-07-26_19:04:20-smithfarm-mimic-distro-basic-smithi/ 

 Somehow (not sure exactly by which mechanism) Teuthology adds the following override to every job: 

 <pre>    overrides: 
       install: 
         ceph: 
           sha1: dd471db8bce26d29051e8c41d2dbd8a2baf5186e 
 </pre> 

 This is the sha1 of the branch given via the @--ceph@ parameter on the teuthology-suite command line. In this case, it's the tip of branch "mimic". 

 The test yaml further contains: 

 <pre>    tasks: 
     - install: 
         branch: luminous 
 </pre> 

 When it starts, then, the install task has both "branch: luminous" and "sha1: dd471db8bce26d29051e8c41d2dbd8a2baf5186e" (tip of mimic) in its config dict, causing it to emit a message: 

 <pre>2018-07-26T19:20:51.219 WARNING:teuthology.packaging:More than one of ref, tag, branch, or sha1 supplied; using branch</pre> 
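 A minimal sketch of how the two settings can end up side by side, assuming the override is deep-merged into the task config and that the reference is then picked in the order the warning lists them (ref, tag, branch, sha1). The helpers @deep_merge@ and @choose_reference@ are hypothetical names for illustration, not copied from the teuthology source:

```python
import logging

log = logging.getLogger("packaging_sketch")


def deep_merge(base, extra):
    """Recursively merge ``extra`` into ``base`` (hypothetical helper)."""
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_merge(base[key], value)
        else:
            base[key] = value
    return base


def choose_reference(config):
    """Pick one of ref/tag/branch/sha1, assuming that priority order."""
    supplied = [(key, config[key])
                for key in ("ref", "tag", "branch", "sha1")
                if config.get(key)]
    if not supplied:
        raise ValueError("no ref, tag, branch, or sha1 supplied")
    if len(supplied) > 1:
        log.warning("More than one of ref, tag, branch, or sha1 supplied; "
                    "using %s", supplied[0][0])
    return supplied[0]


# "branch" comes from the test yaml, "sha1" from the job-wide override.
task_config = {"branch": "luminous"}
overrides = {"install": {"ceph": {
    "sha1": "dd471db8bce26d29051e8c41d2dbd8a2baf5186e"}}}

merged = deep_merge(dict(task_config), overrides["install"]["ceph"])
kind, value = choose_reference(merged)
```

 With both keys present, the sketch settles on the branch and the mimic sha1 is silently ignored for the lookup, which matches the warning above.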

 Next, teuthology queries shaman to get the repo: 

 <pre>2018-07-26T19:20:51.220 DEBUG:teuthology.packaging:Querying https://shaman.ceph.com/api/search?status=ready&project=ceph&flavor=default&distros=ubuntu%2F16.04%2Fx86_64&ref=luminous 
 </pre> 
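 To make it easier to see what the query does and does not pin down, here is a small sketch that rebuilds the logged URL with the standard library. The function name @build_search_url@ is hypothetical; the parameter names and values are taken from the log line above:

```python
from urllib.parse import urlencode

SHAMAN_SEARCH = "https://shaman.ceph.com/api/search"


def build_search_url(ref, distro="ubuntu/16.04/x86_64"):
    """Rebuild the search URL seen in the log (hypothetical helper)."""
    params = [
        ("status", "ready"),
        ("project", "ceph"),
        ("flavor", "default"),
        ("distros", distro),  # "/" is percent-encoded as %2F
        ("ref", ref),         # only the branch name, no sha1
    ]
    return SHAMAN_SEARCH + "?" + urlencode(params)


url = build_search_url("luminous")
```

 Note that the query carries only @ref=luminous@. Nothing in it ties the result to a particular SHA1, so the answer can change whenever a new build of that branch becomes ready.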

 Shaman returns a list of repos containing builds from any SHA1 it finds in branch "luminous", and teuthology takes the first result - https://github.com/ceph/teuthology/blob/master/teuthology/packaging.py#L847 

 <pre>    def _get_base_url(self): 
        self.assert_result() 
        return self._result.json()[0]['url'] 
 </pre> 

 At this point, nothing has been installed yet. This marks the beginning of the race condition. 

 <pre>2018-07-26T19:20:51.811 INFO:teuthology.task.install.deb:Pulling from https://2.chacra.ceph.com/r/ceph/luminous/0ce17faf47b4165587f0e717e32d469dc8c3f285/ubuntu/xenial/flavors/default/ 
 2018-07-26T19:20:51.814 INFO:teuthology.task.install.deb:Package version is 12.2.7-18-g0ce17fa-1xenial 
 </pre> 

 Now the install task gets to work - calls @apt-get install@, lots of dependent packages get pulled in, etc. Time passes. 

 When the packages finish installing, the install task checks to make sure the expected package version was really installed. If, in the meantime, Shaman finishes building another SHA1 from branch "luminous", this check will fail: 

 <pre>2018-07-26T19:22:11.308 WARNING:teuthology.packaging:More than one of ref, tag, branch, or sha1 supplied; using branch 
 2018-07-26T19:22:11.308 INFO:teuthology.packaging:ref: None 
 2018-07-26T19:22:11.308 INFO:teuthology.packaging:tag: None 
 2018-07-26T19:22:11.308 INFO:teuthology.packaging:branch: luminous 
 2018-07-26T19:22:11.308 INFO:teuthology.packaging:sha1: dd471db8bce26d29051e8c41d2dbd8a2baf5186e 
 2018-07-26T19:22:11.309 DEBUG:teuthology.packaging:Querying https://shaman.ceph.com/api/search?status=ready&project=ceph&flavor=default&distros=ubuntu%2F16.04%2Fx86_64&ref=luminous 
 </pre> 

 This again returns a list of repos, but the first one (which is the only one teuthology looks at) is now different! (12.2.7-19-g3a01e5d-1xenial instead of 12.2.7-18-g0ce17fa-1xenial before apt-get was called) 

 But the log is silent about this anomaly. Next, the install task uses @dpkg-query@ to see which version of the ceph packages was actually installed: 

 <pre>2018-07-26T19:22:11.882 INFO:teuthology.orchestra.run.smithi092:Running: "dpkg-query -W -f '${Version}' ceph" 
 2018-07-26T19:22:11.915 INFO:teuthology.orchestra.run.smithi092.stdout:12.2.7-18-g0ce17fa-1xenial 
 2018-07-26T19:22:11.915 INFO:teuthology.packaging:The installed version of ceph is 12.2.7-18-g0ce17fa-1xenial 
 </pre> 

 Now the install task reports the anomaly and aborts: 

 <pre>2018-07-26T19:22:11.915 ERROR:teuthology.contextutil:Saw exception from nested tasks 
 Traceback (most recent call last): 
   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 30, in nested 
     vars.append(enter()) 
   File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__ 
     return self.gen.next() 
   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 250, in install 
     install_packages(ctx, package_list, config) 
   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 127, in install_packages 
     verify_package_version(ctx, config, remote) 
   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 59, in verify_package_version 
     pkg=pkg_to_check 
 RuntimeError: ceph version 12.2.7-19-g3a01e5d-1xenial was not installed, found 12.2.7-18-g0ce17fa-1xenial. 
 </pre> 
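 The whole sequence can be reproduced offline with a stand-in for Shaman. This is only a sketch: @FakeShaman@, @expected_version@ and the simplified @verify_package_version@ below are hypothetical, and it assumes the newest ready build is listed first, as it was in this run:

```python
class FakeShaman:
    """Stand-in for the search endpoint; newest build is listed first."""

    def __init__(self, builds):
        self._builds = list(builds)

    def publish(self, version):
        # A new build of the branch becomes "ready".
        self._builds.insert(0, version)

    def search(self, ref):
        return list(self._builds)


def expected_version(shaman, ref):
    # Mirrors "take the first result" from the branch query.
    return shaman.search(ref)[0]


def verify_package_version(shaman, ref, installed):
    # Re-queries by branch instead of reusing the earlier answer.
    expected = expected_version(shaman, ref)
    if expected != installed:
        raise RuntimeError(
            "ceph version {0} was not installed, found {1}.".format(
                expected, installed))


shaman = FakeShaman(["12.2.7-18-g0ce17fa-1xenial"])
to_install = expected_version(shaman, "luminous")  # first query
shaman.publish("12.2.7-19-g3a01e5d-1xenial")       # build lands during apt-get
installed = to_install                             # what dpkg-query reports

try:
    verify_package_version(shaman, "luminous", installed)
    error = None
except RuntimeError as exc:
    error = str(exc)
```

 The check fails even though the install itself did exactly what it was told, because the "expected" version is resolved a second time after the branch has moved.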

 IMO in this case the install task should see that "branch: luminous" was given and "override the override", i.e. overwrite the sha1 from @--ceph@ with the real tip of branch "luminous".
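 A rough sketch of what "override the override" could look like, under stated assumptions: @pin_branch_to_sha1@ and the @resolve_tip@ callable are hypothetical, and dropping "branch" after resolving it is an extra assumption of this sketch (so that later lookups would use the fixed sha1 rather than re-resolving the branch). It is not a patch against teuthology:

```python
def pin_branch_to_sha1(config, resolve_tip):
    """Return a copy of ``config`` in which an explicit branch replaces
    any inherited sha1 with the branch tip, resolved exactly once."""
    pinned = dict(config)
    branch = pinned.get("branch")
    if branch:
        # Overwrite the sha1 inherited from --ceph with the branch tip.
        pinned["sha1"] = resolve_tip(branch)
        # Assumption: keep only one reference so it cannot drift later.
        del pinned["branch"]
    return pinned


# Example with a canned lookup table instead of a real Shaman query.
tips = {"luminous": "0ce17faf47b4165587f0e717e32d469dc8c3f285"}
config = {
    "branch": "luminous",
    "sha1": "dd471db8bce26d29051e8c41d2dbd8a2baf5186e",  # tip of mimic
}
pinned = pin_branch_to_sha1(config, tips.__getitem__)
```

 With the sha1 fixed up front, both the install step and the later version check would refer to the same build, regardless of what lands in the branch in between.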
