Ceph : Issues
https://tracker.ceph.com/
2018-11-22T11:58:26Z
teuthology - Bug #37370 (Fix Under Review): Test failures originating in run_tasks/_import are ne...
https://tracker.ceph.com/issues/37370
2018-11-22T11:58:26Z
Nathan Cutler (ncutler@suse.cz)
<p>If someone (like me) writes a task that is buggy and cannot be imported, teuthology will currently fail like this:</p>
<pre>2018-11-22T11:00:20.674 INFO:teuthology.run_tasks:Running task foo...
2018-11-22T11:00:20.708 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/ubuntu/teuthology/teuthology/run_tasks.py", line 95, in run_tasks
    run_one_task(taskname, stack, timer, ctx=ctx, config=config)
  File "/home/ubuntu/teuthology/teuthology/run_tasks.py", line 70, in run_one_task
    task = get_task(taskname)
  File "/home/ubuntu/teuthology/teuthology/run_tasks.py", line 31, in get_task
    raise ImportError("Could not find task '{}'".format(name))
ImportError: Could not find task 'foo'
</pre>
<p>This error message is misleading (or just downright wrong) because the task foo.py does exist in the right place. It just can't be imported due to some problem inside foo.py. But this problem is not reported because teuthology suppresses the "real" ImportError exception and replaces it with its own.</p>
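<p>One way to avoid masking the underlying error is to check whether the module exists separately from importing it, so that an ImportError raised from inside the module propagates unchanged. A minimal Python 3 sketch (not the actual teuthology code; the <code>tasks.</code> prefix handling is omitted):</p>

```python
import importlib
import importlib.util

def import_task(module_name):
    """Import a task module without masking errors raised inside it.
    Sketch only, not the actual teuthology code."""
    # find_spec() answers "does this module exist at all?" without
    # executing the module body.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError("Could not find task '{}'".format(module_name))
    # The module exists, so any ImportError raised here comes from
    # *inside* it (e.g. "cannot import name non_existent_function")
    # and propagates to the log instead of being replaced.
    return importlib.import_module(module_name)
```

<p>With this split, a broken foo.py would surface its own traceback rather than the misleading "Could not find task" message.</p>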
<p>In this case, the "real" ImportError was</p>
<code>ImportError: cannot import name non_existent_function</code>
<p>but that one didn't appear in the log. Instead, we see "ImportError: Could not find task 'foo'", which makes no sense because qa/tasks/foo.py obviously exists.</p>

teuthology - Bug #25129 (New): Race condition in install task
https://tracker.ceph.com/issues/25129
2018-07-26T20:34:43Z
Nathan Cutler (ncutler@suse.cz)
<p>When the "branch" option is given to the install task, a race condition can occur, as reproduced here: <a class="external" href="http://pulpito.ceph.com/smithfarm-2018-07-26_19:04:20-smithfarm-mimic-distro-basic-smithi/">http://pulpito.ceph.com/smithfarm-2018-07-26_19:04:20-smithfarm-mimic-distro-basic-smithi/</a> (minimal reproducer) and here: <a class="external" href="http://pulpito.ceph.com/yuriw-2018-07-24_22:40:04-upgrade:luminous-x-mimic-distro-basic-smithi/">http://pulpito.ceph.com/yuriw-2018-07-24_22:40:04-upgrade:luminous-x-mimic-distro-basic-smithi/</a> (real-life example)</p>
<p>Somehow (not sure exactly by which mechanism) Teuthology adds the following override to every job:</p>
<pre>overrides:
  install:
    ceph:
      sha1: dd471db8bce26d29051e8c41d2dbd8a2baf5186e
</pre>
<p>This is the sha1 of the branch given via the <code>--ceph</code> parameter on the teuthology-suite command line. In this case, it's the tip of branch "mimic".</p>
<p>The test yaml further contains:</p>
<pre>tasks:
- install:
    branch: luminous
</pre>
<p>When it starts, then, the install task has both "branch: luminous" and "sha1: dd471db8bce26d29051e8c41d2dbd8a2baf5186e" (tip of mimic) in its config dict, causing it to emit a message:</p>
<pre>2018-07-26T19:20:51.219 WARNING:teuthology.packaging:More than one of ref, tag, branch, or sha1 supplied; using branch</pre>
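<p>The precedence implied by this warning can be sketched as follows (the order ref &gt; tag &gt; branch &gt; sha1 is inferred from the message; this is a hypothetical simplification, not the actual <code>teuthology.packaging</code> code):</p>

```python
def choose_reference(config):
    """Pick which git reference wins when several are supplied.
    Hypothetical simplification; the precedence order is inferred
    from the warning message quoted above."""
    candidates = ["ref", "tag", "branch", "sha1"]
    supplied = [k for k in candidates if config.get(k)]
    if len(supplied) > 1:
        print("More than one of ref, tag, branch, or sha1 supplied; "
              "using {}".format(supplied[0]))
    return supplied[0], config[supplied[0]]
```

<p>With the config described above, <code>branch</code> wins over <code>sha1</code>, so the sha1 from the override is not used for the query.</p>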
<p>Next, teuthology queries shaman to get the repo:</p>
<pre>2018-07-26T19:20:51.220 DEBUG:teuthology.packaging:Querying https://shaman.ceph.com/api/search?status=ready&project=ceph&flavor=default&distros=ubuntu%2F16.04%2Fx86_64&ref=luminous
</pre>
<p>Shaman returns a list of repos containing builds from any SHA1 it finds in branch "luminous", and teuthology takes the first result - <a class="external" href="https://github.com/ceph/teuthology/blob/master/teuthology/packaging.py#L847">https://github.com/ceph/teuthology/blob/master/teuthology/packaging.py#L847</a></p>
<pre>    def _get_base_url(self):
        self.assert_result()
        return self._result.json()[0]['url']
</pre>
<p>At this point, nothing has been installed yet. This marks the beginning of the race condition.</p>
<pre>2018-07-26T19:20:51.811 INFO:teuthology.task.install.deb:Pulling from https://2.chacra.ceph.com/r/ceph/luminous/0ce17faf47b4165587f0e717e32d469dc8c3f285/ubuntu/xenial/flavors/default/
2018-07-26T19:20:51.814 INFO:teuthology.task.install.deb:Package version is 12.2.7-18-g0ce17fa-1xenial
</pre>
<p>Now the install task gets to work - calls <code>apt-get install</code>, lots of dependent packages get pulled in, etc. Time passes.</p>
<p>When the packages finish installing, the install task checks to make sure the expected package version was really installed. If, in the meantime, Shaman finishes building another SHA1 from branch "luminous", this check will fail:</p>
<pre>2018-07-26T19:22:11.308 WARNING:teuthology.packaging:More than one of ref, tag, branch, or sha1 supplied; using branch
2018-07-26T19:22:11.308 INFO:teuthology.packaging:ref: None
2018-07-26T19:22:11.308 INFO:teuthology.packaging:tag: None
2018-07-26T19:22:11.308 INFO:teuthology.packaging:branch: luminous
2018-07-26T19:22:11.308 INFO:teuthology.packaging:sha1: dd471db8bce26d29051e8c41d2dbd8a2baf5186e
2018-07-26T19:22:11.309 DEBUG:teuthology.packaging:Querying https://shaman.ceph.com/api/search?status=ready&project=ceph&flavor=default&distros=ubuntu%2F16.04%2Fx86_64&ref=luminous
</pre>
<p>This again returns a list of repos, but the first one (the only one teuthology looks at) is now different: it reports 12.2.7-19-g3a01e5d-1xenial instead of the 12.2.7-18-g0ce17fa-1xenial seen before apt-get was called.</p>
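<p>One way to remove the race would be to resolve the branch to a concrete build exactly once and reuse that resolution for the post-install check. A sketch, assuming a callable that wraps the shaman query and returns its JSON result list (hypothetical helper names, not teuthology code):</p>

```python
def resolve_once(config, query_shaman):
    """Resolve the branch to a concrete (sha1, url) pair exactly once
    and cache it in the config, so later verification compares against
    the same build even if shaman publishes a newer one meanwhile.
    Hypothetical sketch; the result field names are assumptions."""
    if "resolved_sha1" not in config:
        first = query_shaman(config["branch"])[0]
        config["resolved_sha1"] = first["sha1"]
        config["resolved_url"] = first["url"]
    return config["resolved_sha1"], config["resolved_url"]
```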
<p>But the log is silent about this anomaly. Next, the install task uses <code>dpkg-query</code> to see which version of the ceph packages was actually installed:</p>
<pre>2018-07-26T19:22:11.882 INFO:teuthology.orchestra.run.smithi092:Running: "dpkg-query -W -f '${Version}' ceph"
2018-07-26T19:22:11.915 INFO:teuthology.orchestra.run.smithi092.stdout:12.2.7-18-g0ce17fa-1xenial
2018-07-26T19:22:11.915 INFO:teuthology.packaging:The installed version of ceph is 12.2.7-18-g0ce17fa-1xenial
</pre>
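<p>The check that fires next boils down to a string comparison; a hypothetical simplification of <code>verify_package_version</code>:</p>

```python
def verify_package_version(expected, installed, pkg="ceph"):
    """Compare the version the run expects with what dpkg-query (or
    rpm) reports. Hypothetical simplification of the real check."""
    if expected != installed:
        raise RuntimeError(
            "{pkg} version {exp} was not installed, found {got}.".format(
                pkg=pkg, exp=expected, got=installed))
```

<p>The problem is that <code>expected</code> is re-resolved from shaman after the installation, so it can point at a newer build than the one that was actually installed.</p>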
<p>Now the install task reports the anomaly and aborts:</p>
<pre>2018-07-26T19:22:11.915 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 250, in install
    install_packages(ctx, package_list, config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 127, in install_packages
    verify_package_version(ctx, config, remote)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 59, in verify_package_version
    pkg=pkg_to_check
RuntimeError: ceph version 12.2.7-19-g3a01e5d-1xenial was not installed, found 12.2.7-18-g0ce17fa-1xenial.
</pre>
<p>IMO in this case the install task should see that "branch: luminous" was given and "override the override", i.e. overwrite the sha1 from <code>--ceph</code> with the real tip of branch "luminous".</p>

teuthology - Bug #25020 (In Progress): install task: extra_packages not usable when a package is ...
https://tracker.ceph.com/issues/25020
2018-07-20T13:59:42Z
Nathan Cutler (ncutler@suse.cz)
<p>For packages that have the same name in the Debian and RPM worlds, it is possible to do:</p>
<pre>tasks:
- install:
    extra_packages: ['samba']
</pre>
<p>and this will cause <code>samba</code> to be added to the list of packages generated from <code>packages.yaml</code>, regardless of whether Debian or RPM is in use.</p>
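<p>The current behavior amounts to appending the extras to whichever per-distro package list was generated, with no translation step. A simplified sketch (hypothetical, not the actual install-task code):</p>

```python
def build_package_list(system_packages, config):
    """Append extra_packages to the list generated from packages.yaml.
    Hypothetical simplification: the same extras are used whether the
    target system is Debian- or RPM-based."""
    return list(system_packages) + list(config.get("extra_packages", []))
```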
<p>For packages that are named differently in Debian than in RPM, <code>extra_packages</code> in its current form is not usable.</p>

teuthology - Bug #18818 (New): All jobs scheduled using --sha1 show up "branch: master", even if ...
https://tracker.ceph.com/issues/18818
2017-02-04T10:22:20Z
Nathan Cutler (ncutler@suse.cz)
<p>When a user schedules a run using <code>--ceph $BRANCH</code>, the branch name shows up in the name of the run. Developers are used to parsing the name of the run to see which branch is being tested.</p>
<p>This works fine as long as the runs are scheduled in this way.</p>
<p>If a run is scheduled with <code>--sha1 $SHA1</code>, teuthology uses the default for the branch, which is master. Since master contains all valid SHA1s, it could be argued that this is reasonable. However, since the SHA1/tag does not show up in the run name, it can be confusing.</p>
<p>For example, if I want to test v10.2.4 for some reason, I find the SHA1 corresponding to that tag and do:</p>
<pre>
teuthology-suite --sha1 $SHA1_CORRESPONDING_TO_V10.2.4_TAG
</pre>
<p>From the name, the resulting run claims to be "master", but in reality it is jewel.</p>
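<p>If teuthology parsed the output of <code>git describe</code> for the given SHA1, it could map the tag back to the corresponding stable branch. A hedged sketch (the version-to-branch table is an assumption for illustration, not teuthology code):</p>

```python
import re

def branch_from_describe(describe):
    """Infer a stable-branch name from `git describe` output.
    The major-version-to-branch mapping is illustrative only."""
    major_to_branch = {"10": "jewel", "11": "kraken",
                       "12": "luminous", "13": "mimic"}
    m = re.match(r"v(\d+)\.", describe)
    if m and m.group(1) in major_to_branch:
        return major_to_branch[m.group(1)]
    return "master"
```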
<p>This could be fixed by having teuthology parse "git describe" to determine the major version and set the value of branch accordingly.</p>

teuthology - Bug #17659 (New): OpenStack: volume created, but cannot immediately be added to serv...
https://tracker.ceph.com/issues/17659
2016-10-21T11:02:06Z
Nathan Cutler (ncutler@suse.cz)
<p>Sometimes, the OpenStack provider's infrastructure cannot immediately add a newly-created volume to a server:</p>
<pre>
2016-10-21T10:30:14.413 DEBUG:teuthology.misc::sh: openstack --quiet volume show -f json target158069068026-0
2016-10-21T10:30:16.960 DEBUG:teuthology.misc:{
2016-10-21T10:30:16.960 DEBUG:teuthology.misc:"size": 10,
2016-10-21T10:30:16.961 DEBUG:teuthology.misc:"status": "available",
2016-10-21T10:30:16.961 DEBUG:teuthology.misc:"properties": "ownedby='158.69.95.136'",
2016-10-21T10:30:16.961 DEBUG:teuthology.misc:"user_id": "159a1258770041e49c4808fc92e01a26",
2016-10-21T10:30:16.961 DEBUG:teuthology.misc:"description": null,
2016-10-21T10:30:16.962 DEBUG:teuthology.misc:"availability_zone": "nova",
2016-10-21T10:30:16.962 DEBUG:teuthology.misc:"bootable": "false",
2016-10-21T10:30:16.962 DEBUG:teuthology.misc:"encrypted": false,
2016-10-21T10:30:16.962 DEBUG:teuthology.misc:"created_at": "2016-10-21T10:30:17.000000",
2016-10-21T10:30:16.962 DEBUG:teuthology.misc:"multiattach": false,
2016-10-21T10:30:16.963 DEBUG:teuthology.misc:"os-volume-replication:driver_data": null,
2016-10-21T10:30:16.963 DEBUG:teuthology.misc:"name": "target158069068026-0",
2016-10-21T10:30:16.963 DEBUG:teuthology.misc:"snapshot_id": null,
2016-10-21T10:30:16.963 DEBUG:teuthology.misc:"consistencygroup_id": null,
2016-10-21T10:30:16.963 DEBUG:teuthology.misc:"replication_status": "disabled",
2016-10-21T10:30:16.964 DEBUG:teuthology.misc:"os-vol-tenant-attr:tenant_id": "d966438c9cbb46c894789b10478ecf04",
2016-10-21T10:30:16.964 DEBUG:teuthology.misc:"source_volid": null,
2016-10-21T10:30:16.964 DEBUG:teuthology.misc:"os-volume-replication:extended_status": null,
2016-10-21T10:30:16.964 DEBUG:teuthology.misc:"type": "classic",
2016-10-21T10:30:16.964 DEBUG:teuthology.misc:"id": "7aebfe59-9aa4-40a3-8bf7-0a61f9a3dea1",
2016-10-21T10:30:16.964 DEBUG:teuthology.misc:"attachments": []
2016-10-21T10:30:16.999 DEBUG:teuthology.misc:}
2016-10-21T10:30:17.000 DEBUG:teuthology.misc::sh: openstack server add volume target158069068026 target158069068026-0
2016-10-21T10:30:25.343 DEBUG:teuthology.misc:Volume 7aebfe59-9aa4-40a3-8bf7-0a61f9a3dea1 could not be found. (HTTP 404) (Request-ID: req-7bffb1d3-94fd-40c5-bbac-6ddc009f574a)
2016-10-21T10:30:25.381 ERROR:teuthology.provision:Command 'openstack server add volume target158069068026 target158069068026-0' returned non-zero exit status 1
Traceback (most recent call last):
  File "/home/ubuntu/teuthology/teuthology/provision.py", line 366, in create
    self.attach_volumes(name, resources_hint['volumes'])
  File "/home/ubuntu/teuthology/teuthology/provision.py", line 293, in attach_volumes
    misc.sh("openstack server add volume " + name + " " + volume_name)
  File "/home/ubuntu/teuthology/teuthology/misc.py", line 1371, in sh
    output=output
CalledProcessError: Command 'openstack server add volume target158069068026 target158069068026-0' returned non-zero exit status 1
</pre>
<p>Fortunately this is rare, so a workaround is simply to re-schedule the run.</p>
<p>However, since it's a transient error it should be handled as one - i.e. retry several times in a loop before giving up.</p>
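<p>A retry loop of this shape would treat the 404 as transient; a sketch, where <code>sh</code> stands in for <code>teuthology.misc.sh</code> (hypothetical wrapper; the attempt count and delay are arbitrary):</p>

```python
import time

def sh_with_retries(sh, command, attempts=5, delay=10):
    """Run a command, retrying on failure a few times before giving
    up, for errors that are known to be transient."""
    for attempt in range(1, attempts + 1):
        try:
            return sh(command)
        except Exception:
            if attempt == attempts:
                raise
            # Give the OpenStack infrastructure time to catch up.
            time.sleep(delay)
```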