Bug #37335

QA run failures "Command failed on smithi with status 1: '\n sudo yum -y install ceph-radosgw\n ' "

Added by Laura Paduano over 2 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
mimic, luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When running the QA tests for dashboard PRs, we often get the above-mentioned error.
A more detailed error message from the logs:

2018-11-19T16:22:00.644 INFO:teuthology.orchestra.run.smithi047.stderr:Error: Package: 2:ceph-selinux-14.0.1-881.g09f2bb4.el7.x86_64 (ceph)
2018-11-19T16:22:00.644 INFO:teuthology.orchestra.run.smithi047.stderr:           Requires: selinux-policy-base >= 3.13.1-229.el7
2018-11-19T16:22:00.644 INFO:teuthology.orchestra.run.smithi047.stderr:           Installed: selinux-policy-targeted-3.13.1-192.el7_5.6.noarch (@updates)
2018-11-19T16:22:00.645 INFO:teuthology.orchestra.run.smithi047.stderr:               selinux-policy-base = 3.13.1-192.el7_5.6
2018-11-19T16:22:00.645 INFO:teuthology.orchestra.run.smithi047.stderr:           Available: selinux-policy-minimum-3.13.1-192.el7.noarch (base)
2018-11-19T16:22:00.645 INFO:teuthology.orchestra.run.smithi047.stderr:               selinux-policy-base = 3.13.1-192.el7
2018-11-19T16:22:00.645 INFO:teuthology.orchestra.run.smithi047.stderr:           Available: selinux-policy-minimum-3.13.1-192.el7_5.3.noarch (updates)
2018-11-19T16:22:00.645 INFO:teuthology.orchestra.run.smithi047.stderr:               selinux-policy-base = 3.13.1-192.el7_5.3
2018-11-19T16:22:00.645 INFO:teuthology.orchestra.run.smithi047.stderr:           Available: selinux-policy-minimum-3.13.1-192.el7_5.4.noarch (updates)
2018-11-19T16:22:00.645 INFO:teuthology.orchestra.run.smithi047.stderr:               selinux-policy-base = 3.13.1-192.el7_5.4
2018-11-19T16:22:00.646 INFO:teuthology.orchestra.run.smithi047.stderr:           Available: selinux-policy-minimum-3.13.1-192.el7_5.6.noarch (updates)
2018-11-19T16:22:00.646 INFO:teuthology.orchestra.run.smithi047.stderr:               selinux-policy-base = 3.13.1-192.el7_5.6
2018-11-19T16:22:00.646 INFO:teuthology.orchestra.run.smithi047.stderr:           Available: selinux-policy-mls-3.13.1-192.el7.noarch (base)
2018-11-19T16:22:00.646 INFO:teuthology.orchestra.run.smithi047.stderr:               selinux-policy-base = 3.13.1-192.el7
2018-11-19T16:22:00.646 INFO:teuthology.orchestra.run.smithi047.stderr:           Available: selinux-policy-mls-3.13.1-192.el7_5.3.noarch (updates)
2018-11-19T16:22:00.646 INFO:teuthology.orchestra.run.smithi047.stderr:               selinux-policy-base = 3.13.1-192.el7_5.3
2018-11-19T16:22:00.646 INFO:teuthology.orchestra.run.smithi047.stderr:           Available: selinux-policy-mls-3.13.1-192.el7_5.4.noarch (updates)
2018-11-19T16:22:00.647 INFO:teuthology.orchestra.run.smithi047.stderr:               selinux-policy-base = 3.13.1-192.el7_5.4
2018-11-19T16:22:00.647 INFO:teuthology.orchestra.run.smithi047.stderr:           Available: selinux-policy-mls-3.13.1-192.el7_5.6.noarch (updates)
2018-11-19T16:22:00.647 INFO:teuthology.orchestra.run.smithi047.stderr:               selinux-policy-base = 3.13.1-192.el7_5.6
2018-11-19T16:22:00.647 INFO:teuthology.orchestra.run.smithi047.stderr:           Available: selinux-policy-targeted-3.13.1-192.el7.noarch (base)
2018-11-19T16:22:00.647 INFO:teuthology.orchestra.run.smithi047.stderr:               selinux-policy-base = 3.13.1-192.el7
2018-11-19T16:22:00.647 INFO:teuthology.orchestra.run.smithi047.stderr:           Available: selinux-policy-targeted-3.13.1-192.el7_5.3.noarch (updates)
2018-11-19T16:22:00.647 INFO:teuthology.orchestra.run.smithi047.stderr:               selinux-policy-base = 3.13.1-192.el7_5.3
2018-11-19T16:22:00.647 INFO:teuthology.orchestra.run.smithi047.stderr:           Available: selinux-policy-targeted-3.13.1-192.el7_5.4.noarch (updates)
2018-11-19T16:22:00.648 INFO:teuthology.orchestra.run.smithi047.stderr:               selinux-policy-base = 3.13.1-192.el7_5.4
2018-11-19T16:22:00.648 INFO:teuthology.orchestra.run.smithi047.stdout: You could try using --skip-broken to work around the problem
2018-11-19T16:22:08.189 INFO:teuthology.orchestra.run.smithi187.stdout: You could try running: rpm -Va --nofiles --nodigest
2018-11-19T16:22:08.267 DEBUG:teuthology.orchestra.run:got remote process result: 1
2018-11-19T16:22:08.268 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 258, in install
    install_packages(ctx, package_list, config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 125, in install_packages
    ctx, remote, pkgs[system_type], config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 85, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 99, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 22, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/rpm.py", line 180, in _update_package_list_and_install
    cpack=cpack))
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 194, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 430, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 162, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 184, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed on smithi187 with status 1: '\n              sudo yum -y install ceph-radosgw\n            '
2018-11-19T16:22:08.269 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 89, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 627, in task
    lambda: ship_utilities(ctx=ctx, config=None),
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 258, in install
    install_packages(ctx, package_list, config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 125, in install_packages
    ctx, remote, pkgs[system_type], config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 85, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 99, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 22, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/rpm.py", line 180, in _update_package_list_and_install
    cpack=cpack))
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 194, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 430, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 162, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 184, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed on smithi187 with status 1: '\n              sudo yum -y install ceph-radosgw\n            '

Also see:
http://pulpito.ceph.com/laura-2018-11-16_13:47:15-rados:mgr-wip-lpaduano-testing-24851-distro-basic-smithi/3261399/
http://pulpito.ceph.com/pnawracay-2018-11-20_07:28:07-rados:mgr-pna-fix-safe-to-destroy-distro-basic-smithi/3273594/
http://pulpito.ceph.com/tdehler-2018-11-19_15:27:38-rados:mgr-wip-tdehler-testing-25121-distro-basic-smithi/3270274/


Related issues

Related to Ceph - Bug #41603: "make check" failing in GitHub due to python packaging conflict Resolved
Copied to Ceph - Backport #41644: luminous: QA run failures "Command failed on smithi with status 1: '\n sudo yum -y install ceph-radosgw\n ' " Resolved
Copied to Ceph - Backport #41645: mimic: QA run failures "Command failed on smithi with status 1: '\n sudo yum -y install ceph-radosgw\n ' " Resolved

History

#1 Updated by Kefu Chai over 2 years ago

It seems the CentOS build host was more up to date and was building ceph packages with "selinux-policy-base >= 3.13.1-229.el7", while the test node was still using selinux-policy-base = 3.13.1-192.el7, which is slightly older than 3.13.1-229.el7.
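As a quick sanity check, the mismatch described here can be confirmed with plain shell. This is a rough sketch: sort -V only approximates rpm's real version comparison, and the two version strings are copied from the log above.

```shell
# Rough check (plain shell, no rpm needed): does the installed
# selinux-policy-base release satisfy ceph-selinux's requirement?
# sort -V approximates rpm's version comparison for these strings.
required="3.13.1-229.el7"          # what ceph-selinux was built against
installed="3.13.1-192.el7_5.6"     # what the smithi test node had
lowest=$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n1)
if [ "$lowest" = "$required" ]; then
    echo "requirement satisfied: $installed >= $required"
else
    echo "requirement NOT satisfied: $installed < $required"
fi
```

On the versions above this reports the requirement as not satisfied, which is exactly the dependency error yum printed.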

#2 Updated by Kefu Chai over 2 years ago

  • Project changed from mgr to sepia
  • Category deleted (testing)
  • Assignee set to Brad Hubbard

#3 Updated by Brad Hubbard over 2 years ago

It looks to me as though the build machine has the Continuous Release repo [1] enabled, as that is the only place I can find the 3.13.1-229.el7 version packages (although there could be other explanations along similar lines). If that's the case then the test nodes need access to this repo as well. I think only the infrastructure guys can confirm this theory and do anything about it. I'll pursue this in the morning my time when most of those guys should be around.

[1] https://wiki.centos.org/AdditionalResources/Repositories/CR

#4 Updated by David Galloway over 2 years ago

The version of selinux-policy-base required is based on what the build host had on it and not what's in a spec file somewhere? [1] OK.

I don't see in the slave-building ansible [2] where the CR repo gets added. How'd it get there? Do we expect Ceph users to also add it?

Adding the repo to the testnodes is relatively easy. I just feel like we've been adding random packages (python-jwt rings a bell [3]) and now repos that are adding additional barriers to entry for users.

[1] https://github.com/ceph/ceph-ci/blob/wip-lpaduano-testing-24851-2/ceph.spec.in#L914
[2] https://github.com/ceph/ceph-build/blob/master/ansible/slave.yml
[3] https://tracker.ceph.com/issues/36653

#5 Updated by Ken Dreyer over 2 years ago

I know of no reason to enable the CentOS CR repository on our build slaves. I recommend disabling it.

It would be great to build the RPMs with mock to isolate the packages from changes on the build slaves. (Like we do with pbuilder on Ubuntu.) This would make it easier to build for el7, el8, and Fedora on the same slaves.

#6 Updated by Brad Hubbard over 2 years ago

After talking to David, Ken, and Alfredo, we believe this was caused by one of the build hosts being "dirty" at the time the build was done. The build log [1] shows "--> Already installed : selinux-policy-devel-3.13.1-229.el7.noarch", so it seems some '3.13.1-229' packages were already installed on the build host at that time. It was not my intention to suggest adding the CR repo to the test node as a permanent fix, but more as a short-term solution. Obviously, the way to move forward is to work out how the newer packages are being introduced to the build host.

We may need to track these to try and establish a pattern.

Laura (and anyone else), can you post the build here when you see this again?

[1] https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos7,DIST=centos7,MACHINE_SIZE=huge/15730/consoleFull

#7 Updated by Brad Hubbard over 2 years ago

  • Status changed from New to Need More Info

#8 Updated by Brad Hubbard over 2 years ago

  • Status changed from Need More Info to 12

So, with a few tips from Kefu, I found the following.

$ ag "install-deps" ceph-dev-new-setup/build/build
42:if [ -x install-deps.sh ]; then
44:  ./install-deps.sh

So ceph-build's ceph-dev-new-setup step invokes install-deps.sh.

$ ag --nopager -B1 "enable cr" install-deps.sh
312-                if test $ID = centos -a $VERSION_ID = 7 ; then
313:                    $SUDO yum-config-manager --enable cr
--
329-                elif test $ID = virtuozzo -a $MAJOR_VERSION = 7 ; then
330:                    $SUDO yum-config-manager --enable cr

We are, therefore, enabling the CR repo on the build hosts for CentOS 7.

The reason we haven't noticed this before?

1: There hasn't been a newer version of the selinux packages in the CR repo.
2: Not every job runs on CentOS so it appears intermittent.

I'm building packages now to test this theory.

#9 Updated by Brad Hubbard over 2 years ago

So I think that one possible solution for this on the build nodes would be to do something like the following before any call to install-deps.sh.

# yum-config-manager --enable cr
# TMP=$(grep -m1 ^#baseurl /etc/yum.repos.d/CentOS-Base.repo|sed -e 's/#//'); sed -i -e "s@^baseurl=.*@$TMP@" /etc/yum.repos.d/CentOS-CR.repo

That would set the baseurl for the CR repo to the same as the Base repo rendering the 'yum-config-manager --enable cr' command in install-deps.sh a noop. Not particularly pretty but could accomplish what we want? Anyway, throwing it out there for comment.
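To illustrate what those two commands do, here is the same rewrite applied to throwaway stand-in repo files. The file contents and the temporary directory are invented for the demonstration; the real files live in /etc/yum.repos.d.

```shell
# Demonstration of the baseurl splice on simplified stand-in repo files.
workdir=$(mktemp -d)
cat > "$workdir/CentOS-Base.repo" <<'EOF'
[base]
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
gpgcheck=1
EOF
cat > "$workdir/CentOS-CR.repo" <<'EOF'
[cr]
baseurl=http://mirror.centos.org/centos/$releasever/cr/$basearch/
gpgcheck=1
EOF
# Uncomment Base's baseurl and splice it into the CR repo file, so an
# enabled "cr" repo serves exactly the same packages as "base":
TMP=$(grep -m1 '^#baseurl' "$workdir/CentOS-Base.repo" | sed -e 's/#//')
sed -i -e "s@^baseurl=.*@$TMP@" "$workdir/CentOS-CR.repo"
grep '^baseurl' "$workdir/CentOS-CR.repo"
```

After the sed, the CR file's baseurl points at the os/ tree, so 'yum-config-manager --enable cr' becomes harmless.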

#10 Updated by Alfredo Deza over 2 years ago

The commit that added that line mentions:

To get libunwind from the CR repositories until CentOS 7.2.1511 is released.

And references tracker issue: http://tracker.ceph.com/issues/13997

IMO we should remove that CR line since the condition doesn't seem to apply anymore, and test if we are still seeing that problem
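A hedged sketch of what removing that line would look like. The if/elif shape follows the excerpt in comment #8, but the variable defaults and the took_branch marker are additions so the sketch can be dry-run anywhere; this is not a copy of the real install-deps.sh.

```shell
# Sketch: the CentOS 7 branch no longer enables the CR repo.
SUDO=${SUDO:-sudo}
ID=${ID:-centos}
VERSION_ID=${VERSION_ID:-7}
MAJOR_VERSION=${MAJOR_VERSION:-7}
took_branch=""
if test "$ID" = centos -a "$VERSION_ID" = 7 ; then
    took_branch=centos7
    : # previously: $SUDO yum-config-manager --enable cr
elif test "$ID" = virtuozzo -a "$MAJOR_VERSION" = 7 ; then
    $SUDO yum-config-manager --enable cr
fi
echo "branch taken: $took_branch"
```

With the defaults, the CentOS 7 branch is taken and no repo configuration is touched.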

#11 Updated by Brad Hubbard over 2 years ago

Haha, and not only that, I was even involved in getting that line put there! Talk about your fails :P Thanks Alfredo, I'll get that organised.

#12 Updated by Brad Hubbard over 2 years ago

  • Project changed from sepia to Ceph
  • Status changed from 12 to In Progress
  • Backport set to mimic, luminous

#13 Updated by Brad Hubbard over 2 years ago

  • Project changed from Ceph to sepia
  • Status changed from In Progress to Resolved

#14 Updated by Ken Dreyer over 2 years ago

I've begun setting up the pieces to build within mock. https://wiki.centos.org/SpecialInterestGroup/Storage/Ceph/Mock

#15 Updated by Brad Hubbard over 2 years ago

Good move Ken. Does CentOS have an equivalent of Fedora's copr?

#16 Updated by Ken Dreyer over 2 years ago

Great question Brad. I looked at Copr's settings, and I think it's possible.

When you set up a Copr project, in the web UI, if you de-select all the Chroots for the project, then choose "custom-1-x86_64", and then insert all the Yum repositories from the mock config (https://github.com/CentOS-Storage-SIG/mock-ceph-config/blob/master/storage7-ceph-nautilus-el7-x86_64.cfg) into the "External Repositories" section, it should work.

#17 Updated by Brad Hubbard over 2 years ago

Thanks Ken, I'll give this a try when I do my weekly fedora copr run.

#18 Updated by Nathan Cutler almost 2 years ago

  • Status changed from Resolved to Pending Backport

oops - this didn't get backported to mimic, causing #41603

#19 Updated by Nathan Cutler almost 2 years ago

  • Project changed from sepia to Ceph

#20 Updated by Nathan Cutler almost 2 years ago

  • Pull request ID set to 25211

#21 Updated by Nathan Cutler almost 2 years ago

  • Copied to Backport #41644: luminous: QA run failures "Command failed on smithi with status 1: '\n sudo yum -y install ceph-radosgw\n ' " added

#22 Updated by Nathan Cutler almost 2 years ago

  • Copied to Backport #41645: mimic: QA run failures "Command failed on smithi with status 1: '\n sudo yum -y install ceph-radosgw\n ' " added

#23 Updated by Nathan Cutler almost 2 years ago

  • Related to Bug #41603: "make check" failing in GitHub due to python packaging conflict added

#24 Updated by Brad Hubbard almost 2 years ago

Sorry about that. Strange we didn't notice this in mimic for 10+ months

#25 Updated by Nathan Cutler almost 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved".

#26 Updated by David Galloway almost 2 years ago

Just to log this somewhere... @dsavinea reached out to me because ceph-container and ceph-ansible jobs were failing due to this.

slave-centos01 and slave-centos05 still had selinux-policy 3.13.1-252.el7 installed. I downgraded the package to 3.13.1-229 on those two hosts and ran yum-config-manager --disable cr on all of the slave-centos* Jenkins slaves.
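Spelled out as commands, the cleanup amounts to the following sketch. The package NVRs are the ones reported above, and SUDO defaults to echo here so the snippet dry-runs safely outside the Jenkins slaves; set SUDO=sudo to actually apply it.

```shell
# Cleanup from comment #26: downgrade the CR selinux-policy packages
# and disable the CR repo (SUDO=echo makes this a dry run).
SUDO=${SUDO:-echo}
$SUDO yum downgrade -y selinux-policy-3.13.1-229.el7 selinux-policy-targeted-3.13.1-229.el7
$SUDO yum-config-manager --disable cr
```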
