Bug #19126
closed
"libsemanage.semanage_direct_get_module_info:" error causing ceph-cm-ansible to fail
Description
The common role is occasionally failing to complete due to the following error:
TASK [common : nrpe - Load SELinux policy package] *****************************
task path: /home/dgalloway/git/ceph/ceph-cm-ansible/roles/common/tasks/nrpe-selinux.yml:38
fatal: [smithi150.front.sepia.ceph.com]: FAILED! => {
    "changed": true,
    "cmd": [
        "semodule",
        "-i",
        "/tmp/nrpe.pp"
    ],
    "delta": "0:00:00.085664",
    "end": "2017-03-01 21:14:07.733917",
    "failed": true,
    "invocation": {
        "module_args": {
            "_raw_params": "semodule -i /tmp/nrpe.pp",
            "_uses_shell": false,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "warn": true
        },
        "module_name": "command"
    },
    "rc": 1,
    "start": "2017-03-01 21:14:07.648253",
    "stderr": "libsemanage.semanage_direct_get_module_info: Unable to read mod_fastcgi module lang ext file.\nlibsemanage.semanage_direct_get_module_info: Unable to read mod_fastcgi module lang ext file.\nlibsemanage.semanage_direct_get_module_info: Unable to read mod_fastcgi module lang ext file.\nsemodule: Failed on /tmp/nrpe.pp!",
    "stdout": "",
    "stdout_lines": [],
    "warnings": []
}
The testnode is left in a persistently broken state and fails all subsequent jobs until it is repaired.
Example: http://sentry.ceph.com/sepia/teuthology/issues/736/events/67513
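To see which module libsemanage is complaining about without reading the whole task output, the stderr string can be parsed. The following is a minimal sketch, assuming the message format is exactly the one quoted in the failure above; the function name `unreadable_modules` is hypothetical and not part of ceph-cm-ansible.

```python
import re
from typing import List

# Pattern copied from the stderr text quoted in the failed task output.
# Assumption: other libsemanage versions may word this message differently.
_PATTERN = re.compile(
    r"libsemanage\.semanage_direct_get_module_info: "
    r"Unable to read (\S+) module lang ext file\."
)


def unreadable_modules(stderr: str) -> List[str]:
    """Return the distinct module names named in the error, in first-seen order."""
    seen: List[str] = []
    for name in _PATTERN.findall(stderr):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    line = (
        "libsemanage.semanage_direct_get_module_info: "
        "Unable to read mod_fastcgi module lang ext file.\n"
    )
    sample = line * 3 + "semodule: Failed on /tmp/nrpe.pp!"
    print(unreadable_modules(sample))
```

Note that the module named in the error (mod_fastcgi) is not the one being installed (nrpe); semodule fails on whichever existing module it cannot read.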
Updated by David Galloway about 7 years ago
Some notes.
I've got smithi150 (broken) and smithi143 (not broken) locked.
yum/rpm report that mod_fastcgi-2.4.7-1.ceph.el7.centos.x86_64 is installed on both machines.
However, all the files in /etc/selinux/targeted/active/modules/400/mod_fastcgi/ (cil, hll, and lang_ext) are empty on smithi150.
[root@smithi143 selinux]# file /etc/selinux/targeted/active/modules/400/mod_fastcgi/*
/etc/selinux/targeted/active/modules/400/mod_fastcgi/cil: bzip2 compressed data, block size = 500k
/etc/selinux/targeted/active/modules/400/mod_fastcgi/hll: bzip2 compressed data, block size = 500k
/etc/selinux/targeted/active/modules/400/mod_fastcgi/lang_ext: ASCII text, with no line terminators
[root@smithi150 ~]# file /etc/selinux/targeted/active/modules/400/mod_fastcgi/*
/etc/selinux/targeted/active/modules/400/mod_fastcgi/cil: empty
/etc/selinux/targeted/active/modules/400/mod_fastcgi/hll: empty
/etc/selinux/targeted/active/modules/400/mod_fastcgi/lang_ext: empty
So something broke mod_fastcgi. I queried the last 50 jobs run on smithi150, and it appears it was this job: http://qa-proxy.ceph.com/teuthology/teuthology-2017-02-27_02:01:17-rbd-master-distro-basic-smithi/862407/teuthology.log
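The manual `file` comparison above can be turned into a quick scan for zero-byte files under a module priority directory. This is a rough sketch only: `find_empty_module_files` is a hypothetical helper, and it assumes the one-directory-per-module layout shown in the output above (each holding files such as cil, hll, and lang_ext).

```python
import os
from typing import Dict, List


def find_empty_module_files(priority_dir: str) -> Dict[str, List[str]]:
    """Map each module name to the sorted names of its zero-byte files.

    Modules with no empty files are left out of the result.
    """
    broken: Dict[str, List[str]] = {}
    for module in sorted(os.listdir(priority_dir)):
        module_dir = os.path.join(priority_dir, module)
        if not os.path.isdir(module_dir):
            continue
        empty = sorted(
            name
            for name in os.listdir(module_dir)
            if os.path.isfile(os.path.join(module_dir, name))
            and os.path.getsize(os.path.join(module_dir, name)) == 0
        )
        if empty:
            broken[module] = empty
    return broken


if __name__ == "__main__":
    # Path taken from the ticket; reading it normally requires root.
    path = "/etc/selinux/targeted/active/modules/400"
    if os.path.isdir(path):
        print(find_empty_module_files(path))
```

On a node in the state shown for smithi150, this would be expected to report mod_fastcgi with all three files listed.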
Updated by David Galloway about 7 years ago
- Description updated
All I was really able to deduce was that something was corrupting the mod_fastcgi SELinux policy module files in /etc/selinux/targeted/active/modules/400/mod_fastcgi.
We remove [1] and reinstall [2] the mod_fastcgi and nrpe [3] modules with every ansible run anyway, so I added a task to make sure mod_fastcgi and nrpe are not present in /etc/selinux/targeted/active/modules/400:
https://github.com/ceph/ceph-cm-ansible/pull/309
[1] https://github.com/ceph/ceph-cm-ansible/blob/master/roles/testnode/vars/yum_systems.yml#L21
[2] https://github.com/ceph/ceph-cm-ansible/blob/master/roles/testnode/vars/centos_7.yml#L78
[3] https://github.com/ceph/ceph-cm-ansible/blob/master/roles/common/tasks/nrpe-selinux.yml
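For illustration, the "make sure they are not present" step can be sketched as an idempotent directory removal. This is not the code from the pull request (which is an Ansible task); `ensure_modules_absent` is a hypothetical helper that only shows the idea, and it assumes each module lives in its own directory under the priority directory.

```python
import os
import shutil
from typing import Iterable, List


def ensure_modules_absent(priority_dir: str, modules: Iterable[str]) -> List[str]:
    """Remove the named module directories if they exist.

    Returns the names that were actually removed, so a second call
    with the same arguments returns an empty list.
    """
    removed: List[str] = []
    for name in modules:
        path = os.path.join(priority_dir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(name)
    return removed
```

A call such as `ensure_modules_absent("/etc/selinux/targeted/active/modules/400", ["mod_fastcgi", "nrpe"])` would mirror the paths named above. It deletes files and would need root, so it should only be tried against a scratch directory first.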
Updated by David Galloway about 7 years ago
- Project changed from ceph-cm-ansible to sepia
- Subject changed from libsemanage.semanage_direct_get_module_info: Unable to read mod_fastcgi module lang ext file. to "libsemanage.semanage_direct_get_module_info:" error causing ceph-cm-ansible to fail
- Category set to Test Node
- Status changed from Resolved to 12
Updated by David Galloway about 7 years ago
- Priority changed from Normal to Urgent
The problem reappeared, except semodule now fails on abrt (the first module in /etc/selinux/targeted/active/modules/100/) instead of mod_fastcgi.
Here are the last passed (or dead, if ceph-cm-ansible passed) jobs on 5 of the smithi nodes with this problem:
smithi014: http://pulpito.ceph.com/sage-2017-03-04_20:16:25-rados:thrash-erasure-code-wip-osd-full---basic-smithi/882764
smithi021: http://pulpito.ceph.com/sage-2017-03-03_22:21:26-rados-wip-sage-testing---basic-smithi/879924
smithi029: http://pulpito.ceph.com/sage-2017-03-03_02:56:43-rados-wip-sage-testing---basic-smithi/877300
smithi038: http://pulpito.ceph.com/teuthology-2017-03-04_05:20:02-kcephfs-kraken-testing-basic-smithi/882656/
smithi134: http://pulpito.ceph.com/sage-2017-03-03_22:21:26-rados-wip-sage-testing---basic-smithi/879934
The first instance of this problem was on January 23, 2017 at 17:58:26 UTC: http://sentry.ceph.com/sepia/teuthology/issues/736/events/60205/
Updated by David Galloway about 7 years ago
Reinstalling selinux-policy-targeted restores the module files (/etc/selinux/targeted/active/modules/100/*/*) to a sane state, and semodule exits cleanly again.
I could just have selinux-policy-targeted reinstalled on every ansible run, but something is putting the module files in a bad state and I'm trying to find out what.
smithi014 and smithi021 are no longer in a broken state because of my testing.
Updated by David Galloway about 7 years ago
Starting to wonder if the latest version of the selinux-policy-targeted package is causing this.
Here's a diff of the install scripts between selinux-policy-targeted-3.13.1-102.el7_3.7.noarch and selinux-policy-targeted-3.13.1-102.el7_3.15.noarch:
https://www.diffchecker.com/dxDGIUZP
According to http://mirror.centos.org/centos/7/updates/x86_64/Packages/, selinux-policy-targeted-3.13.1-102.el7_3.13 was released on 2017-01-18, which is close to the first time we saw this problem.
Difference between rebuild scripts: https://www.diffchecker.com/xqGJDnyW
Updated by David Galloway about 7 years ago
Looking at smithi029, abrt's module files were last modified on 3/4 at 10:33.
[root@smithi029 abrt]# stat cil
  File: ‘cil’
  Size: 0          Blocks: 0          IO Block: 4096   regular empty file
Device: 801h/2049d Inode: 6161979     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)   Gid: ( 0/ root)
Context: unconfined_u:object_r:semanage_store_t:s0
Access: 2017-03-06 19:19:53.034358186 +0000
Modify: 2017-03-04 10:33:12.224531450 +0000
Change: 2017-03-04 10:33:12.224531450 +0000
Looking at yum history, I see transaction IDs 46557 and 46558 ran at that time.
[root@smithi029 abrt]# yum history info 46558
Loaded plugins: fastestmirror, langpacks, priorities
Transaction ID : 46558
Begin time     : Sat Mar 4 10:33:03 2017
Begin rpmdb    : 860:bd1cfe5bdc56c8e26dd990bad3d73de40da68d2d
End time       : 10:33:25 2017 (22 seconds)
End rpmdb      : 837:f1d1399e1a489c3a04b8f62cfc5dd37c8b3e505b
User           : <ubuntu>
Return-Code    : Success
Command Line   : -y remove librados2
Transaction performed with:
    Installed rpm-4.11.3-21.el7.x86_64 @anaconda
    Installed yum-3.4.3-150.el7.centos.noarch @anaconda
    Installed yum-plugin-fastestmirror-1.1.31-40.el7.noarch @anaconda
Packages Altered:
    Erase ceph-base-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase ceph-common-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase ceph-mds-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase ceph-mgr-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase ceph-mon-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase ceph-osd-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase ceph-selinux-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase libcephfs-devel-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase libcephfs2-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase librados-devel-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase librados2-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase libradosstriper1-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase librbd1-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase librgw2-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase python-ceph-compat-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase python-cephfs-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase python-rados-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase python-rbd-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase python-rgw-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
    Erase qemu-img-10:1.5.3-126.el7_3.5.x86_64 @updates
    Erase qemu-kvm-10:1.5.3-126.el7_3.5.x86_64 @updates
    Erase qemu-kvm-common-10:1.5.3-126.el7_3.5.x86_64 @updates
    Erase rbd-fuse-1:12.0.0-931.g8d07615.el7.x86_64 @ceph
Scriptlet output:
   1 warning: file /etc/logrotate.d/ceph: remove failed: No such file or directory
history info
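The reasoning here is a timestamp correlation: the file's Modify time falls between the transaction's Begin and End times. A minimal sketch of that check follows, using the values from the stat and yum output above. The helper name `transactions_covering` and the tuple layout are hypothetical, and the sketch assumes the stat timestamps (+0000) and the yum times (printed without a zone) refer to the same timezone.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# (transaction id, begin time, end time); layout chosen for this example only.
Transaction = Tuple[int, datetime, datetime]


def transactions_covering(
    mtime: datetime, transactions: Iterable[Transaction]
) -> List[int]:
    """Return IDs of transactions whose begin/end window contains mtime."""
    return [tid for tid, begin, end in transactions if begin <= mtime <= end]


if __name__ == "__main__":
    # Modify time of abrt's cil file on smithi029, truncated to seconds.
    modified = datetime(2017, 3, 4, 10, 33, 12)
    # Begin and end times of transaction 46558 from `yum history info`.
    history = [
        (46558, datetime(2017, 3, 4, 10, 33, 3), datetime(2017, 3, 4, 10, 33, 25)),
    ]
    print(transactions_covering(modified, history))
```

Only transaction 46558 is included because its times are the only ones quoted in this comment; the window for 46557 would need to be read from `yum history info 46557`.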
That package version leads me to https://3.chacra.ceph.com/repos/ceph/wip-zyan-testing/8d0761524228ef05170ebadd57c65b92e5b66694/centos/7/flavors/default/
Installing ceph-selinux and removing it does not reproduce the issue.
Did the same exercise on smithi038. No joy on a reproducer.
Updated by Zheng Yan about 7 years ago
Still seeing similar errors on smithi{014,038,134}.
One example: http://pulpito.ceph.com/teuthology-2017-03-06_03:25:01-kcephfs-master-testing-basic-smithi/886325/
I may have locked these machines manually, run the following task, and then run teuthology -r -u -t.
roles:
- [osd.0, mds.a, mds.b]
- [osd.1, mds.c, mds.d]
- [osd.2, mds.e, mds.f]
- [osd.3, mds.g, mon.0]
- [client.0]
- [client.1]
branch: wip-zyan-testing
suite_branch: wip-zyan-testing
suite_relpath: qa
kernel:
  branch: testing
overrides:
  install:
    ceph:
      branch: wip-zyan-testing
  ceph:
    conf:
      mds:
        mds thrash exports: 0
        mds debug scatterstat: 0
        debug monc: 20
tasks:
- install:
- ceph:
- kclient: [client.0, client.1]
- interactive:
Maybe I did something wrong.