Activity

From 02/09/2017 to 03/10/2017

03/10/2017

11:30 PM Support #19261 (Resolved): vpn access
nhm@espresso +YYZPT29wYzY5ooaRzabCQ 1ee041dd58b9ec6eb678c47632ece7cf6c24e23bcbac28a77a82af05ba6cc148 Mark Nelson
07:45 PM Bug #18089: Various official builds missing from CI/Shaman
How's this looking? David Galloway
05:37 PM Feature #14296 (Resolved): Create VPSHOST nagios hostgroup
Done! We'll get notifications when VPSHOSTs go down to our inbox instead of ceph-infra now. David Galloway
03:54 PM Bug #19102 (Resolved): mira102: Input/output error
I updated firmware and reimaged again and disk seems fine. Jobs have been passing. David Galloway

03/09/2017

05:29 PM Bug #14840: mira091 is not accessible
I moved the VPSes to mira005 but it MCE'd last night. I updated BIOS firmware and am running memtest now.
David Galloway

03/08/2017

01:05 AM Bug #19126: "libsemanage.semanage_direct_get_module_info:" error causing ceph-cm-ansible to fail
https://github.com/ceph/ceph-cm-ansible/pull/310 Dan Mick
01:05 AM Bug #19126 (Resolved): "libsemanage.semanage_direct_get_module_info:" error causing ceph-cm-ansib...
Dan Mick

03/07/2017

09:24 PM Bug #19216 (Resolved): smithi068 smithi076 smithi079 won't nuke
Not sure what had these testnodes locked prior to you attempting to nuke them but they had old/deleted chacra repos. ... David Galloway
08:50 PM Bug #19216 (Resolved): smithi068 smithi076 smithi079 won't nuke
http://paste2.org/vY1sPgmJ Yuri Weinstein
07:30 PM Bug #19126: "libsemanage.semanage_direct_get_module_info:" error causing ceph-cm-ansible to fail
Similar: https://bugzilla.redhat.com/show_bug.cgi?id=784184 David Galloway
09:35 AM Bug #19126: "libsemanage.semanage_direct_get_module_info:" error causing ceph-cm-ansible to fail
still see similar errors on smithi{014,038,134}
One example http://pulpito.ceph.com/teuthology-2017-03-06_03:25:01...
Zheng Yan
05:33 PM Bug #19078: Downburst failing to complete sometimes
Zack Cerza wrote:
> teuthology will now unlock if downburst's destroy failure is because the instance doesn't exist
...
David Galloway
05:14 PM Bug #19078: Downburst failing to complete sometimes
teuthology will now unlock if downburst's destroy failure is because the instance doesn't exist
https://github.com...
Zack Cerza

03/06/2017

11:18 PM Bug #19126: "libsemanage.semanage_direct_get_module_info:" error causing ceph-cm-ansible to fail
Looking at smithi029,
abrt's module files were last modified on 3/4 at 10:33....
David Galloway
06:58 PM Bug #19126: "libsemanage.semanage_direct_get_module_info:" error causing ceph-cm-ansible to fail
Starting to wonder if maybe the latest version of the selinux-policy-targeted packages is causing this.
Here's a ...
David Galloway
06:13 PM Bug #19126: "libsemanage.semanage_direct_get_module_info:" error causing ceph-cm-ansible to fail
Reinstalling selinux-policy-targeted restores the module files (/etc/selinux/targeted/active/modules/100/*/*) to a sa... David Galloway
04:54 PM Bug #19126: "libsemanage.semanage_direct_get_module_info:" error causing ceph-cm-ansible to fail
Problem reappeared except semodule fails on abrt (the first module in ... David Galloway
03:15 PM Bug #19126: "libsemanage.semanage_direct_get_module_info:" error causing ceph-cm-ansible to fail
David Galloway

03/03/2017

09:27 PM Bug #19126 (Resolved): "libsemanage.semanage_direct_get_module_info:" error causing ceph-cm-ansib...
David Galloway
05:16 PM Bug #19078: Downburst failing to complete sometimes
So aside from the downburst hang, there's also a problem with unlocking a VPS if the VM doesn't exist.
vpm159 was ...
David Galloway

03/02/2017

07:00 PM Bug #19126: "libsemanage.semanage_direct_get_module_info:" error causing ceph-cm-ansible to fail
All I was really able to deduce was that something was corrupting the mod_fastcgi SELinux policy module files in <cod... David Galloway
05:30 PM Bug #19126: "libsemanage.semanage_direct_get_module_info:" error causing ceph-cm-ansible to fail
Some notes.
I've got smithi150 (broken) and smithi143 (not broken) locked.
yum/rpm report that mod_fastcgi-2.4....
David Galloway

03/01/2017

11:09 PM Bug #18370: cluster [WRN] message from mon.2 was stamped 9.194390s in the future, clocks not sync...
okay, here is run /a/sage-2017-02-24_06:15:05-rados-wip-sage-testing---basic-smithi/855210... Sage Weil
10:49 PM Bug #18370: cluster [WRN] message from mon.2 was stamped 9.194390s in the future, clocks not sync...
Hrm, it was the fucking leap second.... Sage Weil
10:43 PM Bug #18370: cluster [WRN] message from mon.2 was stamped 9.194390s in the future, clocks not sync...
the clock goes backward in time by 1s, right at midnight:... Sage Weil
10:40 PM Bug #18089: Various official builds missing from CI/Shaman
8abf95af405e117298c5012aeaa4c60caf86a4fd (tip of "firefly")
- https://chacra.ceph.com/repos/ceph/firefly/8abf95af4...
David Galloway
03:51 PM Bug #18089: Various official builds missing from CI/Shaman
@David: OK, I started http://pad.ceph.com/p/upgrade-builds and will put the references/SHA1s there. Please update tha... Nathan Cutler
09:16 PM Bug #19126 (Resolved): "libsemanage.semanage_direct_get_module_info:" error causing ceph-cm-ansib...
The common role is occasionally failing to complete due to the following error:... David Galloway
12:05 AM Bug #19078: Downburst failing to complete sometimes
Okay, don't do anything with those machines for now.... David Galloway

02/28/2017

11:24 PM Bug #19078: Downburst failing to complete sometimes
Latest log:... Yuri Weinstein
10:45 PM Bug #19078: Downburst failing to complete sometimes
This is interesting...
From: http://qa-proxy.ceph.com/teuthology/teuthology-2017-02-28_19:25:04-upgrade:kraken-x-m...
David Galloway
10:27 PM Bug #19078: Downburst failing to complete sometimes
AIUI, there are a few options here:
# Log downburst output so we can get an idea of what's happening / why downbur...
David Galloway
10:04 PM Bug #19082: "Error destroying vpmXXX libvirt: QEMU Driver error : Domain not found"
Yuri Weinstein wrote:
> More vps in same state
>
> [...]
Any chance you have the nuke output from that time so...
David Galloway
09:30 PM Bug #18089: Various official builds missing from CI/Shaman
Okay, I can try again to push these old builds to the chacra node that won't delete builds. Can you provide the refs... David Galloway
08:52 PM Bug #18089: Various official builds missing from CI/Shaman
David Galloway wrote:
> Andrew Schoen wrote:
> > It's purposely out of the shaman rotation, we don't want dev build...
Andrew Schoen
08:09 PM Bug #18089: Various official builds missing from CI/Shaman
Andrew Schoen wrote:
> It's purposely out of the shaman rotation, we don't want dev builds being sent there. We just...
David Galloway
06:50 PM Bug #18089: Various official builds missing from CI/Shaman
David Galloway wrote:
> Andrew Schoen wrote:
> > David Galloway wrote:
> > > Nathan Cutler wrote:
> > > > @David,...
Andrew Schoen
06:45 PM Bug #18089: Various official builds missing from CI/Shaman
Andrew Schoen wrote:
> David Galloway wrote:
> > Nathan Cutler wrote:
> > > @David, it sounds like you could simpl...
David Galloway
05:48 PM Bug #18089: Various official builds missing from CI/Shaman
Nathan Cutler wrote:
> Thanks, David. We really need a way to protect individual builds from being deleted. See http...
Andrew Schoen
05:37 PM Bug #18089: Various official builds missing from CI/Shaman
David Galloway wrote:
> Nathan Cutler wrote:
> > @David, it sounds like you could simply put the legacy builds used...
Andrew Schoen
04:06 PM Bug #18089: Various official builds missing from CI/Shaman
Nathan Cutler wrote:
> @David, it sounds like you could simply put the legacy builds used in the upgrade tests on a ...
David Galloway
06:26 PM Bug #19102: mira102: Input/output error
This is after a reimage. SMART data shows the drive is okay and I can access sda1 in rescue mode. Very strange.
...
David Galloway
04:07 PM Bug #19102: mira102: Input/output error
I suspect this may be specific to this node; marked down. Yuri Weinstein

02/27/2017

11:20 PM Bug #19102 (Resolved): mira102: Input/output error
I see more examples like this during nuke; a power cycle seems to help solve the issue.... Yuri Weinstein
11:17 PM Bug #19082 (New): "Error destroying vpmXXX libvirt: QEMU Driver error : Domain not found"
Yuri Weinstein
10:18 PM Bug #18089: Various official builds missing from CI/Shaman
@David, it sounds like you could simply put the legacy builds used in the upgrade tests on a Chacra host that is not ... Nathan Cutler

02/26/2017

04:14 PM Bug #19082: "Error destroying vpmXXX libvirt: QEMU Driver error : Domain not found"
More vps in same state... Yuri Weinstein
03:57 AM Bug #18089: Various official builds missing from CI/Shaman
How does Shaman/Chacra determine that a build is a "test build"? Builds specified in test YAML should not fall into t... Nathan Cutler
03:53 AM Bug #18089: Various official builds missing from CI/Shaman
Nathan Cutler
03:53 AM Bug #18089: Various official builds missing from CI/Shaman
Thanks, David. We really need a way to protect individual builds from being deleted. See http://tracker.ceph.com/issu... Nathan Cutler

02/25/2017

01:10 AM Bug #19078: Downburst failing to complete sometimes
downburst only gets the newer image if --forcenew is passed to create, which it isn't. Still mystified. Dan Mick
12:01 AM Bug #19078: Downburst failing to complete sometimes
downburst might need to download a new image from ubuntu.com to provision a vps. If that process hangs or fails, it ... Dan Mick

02/24/2017

10:57 PM Bug #19082 (Closed): "Error destroying vpmXXX libvirt: QEMU Driver error : Domain not found"
There was no need to mark it down. You just needed to unlock it. David Galloway
10:44 PM Bug #19082 (Closed): "Error destroying vpmXXX libvirt: QEMU Driver error : Domain not found"
can't nuke, marked down... Yuri Weinstein
10:55 PM Bug #19078: Downburst failing to complete sometimes
Looks like vpm099 did come up.. or at least tried.
From mira017:/var/log/libvirt/qemu/vpm099.log...
David Galloway
04:58 PM Bug #19078: Downburst failing to complete sometimes
Looks like they never got created...?
http://qa-proxy.ceph.com/teuthology/yuriw-2017-02-23_22:44:01-upgrade:hammer...
David Galloway
04:51 PM Bug #19078 (Closed): Downburst failing to complete sometimes
scheduled_yuriw@teuthology
========
2017-02-24 16:49:22,111.111 INFO:teuthology.nuke:targets:
vpm069.front.sepia...
Yuri Weinstein

02/22/2017

09:04 PM Bug #18702 (Resolved): mira009 crashed and so all the jobs that tried to use mira009 failed
Reimaged host, updated firmware, and recreated VPSes. David Galloway
06:43 PM Bug #18702: mira009 crashed and so all the jobs that tried to use mira009 failed
This is a hardware issue and should've been filed in Sepia. David Galloway

02/21/2017

08:26 PM Bug #18089: Various official builds missing from CI/Shaman
Nathan Cutler wrote:
> > @Nathan, can you guys use ...
David Galloway
08:08 PM Bug #18089: Various official builds missing from CI/Shaman
> @Nathan, can you guys use ... Nathan Cutler
04:19 PM Bug #18089: Various official builds missing from CI/Shaman
@Nathan, can you guys use ... David Galloway
03:55 PM Bug #18089: Various official builds missing from CI/Shaman
Nathan Cutler wrote:
> @David, ISTR reading somewhere that Shaman keeps builds for only a few days or weeks. Are you...
David Galloway

02/20/2017

11:05 PM Bug #19021 (Resolved): LRC drive failure, osd.121/mira070
drive 5 on mira070, osd.121, has been found deaded. I took it out of the cluster to let it rebalance away.
A co...
Dan Mick
09:34 PM Bug #18089: Various official builds missing from CI/Shaman
@David, ISTR reading somewhere that Shaman keeps builds for only a few days or weeks. Are you safeguarding these buil... Nathan Cutler
11:32 AM Bug #18089: Various official builds missing from CI/Shaman
Still seeing this failure (tag: v0.67.10) which had been fixed before: http://pulpito.front.sepia.ceph.com/loic-2017-... Nathan Cutler

02/17/2017

10:39 PM Bug #18089: Various official builds missing from CI/Shaman
Yep that was my fault. I somehow managed to delete all the packages for that repo.
I re-pushed them and added the...
David Galloway
10:22 PM Bug #18357 (Resolved): smithi043 nvme
NVMe card was replaced David Galloway
10:22 PM Bug #18974 (Resolved): smithi130 ansible error
Reimaged host and it's passing jobs David Galloway
06:40 PM Bug #18974: smithi130 ansible error
Pass/fail ratio on this machine is 0:283 over the past 14 days (ouch). No other smithi testnodes come close to that ... David Galloway
06:38 PM Bug #18974 (In Progress): smithi130 ansible error
Found this in ansible log. Looks like bad repos are getting leftover from previous runs.... David Galloway
03:26 PM Bug #18974 (Resolved): smithi130 ansible error
while scanning a plain scalar in "/tmp/teuth_ansible_failures_qjgTAc", line 1, column 476 found unexpected ':' in "/t... Sage Weil

02/14/2017

08:46 PM Support #18934 (Rejected): Additional vCPUs for teuthology VM
Need to do this when we can afford to reboot the teuthology VM.... David Galloway

02/13/2017

10:38 PM Bug #18089: Various official builds missing from CI/Shaman
@David: Can you look at this test failure and determine if it's caused by this bug? http://pulpito.ceph.com/smithfarm... Nathan Cutler
 
