Activity

From 07/03/2016 to 08/01/2016

08/01/2016

10:54 PM Bug #16826: SSH Error: data could not be sent to the remote host. Make sure this host can be reac...
https://github.com/ceph/ceph-cm-ansible/pull/275 Zack Cerza
06:29 PM Bug #16826: SSH Error: data could not be sent to the remote host. Make sure this host can be reac...
Potentially relevant upstream issues:
https://github.com/ansible/ansible/issues/13876
https://github.com/ansible/an...
Zack Cerza
06:28 PM Bug #16826: SSH Error: data could not be sent to the remote host. Make sure this host can be reac...
Attempts at getting ansible to give us more info:
https://github.com/ceph/teuthology/pull/919
Zack Cerza
05:24 PM Bug #16859 (Resolved): sepia access
Additional pubkey has been added and pushed to teuthology.front. Any test runs created after this point will automat... David Galloway
05:22 PM Feature #16858 (Need More Info): Requesting Lab Access
Hi Ryne,
You should have access to the Sepia lab now. Please verify you're able to connect to the vpn and ssh ryn...
David Galloway

07/31/2016

01:59 AM Bug #16875 (Closed): apama001 load spikes and OSD 65 dies
Output from dmesg when issue occurs... David Galloway

07/29/2016

08:19 PM Bug #16860: 'sudo yum install ceph-radosgw -y' fails jobs in rados run
See suspected https://github.com/ceph/ceph/pull/7338 Yuri Weinstein
07:55 PM Bug #16860 (Closed): 'sudo yum install ceph-radosgw -y' fails jobs in rados run
Run: http://pulpito.ceph.com/yuriw-2016-07-29_08:07:15-rados-wip-yuri-testing2_2016_7_29-distro-basic-smithi/
Jobs: ...
Yuri Weinstein
07:25 PM Bug #16859 (Resolved): sepia access
(Original request at http://tracker.ceph.com/issues/16031)
Please add (or replace, only if necessary please) this ...
Patrick Donnelly
05:03 PM Feature #16858 (Resolved): Requesting Lab Access
1. only requesting access to schedule jobs (and view their results)
2. username : ryneli@redhat
3. public SSH key:
...
zhenqiang li
03:21 AM Cleanup #16825: vpm061 has odd downburst error
Running a loop to visit all 25 vmhosts and refresh all defined pools.
Dan Mick
03:12 AM Cleanup #16825: vpm061 has odd downburst error
Yuri Weinstein wrote:
> See more in http://pulpito.ceph.com/teuthology-2016-07-26_08:18:30-smoke-master-distro-basic...
Dan Mick
02:40 AM Cleanup #16825: vpm061 has odd downburst error
The problem was a corrupted storage pool for vpm061: libvirt was reporting that it contained many images, but it only... Dan Mick

07/28/2016

11:41 PM Bug #16853 (Resolved): ssh-key change on teuthology host
My SSH key has changed on the teuthology host, and I end up using ubuntu for locking/reserving nodes.
Please add the bel...
Vasu Kulkarni
05:24 PM Bug #16826: SSH Error: data could not be sent to the remote host. Make sure this host can be reac...
Just to be clear, we're pretty sure ssh is not connecting/reconnecting at the time of the failures, so the above is j... Dan Mick
08:13 AM Support #16843 (Resolved): Request for sepia lab access
Hi,
I'm requesting access for scheduling jobs in the sepia lab.
Username: rdias
Public key:...
Ricardo Dias

07/27/2016

02:24 PM Bug #16826: SSH Error: data could not be sent to the remote host. Make sure this host can be reac...
Dan Mick wrote:
> It looks like the default ansible.cfg ssh connection timeout is 10s, which seems pretty short, esp...
David Galloway
02:11 AM Bug #16826: SSH Error: data could not be sent to the remote host. Make sure this host can be reac...
Looks like ansible-playbook takes a -T/--timeout option, which might be an easy way to experiment with this. Dan Mick
02:10 AM Bug #16826: SSH Error: data could not be sent to the remote host. Make sure this host can be reac...
It looks like the default ansible.cfg ssh connection timeout is 10s, which seems pretty short, especially if the orig... Dan Mick
02:03 AM Bug #16826: SSH Error: data could not be sent to the remote host. Make sure this host can be reac...
David Galloway wrote:
> * With the load so high, the ansible run prior to the actual teuthology test takes an unacce...
David Galloway
01:52 AM Bug #16826 (Resolved): SSH Error: data could not be sent to the remote host. Make sure this host ...
I've noticed an abnormally high number of jobs failing due to ssh failures during ceph-cm-ansible runs. I haven't be... David Galloway
02:28 AM Cleanup #16825: vpm061 has odd downburst error
See more in http://pulpito.ceph.com/teuthology-2016-07-26_08:18:30-smoke-master-distro-basic-vps/
334828, 334829, ...
Yuri Weinstein
01:54 AM Cleanup #16825: vpm061 has odd downburst error
I think I remember seeing this error during my mira drive party last week. It may be lab-wide and not just limited t... David Galloway
12:31 AM Cleanup #16825 (Resolved): vpm061 has odd downburst error
... Dan Mick

07/26/2016

06:48 PM Support #16713: Requesting lab access
Radoslaw,
You're all set. Please verify you can connect to the VPN and ssh rzarzynski@teuthology.front.sepia.ceph...
David Galloway
02:50 PM Bug #16816 (Resolved): teuthology-logs.public.ceph.com not reachable
The virtual machine was shut down; it has been rebooted and the service is back. Thanks for noticing! Loïc Dachary
02:05 PM Bug #16816 (Resolved): teuthology-logs.public.ceph.com not reachable
teuthology-openstack has a feature called ... Nathan Cutler

07/25/2016

08:16 PM Bug #14840 (In Progress): mira091 is not accessible
This system has at least one bad DIMM according to the SEL. Will have lab team diagnose and replace.
I've marked down...
David Galloway
03:59 PM Bug #16810 (Resolved): re-image smithi044
This node seems to be acting up on every run; please re-image. Yuri Weinstein

07/22/2016

06:00 PM Support #16713 (In Progress): Requesting lab access
David Galloway
06:00 PM Tasks #15389 (Resolved): read error on mira055 (osd.74):sdf1
Drive replaced and new drive added to cluster David Galloway
05:50 PM Bug #14478 (Resolved): mira089 MCE, bad processor?
Machine passed last 3 jobs. Total job stats don't appear abnormal David Galloway
05:47 PM Bug #14546: mira033 kernel panic from MCE
Tested DIMMs and didn't find a bad one. If MCEs persist, will retire machine. David Galloway
05:45 PM Bug #16238 (Resolved): Input/output error on mira023
Machine reimaged. All drives present as healthy. David Galloway
05:43 PM Bug #16326 (Resolved): mira033, mira052 fried memory
Tested all 8 DIMMs individually and found 1 bad. Replaced with spare and reimaged hosts. David Galloway
05:42 PM Feature #16669 (Resolved): Sepia Status page
http://status.sepia.ceph.com/ David Galloway
05:40 PM Bug #16720 (Resolved): mira038 losing ssh connectivity after reboot
Reimaged and released David Galloway
05:08 PM Bug #15147: mira095 RAID6 degraded
Drive 5 failing David Galloway
03:50 PM Cleanup #14528: Track down usage and purpose of mira{123..126} aka dubia{001..004}
jenkins.front can be repurposed
7OCT2016 - Just shut it down.
David Galloway

07/21/2016

10:10 PM Cleanup #14528: Track down usage and purpose of mira{123..126} aka dubia{001..004}
Replaced drive 4 in jenkins. Its RAID is degraded and underlying data may be lost. I can't get SMART data from the ... David Galloway
09:16 PM Feature #15563 (Resolved): reduce email noise from nightlies crontab entries
Yuri Weinstein
05:32 PM Bug #16765 (Resolved): can't nuke mira046
Stuck at :... Yuri Weinstein

07/19/2016

04:58 PM Bug #16728: Cannot find a valid baseurl for repo: base/7/x86_6
Here's another failure: http://qa-proxy.ceph.com/teuthology/teuthology-2016-07-18_18:10:02-upgrade:infernalis-inferna... David Galloway
12:10 AM Bug #16728: Cannot find a valid baseurl for repo: base/7/x86_6
Cannot find a valid baseurl for repo: base/7/x86_6 is the failure marker. Updating title to reflect. Dan Mick

07/18/2016

10:54 PM Bug #16728: Cannot find a valid baseurl for repo: base/7/x86_6
related: https://github.com/ceph/ceph-cm-ansible/pull/268 Zack Cerza
10:36 PM Bug #16728: Cannot find a valid baseurl for repo: base/7/x86_6
Actual issue is a yum failure. We're working on increasing loglevel for yum transactions so we can debug what's actu... David Galloway
10:30 PM Bug #16728 (Can't reproduce): Cannot find a valid baseurl for repo: base/7/x86_6
Six of the jobs in this run failed in this way, for example:
http://pulpito.ceph.com/teuthology-2016-07-13_02:10:02-...
John Spray
09:58 PM Bug #16724 (Resolved): ceph-ansible suite: failed jobs due to git clone failed error
The ceph-ansible repo wasn't mirrored on git.ceph.com. I've added it and it's now mirrored.
http://git.ceph.com/?...
David Galloway
09:46 PM Bug #16724 (Resolved): ceph-ansible suite: failed jobs due to git clone failed error
The ceph-ansible suite has failed jobs due to "git clone failed error".
Pasting below the excerpt from teuthology.l...
Tamilarasi muthamizhan
08:45 PM Bug #16719 (Resolved): smith006.ipmi.sepia.ceph.com is in bad state
You had a typo in your ipmitool command.
I reimaged the host anyway since it was in a weird state.
David Galloway
05:21 PM Bug #16719 (Resolved): smith006.ipmi.sepia.ceph.com is in bad state
ipmitool -H smith006.ipmi.sepia.ceph.com -I XXXXXX power cycle
Address lookup for smith006.ipmi.sepia.ceph.com faile...
Yuri Weinstein
05:34 PM Bug #16720 (Resolved): mira038 losing ssh connectivity after reboot
Marked it down.
nuke/stale has this error:...
Yuri Weinstein
04:13 PM Support #16713 (Resolved): Requesting lab access
1. Access type: scheduling jobs.
2. Username:...
Radoslaw Zarzynski
03:38 PM Bug #16711 (Duplicate): sudo yum install -y kernel fails
http://pulpito.ceph.com/teuthology-2016-07-17_04:20:03-upgrade:jewel-x-master-distro-basic-vps/319488/... Loïc Dachary
02:55 PM Bug #15297 (Closed): kernel yum install task failed due to apparent dns failure on gitbuilder.cep...
David Galloway
07:00 AM Bug #15297: kernel yum install task failed due to apparent dns failure on gitbuilder.ceph.com
Loïc Dachary

07/12/2016

10:01 PM Feature #16669: Sepia Status page
One option I intend to install and test out: https://cachethq.io/ David Galloway
09:28 PM Feature #16669 (Resolved): Sepia Status page
gmeno mentioned it'd be nice to have a (non-nagios) Lab Status / uptime page to track lab, queue, suite, etc. statuse... David Galloway
03:11 PM Bug #11571 (Closed): "Bad hostname 'magnaXXX.front.sepia.ceph.com'" error in rgw-firefly-distro-b...
Whatever caused this is almost certainly fixed now. Closing since it's been over a year. David Galloway

07/07/2016

07:39 PM Bug #16615 (Resolved): Failed to download remote objects and refs: fatal: shallow file was change...
Zack Cerza
06:45 PM Bug #16615 (Fix Under Review): Failed to download remote objects and refs: fatal: shallow file wa...
https://github.com/ceph/ceph-cm-ansible/pull/264 Zack Cerza
06:42 PM Bug #16615 (In Progress): Failed to download remote objects and refs: fatal: shallow file was cha...
Zack Cerza
06:40 PM Bug #16615: Failed to download remote objects and refs: fatal: shallow file was changed during fetch
The only thing I can think of doing about this is not to use a shallow clone. Looks like it might not make much of a ... Zack Cerza
06:34 PM Bug #16615 (Resolved): Failed to download remote objects and refs: fatal: shallow file was change...
Jobs are sporadically failing due to $subject.
See http://pulpito.ceph.com/cbodley-2016-07-07_11:21:33-rgw-wip-rgw...
David Galloway
 
