Bug #13876

qa: openstack MPI connection failures

Added by Greg Farnum over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Testing
Target version: -
Start date: 11/25/2015
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:

Description

http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2015-11-12_18:00:02-fs-hammer---basic-openstack/14418/

2015-11-12T23:32:58.272 INFO:teuthology.orchestra.run.target088041:Running: 'mpiexec -f /home/ubuntu/cephtest/mpi-hosts /home/ubuntu/cephtest/mdtest-1.9.3/mdtest -d /home/ubuntu/cephtest/gmnt -I 20 -z 5 -b 2 -R'
2015-11-12T23:32:58.604 INFO:teuthology.orchestra.run.target088041.stderr:Warning: Permanently added '158.69.88.43' (ECDSA) to the list of known hosts.
2015-11-12T23:32:58.606 INFO:teuthology.orchestra.run.target088041.stderr:Warning: Permanently added '158.69.88.40' (ECDSA) to the list of known hosts.
2015-11-12T23:35:07.155 INFO:teuthology.orchestra.run.target088041.stderr:[proxy:0:1@target088043.ovh.sepia.ceph.com] HYDU_sock_connect (./utils/sock/sock.c:174): unable to connect from "target088043.ovh.sepia.ceph.com" to "158.69.88.41" (Connection timed out)
2015-11-12T23:35:07.155 INFO:teuthology.orchestra.run.target088041.stderr:[proxy:0:1@target088043.ovh.sepia.ceph.com] main (./pm/pmiserv/pmip.c:189): unable to connect to server 158.69.88.41 at port 54948 (check for firewalls!)
2015-11-12T23:35:07.207 INFO:teuthology.orchestra.run.target088041.stderr:[proxy:0:2@target088040.ovh.sepia.ceph.com] HYDU_sock_connect (./utils/sock/sock.c:174): unable to connect from "target088040.ovh.sepia.ceph.com" to "158.69.88.41" (Connection timed out)
2015-11-12T23:35:07.207 INFO:teuthology.orchestra.run.target088041.stderr:[proxy:0:2@target088040.ovh.sepia.ceph.com] main (./pm/pmiserv/pmip.c:189): unable to connect to server 158.69.88.41 at port 54948 (check for firewalls!)
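To triage logs like the one above, it can help to pull the failing server and port out of the Hydra proxy errors. The following is a minimal sketch, not part of teuthology: the pattern and the `find_connect_failures` helper are hypothetical names written against the exact message wording shown in this log, and may need adjusting for other MPICH versions.

```python
import re

# Hypothetical pattern, based only on the message format seen in this log.
CONNECT_FAILURE = re.compile(
    r"\[proxy:[\d:]+@(?P<host>[^\]]+)\].*"
    r"unable to connect to server (?P<server>\S+) at port (?P<port>\d+)"
)


def find_connect_failures(lines):
    """Return (proxy_host, server, port) for each 'unable to connect to server' line."""
    failures = []
    for line in lines:
        match = CONNECT_FAILURE.search(line)
        if match:
            failures.append((match["host"], match["server"], int(match["port"])))
    return failures


sample = [
    '2015-11-12T23:35:07.155 INFO:teuthology.orchestra.run.target088041.stderr:'
    '[proxy:0:1@target088043.ovh.sepia.ceph.com] HYDU_sock_connect (./utils/sock/sock.c:174): '
    'unable to connect from "target088043.ovh.sepia.ceph.com" to "158.69.88.41" (Connection timed out)',
    '2015-11-12T23:35:07.155 INFO:teuthology.orchestra.run.target088041.stderr:'
    '[proxy:0:1@target088043.ovh.sepia.ceph.com] main (./pm/pmiserv/pmip.c:189): '
    'unable to connect to server 158.69.88.41 at port 54948 (check for firewalls!)',
]

for host, server, port in find_connect_failures(sample):
    print(f"{host} could not reach {server}:{port}")
```

Only the `main (...)` lines carry the port number, so the `HYDU_sock_connect` lines are deliberately skipped.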

History

#1 Updated by Greg Farnum over 3 years ago

  • Subject changed from qa: openstack mdtest connection failures to qa: openstack MPI connection failures

#2 Updated by John Spray over 3 years ago

Those runs are from a couple of weeks ago. Are you sure they aren't from before Loic updated the firewall rules to be more permissive? I can't remember the date that happened. The rules used to open ports up to 10000, IIRC, and were amended to go all the way up to 2**16.

#3 Updated by Greg Farnum over 3 years ago

No, no I'm not sure. I didn't see any more recent runs that succeeded in the pulpito comparison page, but I might have missed one.

#4 Updated by John Spray over 3 years ago

Here's another more recent one:
teuthology-2015-11-26_18:00:01-fs-hammer---basic-openstack/ ['20901', '20933', '20917', '20885']

I wonder if the fix was just on master and needs backporting

#5 Updated by John Spray over 3 years ago

Hmm, so the update to firewall rules was in teuthology, and is this:

commit a6e705bc27090c14bcb90c6129970bbd77137977
Author: Loic Dachary <ldachary@redhat.com>
Date:   Mon Oct 19 23:56:37 2015 +0200

    openstack: open ports 1:65356 for all targets

    Signed-off-by: Loic Dachary <ldachary@redhat.com>

diff --git a/teuthology/openstack/__init__.py b/teuthology/openstack/__init__.py
index 8a575cc..d755616 100644
--- a/teuthology/openstack/__init__.py
+++ b/teuthology/openstack/__init__.py
@@ -485,7 +485,7 @@ ssh access   : ssh {identity}{username}@{ip} # logs in /usr/share/nginx/html
         # for the rest.
         misc.sh(""" 
 openstack security group create teuthology
-openstack security group rule create --dst-port 1:10000 teuthology
+openstack security group rule create --dst-port 1:65535 teuthology
 openstack security group rule create --proto udp --dst-port 53 teuthology # dns
         """)

and it is indeed the master branch in use for these tests, so something else must be going on here...
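As a quick sanity check, the port from the failing run (54948) can be compared against the two `--dst-port` ranges in the diff. This is only an illustrative sketch: `port_allowed` is a hypothetical helper, not an OpenStack or teuthology API, and it assumes the `low:high` range is inclusive at both ends.

```python
def parse_port_range(spec):
    """Parse a 'low:high' range string into a (low, high) tuple of ints."""
    low, high = spec.split(":")
    return int(low), int(high)


def port_allowed(port, spec):
    """Return True if port falls inside the range (assumed inclusive)."""
    low, high = parse_port_range(spec)
    return low <= port <= high


MPI_PORT = 54948  # port reported in the mpiexec error in the description

for spec in ("1:10000", "1:65535"):
    print(spec, "allows" if port_allowed(MPI_PORT, spec) else "blocks", MPI_PORT)
```

Under these assumptions the old 1:10000 range would block the connection and the new range would allow it, which is consistent with the failure coming from a firewall that still had the old range applied.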

#6 Updated by Greg Farnum over 3 years ago

  • Status changed from New to Verified
  • Priority changed from Normal to High

#7 Updated by Loic Dachary over 3 years ago

  • Status changed from Verified to Resolved
  • Assignee set to Loic Dachary

The firewall on the OVH lab was configured manually. The code that is quoted is only used when dynamically provisioning a teuthology cluster with teuthology-openstack. I modified the teuthology security group to change the range from 1:10000 to 1:65355.
