Bug #13876
closed
qa: openstack MPI connection failures
Description
2015-11-12T23:32:58.272 INFO:teuthology.orchestra.run.target088041:Running: 'mpiexec -f /home/ubuntu/cephtest/mpi-hosts /home/ubuntu/cephtest/mdtest-1.9.3/mdtest -d /home/ubuntu/cephtest/gmnt -I 20 -z 5 -b 2 -R'
2015-11-12T23:32:58.604 INFO:teuthology.orchestra.run.target088041.stderr:Warning: Permanently added '158.69.88.43' (ECDSA) to the list of known hosts.
2015-11-12T23:32:58.606 INFO:teuthology.orchestra.run.target088041.stderr:Warning: Permanently added '158.69.88.40' (ECDSA) to the list of known hosts.
2015-11-12T23:35:07.155 INFO:teuthology.orchestra.run.target088041.stderr:[proxy:0:1@target088043.ovh.sepia.ceph.com] HYDU_sock_connect (./utils/sock/sock.c:174): unable to connect from "target088043.ovh.sepia.ceph.com" to "158.69.88.41" (Connection timed out)
2015-11-12T23:35:07.155 INFO:teuthology.orchestra.run.target088041.stderr:[proxy:0:1@target088043.ovh.sepia.ceph.com] main (./pm/pmiserv/pmip.c:189): unable to connect to server 158.69.88.41 at port 54948 (check for firewalls!)
2015-11-12T23:35:07.207 INFO:teuthology.orchestra.run.target088041.stderr:[proxy:0:2@target088040.ovh.sepia.ceph.com] HYDU_sock_connect (./utils/sock/sock.c:174): unable to connect from "target088040.ovh.sepia.ceph.com" to "158.69.88.41" (Connection timed out)
2015-11-12T23:35:07.207 INFO:teuthology.orchestra.run.target088041.stderr:[proxy:0:2@target088040.ovh.sepia.ceph.com] main (./pm/pmiserv/pmip.c:189): unable to connect to server 158.69.88.41 at port 54948 (check for firewalls!)
Updated by Greg Farnum over 8 years ago
- Subject changed from qa: openstack mdtest connection failures to qa: openstack MPI connection failures
Updated by John Spray over 8 years ago
Those runs are from a couple of weeks ago. Are you sure they aren't from before Loic updated the firewall rules to be more permissive? I can't remember the date that happened. The rules used to open every port up to 10000, IIRC, and were amended to go all the way up to 2**16.
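To make the port-range reasoning concrete, here is a minimal sketch that checks whether the port from the mpiexec error (54948) falls inside a `low:high` range as written in the security group rule. The helper name `port_allowed` is hypothetical and is not part of teuthology or the openstack CLI; it only illustrates the arithmetic.

```python
def port_allowed(port, dst_port_range):
    """Return True if port lies inside a 'low:high' range string (inclusive)."""
    low, high = (int(part) for part in dst_port_range.split(":"))
    return low <= port <= high


# Port reported in the "unable to connect to server" lines of the description.
hydra_port = 54948

print(port_allowed(hydra_port, "1:10000"))  # old rule: False
print(port_allowed(hydra_port, "1:65535"))  # amended rule: True
```

If the old 1:10000 rule were still in effect on the targets, a connection to port 54948 would be dropped, which would be consistent with the "Connection timed out" errors above.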
Updated by Greg Farnum over 8 years ago
No, no I'm not sure. I didn't see any more recent runs that succeeded in the pulpito comparison page, but I might have missed one.
Updated by John Spray over 8 years ago
Here's another more recent one:
teuthology-2015-11-26_18:00:01-fs-hammer---basic-openstack/ ['20901', '20933', '20917', '20885']
I wonder if the fix was just on master and needs backporting
Updated by John Spray over 8 years ago
Hmm, so the update to firewall rules was in teuthology, and is this:
commit a6e705bc27090c14bcb90c6129970bbd77137977
Author: Loic Dachary <ldachary@redhat.com>
Date:   Mon Oct 19 23:56:37 2015 +0200

    openstack: open ports 1:65356 for all targets

    Signed-off-by: Loic Dachary <ldachary@redhat.com>

diff --git a/teuthology/openstack/__init__.py b/teuthology/openstack/__init__.py
index 8a575cc..d755616 100644
--- a/teuthology/openstack/__init__.py
+++ b/teuthology/openstack/__init__.py
@@ -485,7 +485,7 @@ ssh access : ssh {identity}{username}@{ip} # logs in /usr/share/nginx/html
 # for the rest.
 misc.sh("""
 openstack security group create teuthology
-openstack security group rule create --dst-port 1:10000 teuthology
+openstack security group rule create --dst-port 1:65535 teuthology
 openstack security group rule create --proto udp --dst-port 53 teuthology # dns
 """)
and it is indeed the master branch in use for these tests, so something else must be going on here...
Updated by Greg Farnum over 8 years ago
- Status changed from New to 12
- Priority changed from Normal to High
Updated by Loïc Dachary about 8 years ago
- Status changed from 12 to Resolved
- Assignee set to Loïc Dachary
The firewall on the OVH lab was configured manually. The code that is quoted is only used when dynamically provisioning a teuthology cluster with teuthology-openstack. I modified the teuthology security group to change the range from 1:10000 to 1:65355.
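One way to confirm that a firewall change like this took effect is to probe a high TCP port from one target to another. The sketch below is an assumption-laden illustration, not something taken from teuthology: the function name `probe` is hypothetical, and the result is only meaningful when a process is listening on the chosen port of the remote host. A dropped packet typically shows up as a timeout, while a reachable host with nothing listening usually answers with a refusal.

```python
import socket


def probe(host, port, timeout=5.0):
    """Try a TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all within the timeout; a filtering firewall is one possible cause.
        return "timed out"
    except OSError as exc:
        return "error: %s" % exc


# Example call (address and port taken from the log in the description):
# probe("158.69.88.41", 54948)
```

A "timed out" result from another target would match the symptom in the description, whereas "open" or "refused" suggests the packets are getting through.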