Bug #14459

closed

cephlab_rc_local needs to be smarter about waiting for the network

Added by Anonymous over 8 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Normal
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

From a conversation with dgalloway:

<wusui> i am having trouble ssh'ing from magna002 to magna049 magna079 and magna086 (all ask for ubuntu's password)
<dgalloway> ok. that leads me to believe something's wrong in the ansible. the logs indicate the user is created and password is deleted but that's obviously not the case
<wusui> okay. i have enough other machines locked. it's not crucial that you figure out these machines right now.
<dgalloway> i'll fix 079 and 086. can you put a ticket in with your observations and say 049 can be used to investigate?
<wusui> okay
<dgalloway> i just want to make sure it doesn't get forgotten
<dgalloway> you can just paste this IRC convo if you want
<wusui> okay
<dgalloway> thanks


Related issues 1 (0 open, 1 closed)

Has duplicate: ceph-cm-ansible - Bug #14477: Refine post-install cobbler ansible trigger | Duplicate | David Galloway | 01/22/2016

Actions #1

Updated by Zack Cerza over 8 years ago

Just a bit of information: magna049 was reimaged a second time today, and ansible wasn't properly run after the second reimage.

[root@magna049 ~]# TZ=UTC ls -l /.cephlab_rc_local
-rw-r--r--. 1 root root 0 Jan 21 18:30 /.cephlab_rc_local
[root@magna049 ~]# ls -l /ceph-qa-ready
ls: cannot access /ceph-qa-ready: No such file or directory
[root@magna001 ansible]# TZ=UTC ls -l /var/log/ansible/magna049.log
-rw-r--r-- 1 root root 320903 Jan 21 17:04 /var/log/ansible/magna049.log
[root@magna049 ~]# cat /tmp/rc.local.log
+ '[' -e /.cephlab_rc_local ']'
+ sleep 30
+ wget -t1 -O /dev/null http://10.8.128.1:80/cblr/svc/op/trig/mode/post/system/magna049
--2016-01-21 13:29:12--  http://10.8.128.1/cblr/svc/op/trig/mode/post/system/magna049
Connecting to 10.8.128.1:80... connected.
HTTP request sent, awaiting response... 500 Internal Server Error
2016-01-21 13:30:12 ERROR 500: Internal Server Error.

+ true
+ touch /.cephlab_rc_local
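
The trace above suggests that the script sleeps once, makes a single wget attempt, ignores the failure, and still creates the marker file, so the trigger is never retried. A minimal sketch of a more patient approach follows. It is not the actual fix that was applied; the function names, the retry count, the delay, the 60-second timeout, and the use of `hostname -s` for the system name are all assumptions made for illustration. Only the URL pattern and the marker path come from the log.

```shell
#!/bin/sh
# Hypothetical sketch: retry the cobbler post-install trigger rather than
# giving up after one attempt, and only create the marker on success.

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS attempts have been made.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt/$max_attempts failed: $*" >&2
        attempt=$((attempt + 1))
        # Wait before the next attempt, but not after the last one.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Marker path and URL pattern are taken from rc.local.log above.
# MARKER and TRIGGER_URL are hypothetical overrides; 10 attempts with a
# 30-second delay are assumed values, not measured ones.
trigger_post_install() {
    marker=${MARKER:-/.cephlab_rc_local}
    url=${TRIGGER_URL:-http://10.8.128.1:80/cblr/svc/op/trig/mode/post/system/$(hostname -s)}
    # Already triggered on an earlier boot: nothing to do.
    if [ -e "$marker" ]; then
        return 0
    fi
    if retry 10 30 wget -t1 -T60 -O /dev/null "$url"; then
        touch "$marker"
    else
        return 1
    fi
}
```

Calling `trigger_post_install` from rc.local would leave the marker absent after a failed run, so the next boot tries again instead of silently skipping the ansible run.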
Actions #2

Updated by Zack Cerza over 8 years ago

If you look at the http error log on magna001 around the timestamp: Thu Jan 21 18:31:14 2016, there are lots of timeouts...

Actions #3

Updated by Zack Cerza about 8 years ago

  • Project changed from 19 to ceph-cm-ansible
  • Subject changed from Some magna machines are not reachable via passwordless ssh to cephlab_rc_local needs to be smarter about waiting for the network
  • Assignee set to Zack Cerza
Actions #4

Updated by David Galloway almost 8 years ago

  • Status changed from New to Resolved
  • Assignee changed from Zack Cerza to David Galloway
Actions #5

Updated by David Galloway almost 8 years ago

  • Has duplicate Bug #14477: Refine post-install cobbler ansible trigger added