Bug #58015

closed

provision/fog.py: FOG._wait_for_ready should log exceptions

Added by Dan Mick over 1 year ago. Updated 8 months ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
% Done: 0%
Source: Development
Regression: No
Severity: 3 - minor

Description

FOG._wait_for_ready loops while it tries to connect over ssh to the host it has provisioned. If it hits any of several errors, it absorbs the exception silently and keeps looping, for up to 100 tries with 6-second delays. If there's some persistent error, the user never sees it.

Instead, log.warning() the error, which at least allows the user a chance at diagnosing the problem. (In my case, it was an out-of-date host key in known_hosts, and settings in .ssh/config that paid attention to the error. It's not clear to me that teuthology couldn't/shouldn't fix this anyway, by ignoring host keys on reprovision and passing down the host key from the lock database when doing other operations, but that's another bug.)

My current hack:

--- a/teuthology/provision/fog.py
+++ b/teuthology/provision/fog.py
@@ -278,7 +278,15 @@ class FOG(object):
                     NoValidConnectionsError,
                     MaxWhileTries,
                     EOFError,
-                ):
+                ) as e:
+                    # log this, because otherwise lots of failures just keep retrying without
+                    # any notification (like, say, a mismatched host key in ~/.ssh/known_hosts, or
+                    # something)
+                    #
+                    # on the subject of mismatched host keys: should the target spec's host key
+                    # be getting passed down the stack to orchestra connect -> paramiko?  because
+                    # it doesn't seem to be
+                    log.warning(e)
                     pass
         sentinel_file = config.fog.get('sentinel_file', None)
         if sentinel_file:

Actions #1

Updated by Zack Cerza over 1 year ago

Seems like a good change; I'd just suggest using `log.exception`.
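For illustration, here is a minimal sketch of what that suggestion might look like in a retry loop. It is not the teuthology code: `wait_for_ready`, its `connect` callable, and the `sleep` parameter are hypothetical stand-ins, and `OSError`/`EOFError` stand in for the exception tuple in the diff above. The defaults of 100 tries and a 6-second delay come from the description.

```python
import logging
import time

log = logging.getLogger(__name__)


def wait_for_ready(connect, tries=100, delay=6, sleep=time.sleep):
    """Call connect() until it succeeds, logging every failure.

    Hypothetical stand-in for FOG._wait_for_ready; `connect` is any
    callable that raises on failure.
    """
    for attempt in range(1, tries + 1):
        try:
            return connect()
        except (OSError, EOFError):
            # log.exception records the message at ERROR level and appends
            # the traceback of the exception currently being handled.
            log.exception("connect attempt %d/%d failed", attempt, tries)
            sleep(delay)
    raise RuntimeError(f"host not ready after {tries} tries")
```

Since `log.exception` always logs at ERROR level, a loop that may retry 100 times can get noisy; `log.warning("...", exc_info=True)` keeps the traceback at WARNING level if that is preferred.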

Actions #2

Updated by Zack Cerza 8 months ago

  • Status changed from New to Resolved