Bug #58015

provision/fog.py: FOG._wait_for_ready should log exceptions

Added by Dan Mick 3 months ago. Updated 3 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

FOG._wait_for_ready loops while it tries to connect over ssh to the host it has provisioned. If it hits any of several errors, it silently swallows the exception and keeps looping, for up to 100 tries with a 6-second delay between them. If the error is persistent, the user never sees it.

Instead, log.warning() the error, which at least gives the user a chance to diagnose the problem. (In my case, it was an out-of-date host key in known_hosts, combined with settings in .ssh/config that paid attention to the error. It's not clear to me that teuthology couldn't/shouldn't fix this anyway, by ignoring host keys on reprovision and passing down the host key from the lock database when doing other operations, but that's another bug.)

My current hack:

--- a/teuthology/provision/fog.py
+++ b/teuthology/provision/fog.py
@@ -278,7 +278,15 @@ class FOG(object):
                     NoValidConnectionsError,
                     MaxWhileTries,
                     EOFError,
-                ):
+                ) as e:
+                    # log this, because otherwise lots of failures just keep retrying without
+                    # any notification (like, say, a mismatched host key in ~/.ssh/known_hosts, or
+                    # something)
+                    #
+                    # on the subject of mismatched host keys: should the target spec's host key
+                    # be getting passed down the stack to orchestra connect -> paramiko?  because
+                    # it doesn't seem to be
+                    log.warning(e)
                     pass
         sentinel_file = config.fog.get('sentinel_file', None)
         if sentinel_file:
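
The retry pattern the patch is aiming for can be sketched in isolation. This is a minimal, hypothetical illustration and not teuthology code: `wait_for_ready`, `connect`, `retry_on`, and the `tries`/`sleep` parameters are assumed names, with defaults that mirror the 100 tries and 6-second delay described above.

```python
import logging
import time

log = logging.getLogger(__name__)


def wait_for_ready(connect, tries=100, sleep=6, retry_on=(OSError, EOFError)):
    """Call connect() until it succeeds, logging each retried failure.

    Hypothetical helper for illustration; not part of teuthology.
    Returns True on success, False once all tries are used up.
    """
    for attempt in range(1, tries + 1):
        try:
            connect()
            return True
        except retry_on as e:
            # Surface the error instead of swallowing it, so a persistent
            # failure (such as a stale host key) is visible to the user.
            log.warning("attempt %d/%d failed: %r", attempt, tries, e)
            time.sleep(sleep)
    return False
```

Exceptions outside `retry_on` are not caught and propagate to the caller, so only the failures that are expected to be transient get retried.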

History

#1 Updated by Zack Cerza 3 months ago

Seems like a good change; I'd just suggest using `log.exception`.
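
For context on that suggestion: in Python's standard `logging` module, `Logger.exception` logs at ERROR level and appends the current traceback, whereas `log.warning(e)` records only the exception's message. A small standalone sketch of the difference follows; the `fog_demo` logger name and the `risky` and `capture` functions are made up for illustration.

```python
import io
import logging


def risky():
    raise EOFError("connection closed")


def capture(log_call):
    """Run risky(), log the failure via log_call, and return the log text."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger("fog_demo")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        try:
            risky()
        except EOFError as e:
            # log_call runs inside the except block, so the active
            # exception is still available to Logger.exception.
            log_call(logger, e)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


# Message only, no traceback.
warning_text = capture(lambda logger, e: logger.warning(e))
# Message followed by the full traceback.
exception_text = capture(lambda logger, e: logger.exception(e))
```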
