Project

General

Profile

Fix #7754

get_cluster_log() can fail if 'last_contact' is None

Added by Dan Mick about 10 years ago. Updated about 10 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Backend (REST API)
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Related issues

Related to Calamari - Fix #7691: api/v2/<id>/log throws EauthAuthenticationError Resolved 03/11/2014

History

#1 Updated by Dan Mick about 10 years ago

  • Category set to Backend (REST API)
  • Source changed from other to Development

For reasons I'm not clear on, inside get_cluster_log(), some servers have a "last_contact" member == None. (those two servers currently have no active salt-minion process, I guess is why.) Anyway, sorting based on laste_contact inside get_cluster_log fails in that case.

Adding a custom sort function that sorts None earlier than any date resolves the issue:

diff --git a/rest-api/calamari_rest/views/v2.py b/rest-api/calamari_rest/views/v2.py
index 02ab2da..40d3a64 100644
--- a/rest-api/calamari_rest/views/v2.py
+++ b/rest-api/calamari_rest/views/v2.py
@@ -590,6 +590,16 @@ GETs take an optional ``lines`` parameter for the number of lines to retrieve.
     """ 
     serializer_class = LogTailSerializer

+    @staticmethod
+    def sortbylastcontact(a, b):
+        if not a['last_contact']:
+            return 1
+        elif not b['last_contact']:
+            return -1
+        else:
+            return cmp(dateutil_parse(a['last_contact']),
+                      dateutil_parse(b['last_contact']))
+
     def get_cluster_log(self, request, fsid):
         """ 
         Retrieve the cluster log from one of a cluster's mons (expect it to be in /var/log/ceph/ceph.log)
@@ -603,8 +613,7 @@ GETs take an optional ``lines`` parameter for the number of lines to retrieve.
         # Resolve FSID to list of mon FQDNs
         servers = self.client.server_list_cluster(fsid)
         # Sort to get most recently contact server first
-        servers = sorted(servers,
-                         lambda a, b: cmp(dateutil_parse(b['last_contact']), dateutil_parse(a['last_contact'])))
+        servers = sorted(servers, cmp=self.sortbylastcontact)
         mon_fqdns = []
         for server in servers:
             for service in server['services']:

#2 Updated by John Spray about 10 years ago

as you guess, last_contact=None indicates "unmanaged" servers, i.e. those that we know about but aren't in direct communication with (typically OSD servers we have inferred from the OSD map).

Fix looks good to me.

#3 Updated by John Spray about 10 years ago

  • Tracker changed from Bug to Fix

#4 Updated by Dan Mick about 10 years ago

  • Status changed from New to In Progress
  • Assignee set to Dan Mick
  • Target version set to v1.2-dev6

#5 Updated by Dan Mick about 10 years ago

This is more concise and more python3-friendly. Is it better?

--- a/rest-api/calamari_rest/views/v2.py
+++ b/rest-api/calamari_rest/views/v2.py
@@ -608,9 +608,11 @@ GETs take an optional ``lines`` parameter for the number of lines to retrieve.

         # Resolve FSID to list of mon FQDNs
         servers = self.client.server_list_cluster(fsid)
-        # Sort to get most recently contact server first
+        # Sort to get most recently contacted server first.  Allow for
+        # None in last_contact by exploiting short-circuit 'and'
         servers = sorted(servers,
-                         lambda a, b: cmp(dateutil_parse(b['last_contact']), dateutil_parse(a['last_contact'])))
+                         key=lambda t: t['last_contact'] and
+                         dateutil_parse(t['last_contact']))
         mon_fqdns = []
         for server in servers:
             for service in server['services']:

#6 Updated by Dan Mick about 10 years ago

Gregory prompts me to notice that 1) both my solutions lost the "newest-first" ordering that was the point here, but also 2) servers with None as 'last_contact' probably shouldn't even be tried. New proposal:

--- a/rest-api/calamari_rest/views/v2.py
+++ b/rest-api/calamari_rest/views/v2.py
@@ -608,9 +608,10 @@ GETs take an optional ``lines`` parameter for the number of lines to retrieve.

         # Resolve FSID to list of mon FQDNs
         servers = self.client.server_list_cluster(fsid)
-        # Sort to get most recently contact server first
-        servers = sorted(servers,
-                         lambda a, b: cmp(dateutil_parse(b['last_contact']), dateutil_parse(a['last_contact'])))
+        # Sort to get most recently contacted server first; drop any
+        # for whom last_contact is None
+        servers = [s for s in servers if s['last_contact']]
+        servers = sorted(servers, key=lambda t: t['last_contact'], reverse=True)
         mon_fqdns = []
         for server in servers:
             for service in server['services']:

#7 Updated by Dan Mick about 10 years ago

  • Status changed from In Progress to Resolved

1a1690059b48d59fb33bc0ff2d1a28f9b919feb6 (I missed the '#' in the Fixes: header in the commit msg)

Also available in: Atom PDF