Format of teuthology analysis entries

For instance, http://tracker.ceph.com/issues/14692#note-5 is structured as follows:

  • Entries are listed in chronological order
  • The command line used to run the suite, in a form that can be copy/pasted
  • A bullet point with the URL to the suite run in pulpito, prefixed by
    • running if the run is not complete
    • fail if the run has at least one error
    • green if the run has no error
  • If the run has at least one error, the output of the fail formatter snippet below is added, then edited as the errors are analyzed. Call it with python fail.py loic-2015-04-21_10:20:06-rados-firefly-backports---basic-multi fail, or with python fail.py loic-2015-04-21_10:20:06-rados-firefly-backports---basic-multi fail 167.114.249.14 when using teuthology-openstack, where the IP is the address at which the pulpito results are available.
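    from __future__ import print_function  # keeps the script usable on Python 2 and 3

    import re
    import sys

    import requests

    if len(sys.argv) < 3:
        sys.exit("usage: python fail.py RUN_NAME STATUS [HOST]")
    run_name, status = sys.argv[1], sys.argv[2]

    if len(sys.argv) > 3:
        # Optional third argument: host serving paddles (port 8080) and pulpito (port 8081)
        host = sys.argv[3]
        paddle_host = host + ":8080"
        pulpito_host = host + ":8081"
    else:
        paddle_host = 'paddles.front.sepia.ceph.com'
        pulpito_host = 'pulpito.ceph.com'

    # paddles URL listing the jobs of the run that have the requested status
    paddle = "http://" + paddle_host + "/runs/" + run_name + "/jobs/?status=" + status
    # pulpito URL of the run, used as the base of each job link
    pulpito = "http://" + pulpito_host + "/" + run_name
    failure2jobs = {}

    def normalize(failure):
        # Keep only the part of the failure reason that identifies the failure,
        # so that jobs failing the same way are grouped under one key.
        if 'wget -O- ' in failure:
            match = re.search(r'"(.*)"', failure)
        elif 'Command failed' in failure:
            match = re.search(r'Command failed.*?:\s*(.*)', failure)
        else:
            match = None
        return match.group(1) if match else failure

    response = requests.get(paddle)
    response.raise_for_status()
    for job in response.json():
        # Guard against a missing or null failure_reason by using a fixed label
        reason = job.get('failure_reason') or 'no failure reason'
        failure2jobs.setdefault(normalize(reason), []).append(job)

    # Emit textile nested bullets: one per failure, then one link per job
    for failure, jobs in failure2jobs.items():
        print("** *" + failure + "*")
        for job in jobs:
            print('*** "' + job['description'] + '":' + pulpito + '/' + str(job['job_id']))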
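
To see what the grouping does without contacting paddles, the sketch below applies the same two normalization rules to a few hand-written failure reasons. The sample jobs, the host names and the helper name group_by_failure are made up for illustration; only the two regular expressions come from the script above.

```python
import re


def normalize(failure):
    """Reduce a failure reason to the part that identifies the failure."""
    if 'wget -O- ' in failure:
        match = re.search(r'"(.*)"', failure)
    elif 'Command failed' in failure:
        match = re.search(r'Command failed.*?:\s*(.*)', failure)
    else:
        match = None
    return match.group(1) if match else failure


def group_by_failure(jobs):
    """Map each normalized failure reason to the jobs that hit it."""
    groups = {}
    for job in jobs:
        groups.setdefault(normalize(job['failure_reason']), []).append(job)
    return groups


# Hypothetical jobs: the same command fails on two different hosts
sample_jobs = [
    {'job_id': '1',
     'failure_reason': "Command failed on host-a with status 1: 'ceph health'"},
    {'job_id': '2',
     'failure_reason': "Command failed on host-b with status 1: 'ceph health'"},
    {'job_id': '3',
     'failure_reason': 'timed out waiting for recovery'},
]

groups = group_by_failure(sample_jobs)
for failure, jobs in groups.items():
    print(failure, [job['job_id'] for job in jobs])
```

Jobs 1 and 2 end up under the same key because the host name and the exit status are dropped, while job 3 keeps its failure reason unchanged. This is why the output has one heading per distinct failure rather than one per job.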
    
    

The idea is to run the script after the entire suite completes. This is not strictly necessary, since jobs can be restarted while the original suite is still running, but doing so makes it easy to get confused. The output of the script is copy/pasted into the release tracker issue. Then each failure is analyzed and marked as belonging to one of the following categories:

  • When an error is analyzed, the link to the error is prefixed with
    • environmental noise if it must be run again because it failed for reasons unrelated to the test itself (DNS error etc.)
    • known bug and a URL to the bug (not just the number of the bug)
    • new bug and a URL to the newly created bug if it was discovered during the analysis of this error: it is likely to be a regression
    • can be ignored and a URL to the bug (not just the number of the bug) and the reason why it can be ignored (possibly a link to a mail thread)
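
Once a failure has been analyzed, the prefix can be added by hand or with a small helper. The sketch below is one hypothetical way to do it: the function name annotate, its parameters, the placement of the prefix and the example.com URLs are all assumptions and not part of the original script. It only enforces the two rules stated in the list above: a bug is referenced by its full URL, and can be ignored also carries a reason.

```python
# Categories that must carry a bug URL, per the list above
NEEDS_BUG_URL = {'known bug', 'new bug', 'can be ignored'}
CATEGORIES = NEEDS_BUG_URL | {'environmental noise'}


def annotate(link, category, bug_url=None, reason=None):
    """Prefix a failure link with its analysis category (hypothetical helper)."""
    if category not in CATEGORIES:
        raise ValueError('unknown category: ' + category)
    parts = [category]
    if category in NEEDS_BUG_URL:
        # A bare bug number is not enough: require a full URL
        if not bug_url or not bug_url.startswith(('http://', 'https://')):
            raise ValueError(category + ' requires the URL of the bug')
        parts.append(bug_url)
    if category == 'can be ignored':
        if not reason:
            raise ValueError('can be ignored requires a reason')
        parts.append('(' + reason + ')')
    parts.append(link)
    return ' '.join(parts)


# Made-up link and bug URL, in the format printed by the fail formatter
line = annotate('"rados/basic":http://pulpito.example.com/run-name/1',
                'known bug',
                bug_url='http://tracker.example.com/issues/1')
print(line)
```

Rejecting a bare bug number at this point keeps the tracker entry readable for someone who does not know which tracker the number refers to.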