


HOWTO forensic analysis of integration and upgrade tests

Revision 7 (Loïc Dachary, 05/15/2015 12:14 PM) → Revision 8/13 (Loïc Dachary, 05/15/2015 12:15 PM)

h3. Analysis 

 When a teuthology job has failed, it needs to be analyzed and linked to an issue in the issue tracker. A number of teuthology jobs inject random failures into the cluster to observe how it behaves. It is therefore not uncommon for the same job to fail on one run and succeed on another, depending on which failures were injected.

 * For a given teuthology job, there is a pulpito page (for instance 
 * Search for the error string to find existing issues. For instance, the error *ceph-objectstore-tool: import failure with status 139* has "a few issues associated with it":
 * If a matching issue is found and recording one more occurrence looks useful, add a comment with a link to the failed job and the relevant excerpt from the logs.
 * Click *All details...* to show the job YAML file, then press *Control-f* and search for *description* to find the job description, which is the list of YAML files that were combined to create the job. They can be found at (where *firefly* can be replaced by the stable release name).
 * Download the teuthology logs from the link provided by the pulpito page (for instance 
 * Explore the logs and core dumps collected by teuthology. They are stored in the same directory as teuthology.log, so removing the teuthology.log part of the path gives their location, i.e.
 * In the teuthology log, look for the first *Traceback* and read the lines around it: this is where something first went wrong.
 2015-05-15T03:56:10.905 ERROR:teuthology.contextutil:Saw exception from nested tasks 
 Traceback (most recent call last): 
   File "/home/teuthworker/src/teuthology_master/teuthology/", line 30, in nested 
     yield vars 
   File "/home/teuthworker/src/teuthology_master/teuthology/task/", line 1298, in task 
   File "/home/teuthworker/src/teuthology_master/teuthology/", line 125, in run_tasks 
     suppress = manager.__exit__(*exc_info) 
   File "/usr/lib/python2.7/", line 24, in __exit__ 
   File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/", line 183, in task 
   File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/", line 356, in do_join 
   File "/usr/lib/python2.7/dist-packages/gevent/", line 308, in get 
     raise self._exception 
 Exception: ceph-objectstore-tool: import failure with status 139 
 * Examine the relevant OSD, MDS or MON logs. Developers rely on these logs daily to diagnose problems. They are not an easy read, but they reliably contain the information needed to reconstruct the sequence of operations that led to a given problem.
 * If the backtrace is not already in the OSD, MDS or MON logs (it usually is), obtain it from the core dumps (see for a way to do that).
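The log-reading steps above (finding the directory that holds the other logs, locating the first *Traceback*, and extracting the error string to search for) can be scripted. What follows is a minimal Python 3 sketch, not part of teuthology: the function names are hypothetical, and it assumes the teuthology.log has been downloaded or is readable on the local filesystem, and that the exception is the first unindented line after the traceback frames, as in the excerpt above.

```python
import sys


def job_directory(log_location):
    """Hypothetical helper: drop the trailing teuthology.log component to get
    the directory holding the other logs and core dumps (paths or URLs)."""
    head, sep, _ = log_location.rpartition("/")
    return head + sep if sep else "./"


def first_traceback(lines):
    """Hypothetical helper: return (index, exception_line) for the first traceback.

    index is the position of the "Traceback (most recent call last):" line.
    exception_line is the first following line that is not indented, which is
    where Python prints the exception (assumed layout, as in the excerpt).
    Returns (None, None) when the log contains no traceback.
    """
    for i, line in enumerate(lines):
        if line.startswith("Traceback (most recent call last):"):
            for follow in lines[i + 1:]:
                if follow and not follow[0].isspace():
                    return i, follow
            return i, None
    return None, None


def context(lines, index, before=5, after=20):
    """Lines surrounding index, to see what happened just before the failure."""
    return lines[max(0, index - before):index + after + 1]


if __name__ == "__main__" and len(sys.argv) > 1:
    log_path = sys.argv[1]
    with open(log_path, errors="replace") as f:
        lines = f.read().splitlines()
    index, exception_line = first_traceback(lines)
    print("other logs and core dumps:", job_directory(log_path))
    if index is None:
        print("no Traceback found")
    else:
        print("\n".join(context(lines, index)))
        print("search the tracker for:", exception_line)
```

Usage, assuming the sketch is saved under a hypothetical name such as first_traceback.py: python3 first_traceback.py teuthology.log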

h3. Tools

 **    command line example: 
 user@machine:~$ python ~/<scrape_dir>/ /a/<run_name> 
 *** this will generally run in all labs (sepia, octo, typica) because */a* exists in all of them
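When a run has many failed jobs, grouping them by the exception each one hit avoids searching the tracker once per job. The following Python 3 sketch illustrates that idea only; it is not the script invoked above, the function names are hypothetical, and it assumes each job's log is at /a/<run_name>/<job_id>/teuthology.log.

```python
import sys
from collections import defaultdict
from pathlib import Path


def first_exception(lines):
    """First unindented line after the first Traceback header, or None."""
    in_traceback = False
    for line in lines:
        if not in_traceback:
            in_traceback = line.startswith("Traceback (most recent call last):")
        elif line and not line[0].isspace():
            return line
    return None


def group_by_exception(run_dir):
    """Map exception line -> job ids, assuming the hypothetical layout
    <run_dir>/<job_id>/teuthology.log."""
    groups = defaultdict(list)
    for log in sorted(Path(run_dir).glob("*/teuthology.log")):
        lines = log.read_text(errors="replace").splitlines()
        reason = first_exception(lines)
        if reason is not None:
            groups[reason].append(log.parent.name)
    return dict(groups)


if __name__ == "__main__" and len(sys.argv) > 1:
    groups = group_by_exception(sys.argv[1])
    # Print the most frequent failure first.
    for reason, jobs in sorted(groups.items(), key=lambda kv: -len(kv[1])):
        print(f"{len(jobs)} job(s): {reason}")
        print("    " + " ".join(jobs))
```

Usage, assuming a hypothetical file name: python3 group_failures.py /a/<run_name>. Each distinct exception line is then a candidate error string for the tracker search described in the Analysis section.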