HOWTO forensic analysis of integration and upgrade tests » History » Revision 3

Revision 2 (Loïc Dachary, 05/15/2015 10:17 AM) → Revision 3/13 (Loïc Dachary, 05/15/2015 12:05 PM)

h3. Steps 

 * For a given job, there is a pulpito page (for instance http://pulpito.ceph.com/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/) 
 * Search tracker.ceph.com for the error string to find existing issues. For instance, the job at http://pulpito.ceph.com/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/ failed with *ceph-objectstore-tool: import failure with status 139*, which has "a few issues associated with it":http://tracker.ceph.com/projects/ceph/search?utf8=%E2%9C%93&issues=1&q=ceph-objectstore-tool%3A+import+failure+with+status+139+ 
 * Click *All details...* to show the YAML file, then search (*Control-f*) for *description* to see the job description, which is the list of YAML files that were used to create the job. They can be found at https://github.com/ceph/ceph-qa-suite/blob/firefly/suites (where *firefly* can be replaced by the stable release name). 
 * Download the teuthology logs from the link provided by the pulpito page (for instance http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/teuthology.log) 
 * Explore the logs and core dumps collected by teuthology. If the log is at http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/teuthology.log the rest can be found by removing the teuthology.log part of the path, i.e. http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/ 
 * In the teuthology log, look for the first *Traceback* and read the lines around it: this is where something first went wrong. For instance: 
 <pre> 
 2015-05-15T03:56:10.905 ERROR:teuthology.contextutil:Saw exception from nested tasks 
 Traceback (most recent call last): 
   File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 30, in nested 
     yield vars 
   File "/home/teuthworker/src/teuthology_master/teuthology/task/install.py", line 1298, in task 
     yield 
   File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 125, in run_tasks 
     suppress = manager.__exit__(*exc_info) 
   File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__ 
     self.gen.next() 
   File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/thrashosds.py", line 183, in task 
     thrash_proc.do_join() 
   File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph_manager.py", line 356, in do_join 
     self.thread.get() 
   File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get 
     raise self._exception 
 Exception: ceph-objectstore-tool: import failure with status 139 
 </pre> 
 * Examine the relevant OSD, MDS or MON logs 
 * Obtain a backtrace from the coredumps (see http://dachary.org/?p=3568 for a way to do that) if it is not already in the OSD, MDS or MON logs (it usually is) 
 * Search tracker.ceph.com for the error string to find existing issues 
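
The mechanical parts of the steps above (finding the archive directory, isolating the first *Traceback*, and building a tracker search link) can be sketched in Python. This is only an illustration, not part of teuthology: the function names are hypothetical, the log is assumed to be already downloaded as text, and the search URL reuses the *issues* and *q* parameters from the link shown in the steps (leaving out *utf8* on the assumption that it is optional).

```python
from urllib.parse import urlencode

LOG_NAME = "teuthology.log"
# Base of the search link used in the steps above.
TRACKER_SEARCH = "http://tracker.ceph.com/projects/ceph/search"


def archive_url(log_url):
    """Strip the trailing teuthology.log to get the job archive directory."""
    if not log_url.endswith("/" + LOG_NAME):
        raise ValueError("not a teuthology.log URL: " + log_url)
    return log_url[: -len(LOG_NAME)]


def _indent(line):
    """Number of leading whitespace characters in a line."""
    return len(line) - len(line.lstrip())


def first_traceback(log_text, before=1):
    """Return the lines of the first Python traceback in the log.

    `before` lines preceding the traceback are included for context.
    An empty list is returned when the log contains no traceback.
    """
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("Traceback (most recent call last):"):
            base = _indent(line)
            end = i + 1
            # Frame lines are indented deeper than the Traceback header.
            while end < len(lines) and _indent(lines[end]) > base:
                end += 1
            # The next line is the exception itself, e.g. "Exception: ...".
            if end < len(lines):
                end += 1
            return lines[max(0, i - before):end]
    return []


def error_string(traceback_lines):
    """Return the message part of the final 'ExceptionType: message' line."""
    last = traceback_lines[-1].strip()
    _, sep, message = last.partition(": ")
    return message if sep else last


def tracker_search_url(error):
    """Build a tracker.ceph.com issue search URL for an error string."""
    return TRACKER_SEARCH + "?" + urlencode({"issues": 1, "q": error})
```

Passing the content of a downloaded teuthology.log to @first_traceback@, then its result to @error_string@ and @tracker_search_url@, gives a link that can be opened in a browser to look for existing issues.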

h3. Tools 

 * https://github.com/jcsp/scrape/blob/master/scrape.py 
 ** Command line example: 
 <pre> 
 user@machine:~$ python ~/<scrape_dir>/scrape.py /a/<run_name> 
 </pre> 
 *** This generally works in all labs (sepia, octo, typica), as */a* exists in all of them.