h1. HOWTO forensic analysis of integration and upgrade tests
Revision 2 (Loïc Dachary, 05/15/2015 10:17 AM) → Revision 3/13 (Loïc Dachary, 05/15/2015 12:05 PM)
h3. Steps

* For a given job, there is a pulpito page (for instance http://pulpito.ceph.com/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/).
* Search tracker.ceph.com for the error string to find existing issues. For instance, http://pulpito.ceph.com/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/ has "a few issues associated with it":http://tracker.ceph.com/projects/ceph/search?utf8=%E2%9C%93&issues=1&q=ceph-objectstore-tool%3A+import+failure+with+status+139+ (found by searching for *ceph-objectstore-tool: import failure with status 139*).
* Click *All details...* to show the YAML file and *Control-f description* to see the job description, which is the list of YAML files that were used to create the job. They can be found at https://github.com/ceph/ceph-qa-suite/blob/firefly/suites (where *firefly* can be replaced by the stable release name).
* Download the teuthology logs from the link provided by the pulpito page (for instance http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/teuthology.log).
* Explore the logs and core dumps collected by teuthology. If the log is at http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/teuthology.log the rest can be found by removing the teuthology.log part of the path, i.e. http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/
* In the teuthology log, look for the first *Traceback* and read the lines around it: this is where something first went wrong.
Backtrace:

<pre>
2015-05-15T03:56:10.905 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 30, in nested
    yield vars
  File "/home/teuthworker/src/teuthology_master/teuthology/task/install.py", line 1298, in task
    yield
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 125, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/thrashosds.py", line 183, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph_manager.py", line 356, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: ceph-objectstore-tool: import failure with status 139
</pre>

* Examine the relevant OSD, MDS or MON logs.
* Obtain a backtrace from the core dumps if one is not already in the OSD, MDS or MON logs (it usually is); see http://dachary.org/?p=3568 for a way to do that.
* Search tracker.ceph.com for the error string to find existing issues.

h3. Tools

* https://github.com/jcsp/scrape/blob/master/scrape.py
** Command line example:

<pre>
user@machine:~$ python ~/<scrape_dir>/scrape.py /a/<run_name>
</pre>

*** This will generally run in all labs (sepia, octo, typica) as */a* exists in all of them.
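
Two of the manual steps above, deriving the archive directory from the teuthology.log URL and locating the first *Traceback* in a downloaded log, can be sketched in Python. This is only an illustration, not part of teuthology or scrape.py: the function names @archive_url@ and @first_traceback@ are hypothetical, and the sketch assumes the log has already been downloaded and split into lines.

```python
from typing import List, Optional, Tuple

LOG_NAME = "teuthology.log"
MARKER = "Traceback (most recent call last):"


def archive_url(log_url: str) -> str:
    """Return the job archive directory URL by dropping the trailing teuthology.log."""
    if not log_url.endswith(LOG_NAME):
        raise ValueError("expected a URL ending in " + LOG_NAME)
    return log_url[: -len(LOG_NAME)]


def first_traceback(
    lines: List[str], before: int = 1, after: int = 20
) -> Optional[Tuple[int, List[str]]]:
    """Find the first Traceback in the log.

    Returns (1-based line number of the Traceback line, surrounding lines),
    or None when the log contains no Traceback. The excerpt includes
    `before` lines ahead of the marker and up to `after` lines following it.
    """
    for index, line in enumerate(lines):
        if MARKER in line:
            start = max(0, index - before)
            end = min(len(lines), index + after + 1)
            return index + 1, lines[start:end]
    return None
```

With a log saved locally, @first_traceback(open("teuthology.log").read().splitlines())@ returns the line number and an excerpt to read. The default of 20 lines after the marker is an arbitrary choice and may need widening for deep call stacks.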