HOWTO forensic analysis of integration and upgrade tests
Loïc Dachary, 05/15/2015 12:05 PM
Steps
- For a given job, there is a pulpito page (for instance http://pulpito.ceph.com/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/)
- Search tracker.ceph.com for the error string to find existing issues. For instance, http://pulpito.ceph.com/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/ failed with "ceph-objectstore-tool: import failure with status 139", which has a few issues associated with it.
- Click All details... to show the YAML file, then search (Control-f) for description to see the job description, which is the list of YAML files that were used to create the job. They can be found at https://github.com/ceph/ceph-qa-suite/blob/firefly/suites (where firefly can be replaced by the stable release name).
- Download the teuthology logs from the link provided by the pulpito page (for instance http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/teuthology.log)
- Explore the logs and core dumps collected by teuthology. If the log is at http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/teuthology.log the rest can be found by removing the teuthology.log part of the path, i.e. http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/
- In the teuthology log, look for the first Traceback and read the lines around it: this is where something first went wrong.
    2015-05-15T03:56:10.905 ERROR:teuthology.contextutil:Saw exception from nested tasks
    Traceback (most recent call last):
      File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 30, in nested
        yield vars
      File "/home/teuthworker/src/teuthology_master/teuthology/task/install.py", line 1298, in task
        yield
      File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 125, in run_tasks
        suppress = manager.__exit__(*exc_info)
      File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
        self.gen.next()
      File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/thrashosds.py", line 183, in task
        thrash_proc.do_join()
      File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph_manager.py", line 356, in do_join
        self.thread.get()
      File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
        raise self._exception
    Exception: ceph-objectstore-tool: import failure with status 139
- Examine the relevant OSD, MDS or MON logs
- Obtain a backtrace from the coredumps (see http://dachary.org/?p=3568 for a way to do that), if they are not in the OSD, MDS or MON logs (they usually are)
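Two of the steps above are mechanical enough to script: deriving the archive directory from the teuthology.log URL, and locating the first Traceback in a downloaded log. The following is a minimal sketch, not part of teuthology; the function names archive_url and first_traceback and the before/after context sizes are hypothetical choices made for illustration.

```python
# Illustrative helpers only: these names are not part of teuthology or pulpito.

TRACEBACK_MARKER = "Traceback (most recent call last):"
LOG_NAME = "teuthology.log"


def archive_url(log_url):
    """Return the job archive directory URL, given the teuthology.log URL.

    This simply removes the trailing "teuthology.log" from the path.
    """
    if not log_url.endswith(LOG_NAME):
        raise ValueError("expected a URL ending in " + LOG_NAME)
    return log_url[: -len(LOG_NAME)]


def first_traceback(lines, before=5, after=20):
    """Find the first line containing a Python Traceback marker.

    Returns (index, context), where index is the 0-based position of the
    marker line and context is the list of lines from `before` lines ahead
    of it to `after` lines past it. Returns None if no Traceback is found.
    """
    for i, line in enumerate(lines):
        if TRACEBACK_MARKER in line:
            start = max(0, i - before)
            return i, lines[start : i + after + 1]
    return None
```

To use it on a downloaded log, read the file into a list with open(path, errors="replace").read().splitlines() and pass that list to first_traceback. Only the first match is reported because later tracebacks are often consequences of the first failure.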
Tools
- https://github.com/jcsp/scrape/blob/master/scrape.py
  - command line example:
    user@machine:~$ python ~/<scrape_dir>/scrape.py /a/<run_name>
  - this will generally run in all labs (sepia, octo, typica) as /a exists in all of them
Updated by Loïc Dachary almost 9 years ago · 3 revisions