HOWTO forensic analysis of integration and upgrade tests » History » Version 3
Loïc Dachary, 05/15/2015 12:05 PM
h3. Steps

* For a given job, there is a pulpito page (for instance http://pulpito.ceph.com/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/).
* Search tracker.ceph.com for the error string to find existing issues. For instance, the job at http://pulpito.ceph.com/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/ failed with *ceph-objectstore-tool: import failure with status 139*, which has "a few issues associated with it":http://tracker.ceph.com/projects/ceph/search?utf8=%E2%9C%93&issues=1&q=ceph-objectstore-tool%3A+import+failure+with+status+139+
* Click *All details...* to show the YAML file, then use *Control-f* to search for *description*: the job description is the list of YAML files that were used to create the job. They can be found at https://github.com/ceph/ceph-qa-suite/blob/firefly/suites (where *firefly* can be replaced by the stable release name).
* Download the teuthology log from the link provided by the pulpito page (for instance http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/teuthology.log).
* Explore the logs and core dumps collected by teuthology. If the log is at http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/teuthology.log, the rest can be found by removing the teuthology.log part of the path, i.e. http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/
* In the teuthology log, look for the first *Traceback* and read the lines around it: this is where something first went wrong.
<pre>
2015-05-15T03:56:10.905 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 30, in nested
    yield vars
  File "/home/teuthworker/src/teuthology_master/teuthology/task/install.py", line 1298, in task
    yield
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 125, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/thrashosds.py", line 183, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph_manager.py", line 356, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: ceph-objectstore-tool: import failure with status 139
</pre>
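
Locating the first *Traceback* and the lines around it can also be scripted. The following is a minimal Python 3 sketch, not part of teuthology: the function name @first_traceback@ and its @context@ parameter are hypothetical, and it assumes the log has already been downloaded to a local file.

```python
def first_traceback(lines, context=5):
    """Return (line_number, excerpt) for the first line containing 'Traceback'.

    lines   -- any iterable of log lines (for instance an open file)
    context -- how many lines to keep before and after the match (assumption)
    Returns None when the log contains no traceback.
    """
    lines = [line.rstrip("\n") for line in lines]
    for index, line in enumerate(lines):
        if "Traceback" in line:
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            # Line numbers are 1-based, like in a pager or editor.
            return index + 1, lines[start:end]
    return None

# Hypothetical usage, assuming teuthology.log is in the current directory:
#
#     with open("teuthology.log", errors="replace") as log:
#         found = first_traceback(log, context=10)
#     if found is not None:
#         number, excerpt = found
#         print("first Traceback at line", number)
#         print("\n".join(excerpt))
```

Only the first match is reported because later tracebacks are often consequences of the first failure.
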
* Examine the relevant OSD, MDS or MON logs.
* If the backtrace is not already in the OSD, MDS or MON logs (it usually is), obtain it from the core dumps (see http://dachary.org/?p=3568 for a way to do that).

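Two of the steps above are plain URL manipulations: building the tracker search link from an error string, and deriving the archive directory from the teuthology.log link. A minimal Python 3 sketch follows. The helper names @tracker_search_url@ and @archive_url@ are hypothetical, and the query parameters are simply copied from the example search link above rather than taken from any documented tracker API.

```python
from urllib.parse import urlencode

# Base of the search link used in the example above.
TRACKER_SEARCH = "http://tracker.ceph.com/projects/ceph/search"

def tracker_search_url(error_string):
    """Build a tracker search link for an error string (issues only)."""
    # Parameter names and values mirror the example link; this is an assumption,
    # not a documented interface.
    query = {"utf8": "\u2713", "issues": "1", "q": error_string}
    return TRACKER_SEARCH + "?" + urlencode(query)

def archive_url(log_url):
    """Return the job archive directory for a teuthology.log URL."""
    suffix = "teuthology.log"
    if not log_url.endswith(suffix):
        raise ValueError("not a teuthology.log URL: " + log_url)
    # Drop the file name and keep the trailing slash of the directory.
    return log_url[:-len(suffix)]

# Hypothetical usage:
#
#     print(tracker_search_url("ceph-objectstore-tool: import failure with status 139"))
#     print(archive_url("http://qa-proxy.ceph.com/teuthology/<run_name>/<job_id>/teuthology.log"))
```
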
h3. Tools

* https://github.com/jcsp/scrape/blob/master/scrape.py
** command line example:
<pre>
user@machine:~$ python ~/<scrape_dir>/scrape.py /a/<run_name>
</pre>
*** this will generally run in all labs (sepia, octo, typica) as */a* exists in all of them
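
For a quick overview of a whole run without any extra tool, the jobs can be grouped by the error that ends their first traceback. The following Python 3 sketch is independent of scrape.py and does not reproduce its behaviour. It assumes that the directory under */a* mirrors the URL layout shown in the steps (@<run_name>/<job_id>/teuthology.log@); the function names @first_error@ and @group_jobs_by_error@ are hypothetical.

```python
import os
from collections import defaultdict

def first_error(log_lines):
    """Return the exception line that ends the first traceback, or None."""
    in_traceback = False
    for line in log_lines:
        # Stack frames are indented; the first unindented line after the
        # "Traceback" header is the exception itself.
        if in_traceback and line.strip() and not line[0].isspace():
            return line.strip()
        if "Traceback (most recent call last)" in line:
            in_traceback = True
    return None

def group_jobs_by_error(run_dir):
    """Map each first error found in a run to the list of job ids showing it."""
    groups = defaultdict(list)
    for job_id in sorted(os.listdir(run_dir)):
        # Assumed layout: <run_dir>/<job_id>/teuthology.log
        log_path = os.path.join(run_dir, job_id, "teuthology.log")
        if not os.path.isfile(log_path):
            continue
        with open(log_path, errors="replace") as log:
            error = first_error(log)
        if error is not None:
            groups[error].append(job_id)
    return dict(groups)

# Hypothetical usage, with <run_name> replaced by an actual run:
#
#     for error, jobs in group_jobs_by_error("/a/<run_name>").items():
#         print(len(jobs), error, " ".join(jobs))
```

Jobs whose log contains no traceback are left out, so the result only lists jobs that need a closer look.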