HOWTO forensic analysis of integration and upgrade tests » History » Version 6
Loïc Dachary, 05/15/2015 12:10 PM
h3. Steps

* For a given job, there is a pulpito page (for instance http://pulpito.ceph.com/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/)
* Search tracker.ceph.com for the error string to find existing issues. For instance, http://pulpito.ceph.com/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/ reports *ceph-objectstore-tool: import failure with status 139*, which has "a few issues associated with it":http://tracker.ceph.com/projects/ceph/search?utf8=%E2%9C%93&issues=1&q=ceph-objectstore-tool%3A+import+failure+with+status+139
* If an issue is found and recording one more occurrence looks useful, add a comment with a link to the failed job and the relevant quote from the logs.
* Click *All details...* to show the YAML file and *Control-f description* to see the job description, which is the list of YAML files that were used to create the job. They can be found at https://github.com/ceph/ceph-qa-suite/blob/firefly/suites (where *firefly* can be replaced by the stable release name).
* Download the teuthology log from the link provided by the pulpito page (for instance http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/teuthology.log)
* Explore the logs and core dumps collected by teuthology. If the log is at http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/teuthology.log, the rest can be found by removing the teuthology.log part of the path, i.e. http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/
* In the teuthology log, look for the first *Traceback* and read the lines around it: this is where something first went wrong.
<pre>
2015-05-15T03:56:10.905 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 30, in nested
    yield vars
  File "/home/teuthworker/src/teuthology_master/teuthology/task/install.py", line 1298, in task
    yield
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 125, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/thrashosds.py", line 183, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph_manager.py", line 356, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: ceph-objectstore-tool: import failure with status 139
</pre>
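
Looking for the first Traceback can be done by hand in a pager, but for large logs a small script may be more convenient. The following is only a sketch: the function name @first_traceback@ and the default context sizes are hypothetical and are not part of teuthology.

```python
def first_traceback(lines, before=5, after=30):
    """Return the lines around the first Python traceback, or [] if none.

    `before` and `after` are arbitrary context sizes chosen for this sketch.
    """
    for index, line in enumerate(lines):
        if "Traceback (most recent call last)" in line:
            start = max(0, index - before)
            return lines[start:index + after + 1]
    return []


def first_traceback_in_file(path, before=5, after=30):
    """Same as first_traceback, reading a downloaded teuthology.log."""
    with open(path, errors="replace") as log:
        return first_traceback(log.read().splitlines(), before, after)
```

Calling @first_traceback_in_file("teuthology.log")@ on a downloaded log returns the lines to read first. Later tracebacks are often consequences of the first one.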
* Examine the relevant OSD, MDS or MON logs. Developers use these logs daily to diagnose problems. They are not easy to read, but they can be relied on to contain the information needed to reconstruct the sequence of operations that led to a given problem.
* If the backtrace is not already in the OSD, MDS or MON logs (it usually is), obtain it from the core dumps (see http://dachary.org/?p=3568 for a way to do that)

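Two of the steps above are plain URL manipulations: deriving the archive directory from the teuthology.log link, and building a tracker search link for an error string. A minimal sketch follows. The helper names @archive_url@ and @tracker_search_url@ are hypothetical, and the query parameters are simply copied from the example search link above rather than taken from any documented tracker API.

```python
from urllib.parse import urlencode

# Base of the search link used in the example above.
TRACKER_SEARCH = "http://tracker.ceph.com/projects/ceph/search"


def archive_url(log_url):
    """Drop the trailing teuthology.log to get the job's archive directory."""
    suffix = "teuthology.log"
    if not log_url.endswith(suffix):
        raise ValueError("expected a URL ending in " + suffix)
    return log_url[:-len(suffix)]


def tracker_search_url(error_string):
    """Build a search link restricted to issues (assumed parameters)."""
    query = urlencode({"utf8": "\u2713", "issues": 1, "q": error_string})
    return TRACKER_SEARCH + "?" + query


log = ("http://qa-proxy.ceph.com/teuthology/"
       "loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/"
       "888125/teuthology.log")
print(archive_url(log))
print(tracker_search_url("ceph-objectstore-tool: import failure with status 139"))
```

The error string is best copied verbatim from the last line of the traceback, since the search is a text match.
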
h3. Tools

* https://github.com/jcsp/scrape/blob/master/scrape.py
** command line example:
<pre>
user@machine:~$ python ~/<scrape_dir>/scrape.py /a/<run_name>
</pre>
*** this will generally run in all labs (sepia, octo, typica) as */a* exists in all of them