HOWTO forensic analysis of integration and upgrade tests » History » Version 5

Loïc Dachary, 05/15/2015 12:07 PM

h3. Steps

* For a given job, there is a pulpito page (for instance http://pulpito.ceph.com/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/)
* Search tracker.ceph.com for the error string to find existing issues. For instance, http://pulpito.ceph.com/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/ fails with *ceph-objectstore-tool: import failure with status 139*, which already has "a few issues associated with it":http://tracker.ceph.com/projects/ceph/search?utf8=%E2%9C%93&issues=1&q=ceph-objectstore-tool%3A+import+failure+with+status+139
* If a matching issue exists and one more occurrence looks like useful information, add a comment to it with a link to the failed job and the relevant quote from the logs.
* Click *All details...* to show the YAML file, then search the page (*Control-f*) for *description* to see the job description, which is the list of YAML files that were used to create the job. They can be found at https://github.com/ceph/ceph-qa-suite/blob/firefly/suites (where *firefly* can be replaced by the stable release name).
* Download the teuthology log from the link provided by the pulpito page (for instance http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/teuthology.log)
* Explore the logs and core dumps collected by teuthology. If the log is at http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/teuthology.log, the rest can be found by removing the teuthology.log part of the path, i.e. at http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/
* In the teuthology log, find the first *Traceback* and read the lines around it: this is where something first went wrong.
<pre>
2015-05-15T03:56:10.905 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 30, in nested
    yield vars
  File "/home/teuthworker/src/teuthology_master/teuthology/task/install.py", line 1298, in task
    yield
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 125, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/thrashosds.py", line 183, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph_manager.py", line 356, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: ceph-objectstore-tool: import failure with status 139
</pre>
* Examine the relevant OSD, MDS or MON logs.
* If the backtrace is not already in the OSD, MDS or MON logs (it usually is), obtain it from the core dumps (see http://dachary.org/?p=3568 for a way to do that).

h3. Tools

* https://github.com/jcsp/scrape/blob/master/scrape.py
** command line example:
<pre>
user@machine:~$ python ~/<scrape_dir>/scrape.py /a/<run_name>
</pre>
*** this will generally run in all labs (sepia, octo, typica) as */a* exists in all of them
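
Two of the manual steps above (deriving the archive directory from the teuthology.log URL, and locating the first *Traceback*) are mechanical enough to script. The following is only a minimal sketch and is not part of teuthology or scrape.py: the function names @archive_url@ and @first_traceback@ are hypothetical, and it assumes that the traceback body lines are indented while the final exception line is not, as in the excerpt shown in the steps.

```python
TRACEBACK_MARKER = "Traceback (most recent call last):"
LOG_NAME = "teuthology.log"


def archive_url(log_url):
    """Return the job archive directory URL for a teuthology.log URL.

    Hypothetical helper: it only strips the trailing file name, which is
    the manual operation described in the steps.
    """
    if not log_url.endswith(LOG_NAME):
        raise ValueError("expected a URL ending in " + LOG_NAME)
    return log_url[:-len(LOG_NAME)]


def first_traceback(lines, context_before=1):
    """Return the first traceback found in lines, with some leading context.

    Assumption: the traceback body is indented and it ends with the first
    unindented line, which carries the exception type and message.
    Returns an empty list when no traceback is present.
    """
    for index, line in enumerate(lines):
        if TRACEBACK_MARKER in line:
            end = index + 1
            # Skip the indented 'File ...' and source lines of the traceback.
            while end < len(lines) and lines[end].startswith((" ", "\t")):
                end += 1
            # Include the unindented exception line, if the log has one.
            stop = min(end + 1, len(lines))
            start = max(0, index - context_before)
            return lines[start:stop]
    return []
```

With a downloaded log, @first_traceback(open("teuthology.log").read().splitlines())@ would return the lines to read first. The result is only a starting point: the lines before the traceback and the daemon logs still need to be examined by hand.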