HOWTO forensic analysis of integration and upgrade tests (version 3)

Loïc Dachary, 05/15/2015 12:05 PM

h3. Steps
* Start from the job's pulpito page (for instance http://pulpito.ceph.com/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/).
* Search tracker.ceph.com for the error string to find existing issues. For instance, the job at http://pulpito.ceph.com/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/ failed with *ceph-objectstore-tool: import failure with status 139*, and "searching the tracker for that string":http://tracker.ceph.com/projects/ceph/search?utf8=%E2%9C%93&issues=1&q=ceph-objectstore-tool%3A+import+failure+with+status+139+ finds a few related issues.
* Click *All details...* to show the job's YAML configuration, then use *Control-f* to search for *description*: the job description lists the YAML files that were combined to create the job. They can be found at https://github.com/ceph/ceph-qa-suite/blob/firefly/suites (replace *firefly* with the name of the relevant stable release).
* Download the teuthology log from the link on the pulpito page (for instance http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/teuthology.log).
* Explore the logs and core dumps collected by teuthology. They are in the same directory as the teuthology log: if the log is at http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/teuthology.log, drop teuthology.log from the path to get the directory, i.e. http://qa-proxy.ceph.com/teuthology/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/
* In the teuthology log, search for the first *Traceback* and examine the lines around it: this is where things first went wrong.
<pre>
2015-05-15T03:56:10.905 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 30, in nested
    yield vars
  File "/home/teuthworker/src/teuthology_master/teuthology/task/install.py", line 1298, in task
    yield
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 125, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/thrashosds.py", line 183, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph_manager.py", line 356, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: ceph-objectstore-tool: import failure with status 139
</pre>
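
Finding the first traceback can be scripted once the log is downloaded. The Python sketch below assumes, as in the excerpt above, that a traceback is a "Traceback (most recent call last):" line followed by indented frames and a final, non-indented "ExceptionType: message" line. It returns that block with a little leading context and extracts the message, which is the error string to search for on tracker.ceph.com. The function names are illustrative and are not part of teuthology.

```python
from typing import List, Optional

TRACEBACK_MARKER = "Traceback (most recent call last):"


def first_traceback(log_text: str, before: int = 2) -> Optional[List[str]]:
    """Return the first Python traceback in a log, with `before` lines of context.

    Assumes frames are indented and the traceback ends with a non-indented
    "ExceptionType: message" line, as in the teuthology excerpt above.
    """
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(TRACEBACK_MARKER):
            end = i + 1
            # Skip the indented '  File ...' lines and the source lines under them.
            while end < len(lines) and lines[end].startswith((" ", "\t")):
                end += 1
            # Include the exception line itself when the log is not truncated.
            end = min(end + 1, len(lines))
            return lines[max(0, i - before):end]
    return None


def error_string(traceback_lines: List[str]) -> str:
    """Return the message part of the final "ExceptionType: message" line."""
    exc_type, _, message = traceback_lines[-1].partition(": ")
    return message or exc_type


# Abridged from the teuthology.log excerpt above.
SAMPLE_LOG = "\n".join([
    "2015-05-15T03:56:10.905 ERROR:teuthology.contextutil:Saw exception from nested tasks",
    "Traceback (most recent call last):",
    '  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 30, in nested',
    "    yield vars",
    "Exception: ceph-objectstore-tool: import failure with status 139",
])

if __name__ == "__main__":
    block = first_traceback(SAMPLE_LOG, before=1)
    if block is not None:
        print("\n".join(block))
        print("Search the tracker for:", error_string(block))
```

Only the first traceback is returned on purpose: later tracebacks are often consequences of the first failure.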
* Examine the relevant OSD, MDS or MON logs.
* If the backtrace is not already in the OSD, MDS or MON logs (it usually is), obtain it from the core dumps (see http://dachary.org/?p=3568 for one way to do that).
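
The URLs used in the steps above follow a regular pattern, which a few helpers can capture. This Python sketch infers the mapping from the example URLs on this page: a pulpito job page and its qa-proxy archive share the same run name and job id path, and the tracker search uses the same query parameters as the example search link. This is an assumption drawn from those examples, not a documented API, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Base URLs taken from the examples on this page.
PULPITO = "http://pulpito.ceph.com/"
QA_PROXY = "http://qa-proxy.ceph.com/teuthology/"
TRACKER_SEARCH = "http://tracker.ceph.com/projects/ceph/search"
SUITES = "https://github.com/ceph/ceph-qa-suite/blob/{release}/suites/{fragment}"


def archive_url(pulpito_url: str) -> str:
    """Map a pulpito job page to the directory holding its logs and core dumps."""
    if not pulpito_url.startswith(PULPITO):
        raise ValueError("not a pulpito job URL: " + pulpito_url)
    run_and_job = pulpito_url[len(PULPITO):].strip("/")
    return QA_PROXY + run_and_job + "/"


def teuthology_log_url(pulpito_url: str) -> str:
    """The teuthology log sits at the top of the job's archive directory."""
    return archive_url(pulpito_url) + "teuthology.log"


def tracker_search_url(error: str) -> str:
    """Build a tracker.ceph.com issue search for an error string."""
    # urlencode percent-encodes the check mark and ':' and turns spaces into '+'.
    query = urlencode({"utf8": "\u2713", "issues": 1, "q": error})
    return TRACKER_SEARCH + "?" + query


def suite_fragment_url(fragment: str, release: str = "firefly") -> str:
    """Link a YAML fragment from the job description to ceph-qa-suite on GitHub."""
    return SUITES.format(release=release, fragment=fragment)


if __name__ == "__main__":
    job = "http://pulpito.ceph.com/loic-2015-05-13_00:58:29-rados-firefly-backports---basic-multi/888125/"
    print(teuthology_log_url(job))
    print(archive_url(job))
    print(tracker_search_url("ceph-objectstore-tool: import failure with status 139"))
    # "rados/example.yaml" is a made-up fragment path, used only for illustration.
    print(suite_fragment_url("rados/example.yaml"))
```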
h3. Tools
* https://github.com/jcsp/scrape/blob/master/scrape.py
** Command line example:
<pre>
user@machine:~$ python ~/<scrape_dir>/scrape.py /a/<run_name>
</pre>
*** This generally works in all labs (sepia, octo, typica) because */a* exists in all of them.
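
What scrape.py reports is not described here. As a hypothetical illustration of why summarizing a whole run helps, the Python sketch below groups job ids by identical failure message, so each distinct error only needs to be researched in the tracker once. The input dictionary (job id to failure message) is an assumption: how it is collected from the run directory is not covered, and this is not scrape.py's implementation.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_failure(failures: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each distinct failure message to the job ids that hit it.

    Groups are ordered by size (largest first), then alphabetically by message,
    so the most common error is researched first.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for job_id, reason in failures.items():
        groups[reason.strip()].append(job_id)
    return dict(sorted(groups.items(), key=lambda item: (-len(item[1]), item[0])))


# Hypothetical input: job 888125 is the example from this page;
# the other job ids and the second message are made up.
FAILURES = {
    "888125": "ceph-objectstore-tool: import failure with status 139",
    "888130": "hypothetical failure message",
    "888131": "ceph-objectstore-tool: import failure with status 139",
}

if __name__ == "__main__":
    for reason, jobs in group_by_failure(FAILURES).items():
        print(f"{len(jobs)} job(s): {reason} -> {', '.join(jobs)}")
```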