Bug #7241

closed

missing mon/osd logs in test runs

Added by Alfredo Deza over 10 years ago. Updated over 10 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

At least in the ceph-deploy tests, only the syslog file seems to be collected; the mon and OSD logs are missing.

This makes debugging failures difficult.

Example test run in which every job has only the syslog:

http://qa-proxy.ceph.com/teuthology/teuthology-2014-01-27_01:10:01-ceph-deploy-master-testing-basic-vps/

Example job:

http://qa-proxy.ceph.com/teuthology/teuthology-2014-01-27_01:10:01-ceph-deploy-master-testing-basic-vps/55366/remote/ubuntu@vpm076.front.sepia.ceph.com/

#1

Updated by Zack Cerza over 10 years ago

Looking at http://qa-proxy.ceph.com/teuthology/teuthology-2014-01-27_01:10:01-ceph-deploy-master-testing-basic-vps/55366/teuthology.log

The exception is being raised here:
https://github.com/ceph/teuthology/blob/master/teuthology/task/ceph-deploy.py#L268

What should happen is that the finally: block of build_ceph_cluster() gets executed. That block contains the code that grabs the relevant log files from the remotes, and it starts here:
https://github.com/ceph/teuthology/blob/master/teuthology/task/ceph-deploy.py#L337

However, immediately after the exception is raised, instead of seeing "Stopping ceph..." we see "Removing ceph-deploy", which is part of the finally: block of download_ceph_deploy().

If we can figure out why build_ceph_cluster()'s finally: block is being skipped, we can fix this.

#2

Updated by Zack Cerza over 10 years ago

I spent some time writing a simplified reproducer to understand how teuthology's contextlib extension, contextutil, interacts with the native contextmanager.

It makes sense now that I've looked into it. If code outside the try/except/finally block raises an exception (for example, setup code that runs before the try), the corresponding finally block is skipped, because execution never entered the try. Unfortunately, in almost every place we use a contextmanager, 99% of the relevant code lies outside the try/except/finally blocks, which effectively renders the finally blocks useless.
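A minimal reproducer along these lines is sketched below. It is not the actual task code. The function names download_ceph_deploy() and build_ceph_cluster() come from this ticket, but their bodies are simplified and hypothetical. failing_setup() is a hypothetical stand-in for a failing ceph-deploy command. contextlib.ExitStack stands in for teuthology's contextutil.nested(), on the assumption that both unwind only the managers that were successfully entered. Because the setup in build_ceph_cluster() runs before its try, only the outer cleanup message is recorded.

```python
import contextlib

# Messages each (simplified) cleanup step would log, in the order they happen.
events = []


def failing_setup():
    # Hypothetical stand-in for a ceph-deploy command that exits non-zero.
    raise RuntimeError("simulated ceph-deploy failure")


@contextlib.contextmanager
def download_ceph_deploy():
    # Simplified: nothing here can fail before the try is entered.
    try:
        yield
    finally:
        events.append("Removing ceph-deploy")


@contextlib.contextmanager
def build_ceph_cluster():
    # Setup runs OUTSIDE the try, as in the task being debugged.
    failing_setup()
    try:
        yield
    finally:
        # Never reached: the exception above is raised before the try starts.
        events.append("Stopping ceph...")


def run_task():
    # ExitStack is used as a stand-in for contextutil.nested(): managers are
    # entered in order, and only those already entered are unwound on error.
    with contextlib.ExitStack() as stack:
        stack.enter_context(download_ceph_deploy())
        stack.enter_context(build_ceph_cluster())
        events.append("running tests")


try:
    run_task()
except RuntimeError as exc:
    print("task failed:", exc)

print(events)  # ['Removing ceph-deploy']
```

This matches the ordering seen in the job log. "Removing ceph-deploy" appears, but "Stopping ceph..." and the log collection that follows it never run.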

I'm thinking we could try moving everything inside a try block, starting with the ceph-deploy task.
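A sketch of that restructuring follows, under the same assumptions as a simplified reproducer. The bodies are hypothetical, failing_setup() simulates a ceph-deploy failure, ExitStack stands in for contextutil.nested(), and "Collecting mon/osd logs" is a hypothetical label for the log-grabbing code. The only change is that build_ceph_cluster() does its setup inside the try. The setup failure therefore still reaches the finally block before the outer manager unwinds.

```python
import contextlib

# Messages each (simplified) cleanup step would log, in the order they happen.
events = []


def failing_setup():
    # Hypothetical stand-in for a ceph-deploy command that exits non-zero.
    raise RuntimeError("simulated ceph-deploy failure")


@contextlib.contextmanager
def download_ceph_deploy():
    try:
        yield
    finally:
        events.append("Removing ceph-deploy")


@contextlib.contextmanager
def build_ceph_cluster():
    try:
        # Setup now lives INSIDE the try, so a failure here still hits finally.
        failing_setup()
        yield
    finally:
        events.append("Stopping ceph...")
        # Hypothetical placeholder for grabbing log files from the remotes.
        events.append("Collecting mon/osd logs")


def run_task():
    # ExitStack is used as a stand-in for contextutil.nested().
    with contextlib.ExitStack() as stack:
        stack.enter_context(download_ceph_deploy())
        stack.enter_context(build_ceph_cluster())
        events.append("running tests")


try:
    run_task()
except RuntimeError as exc:
    print("task failed:", exc)

print(events)
```

One consequence of this design is that the finally block can now run against a partially built cluster, for example when daemons were never started. The cleanup and log-collection steps therefore need to tolerate missing state rather than assume setup completed.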
