Bug #14172: https://jenkins.ceph.com/job/ceph-pull-requests/ aka make check fixes - devops - Ceph

closed
Added by Loïc Dachary over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Urgent
Assignee: Andrew Schoen
Source: other
Severity: 3 - minor
Description

Workaround

Run run-make-check.sh ; sudo reboot so that the reboot cleans up lingering processes. This should allow the bot to run for weeks before it needs to be re-imaged.
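
A minimal sketch of this workaround as a shell function, assuming the check is started through a wrapper. The names run_then_reboot and REBOOT_CMD are hypothetical and not part of the Ceph tooling; REBOOT_CMD is there only so the reboot step can be replaced when trying the function out.

```shell
#!/bin/sh
# Hypothetical wrapper: run the check, then reboot whatever the outcome,
# so that processes leaked by the test cannot affect the next run.
# REBOOT_CMD is an assumption for illustration; it defaults to "sudo reboot".
run_then_reboot() {
    check_cmd=$1
    status=0
    # Remember the exit status instead of stopping on failure.
    sh -c "$check_cmd" || status=$?
    echo "check exited with status $status"
    # Reboot unconditionally, mirroring "run-make-check.sh ; sudo reboot".
    ${REBOOT_CMD:-sudo reboot}
    return "$status"
}
```

For example, run_then_reboot ./run-make-check.sh would run the check and then reboot, whether or not the check passed.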

Issues

running on unsupported operating systems (CentOS 6, precise and maybe others)
leftovers from a previous test (which should be removed once a new slave is provisioned for each test)
keep the last 300 jobs for forensic analysis (about one week's worth)
re-enable the jenkins job
disable reporting to github pull requests so that the stability of the run can be verified without sending numerous false negatives while doing so

Updated by Loïc Dachary over 8 years ago

Status changed from New to 12
Priority changed from Normal to Urgent

Setting to urgent because the absence of an automated make check has a noticeable daily impact on the work of Ceph developers.

Updated by Loïc Dachary over 8 years ago

Subject changed from https://jenkins.ceph.com/job/ceph-pull-requests/ fixes to https://jenkins.ceph.com/job/ceph-pull-requests/ aka make check fixes

<alfredodeza> loicd: CI is not meant to be openstack only, and it does involve a bit more work other than to just call an external API to terminate the instance
<alfredodeza> the current state of CI is that there is no way to be able to spin up/down nodes
<loicd> alfredodeza: it's using OpenStack but is not able to spin up/down nodes ? How did that happen ? 
<loicd> isn't it what https://github.com/alfredodeza/mita/ is about ?
<alfredodeza> loicd: that is just one component of ci
<alfredodeza> which is meant to get nodes up and down
<alfredodeza> *but we are not there yet*
<alfredodeza> it needs to be worked on
<loicd> alfredodeza: in the meantime I believe it would be enough to just reboot after the test. There is a *very* small chance that the test will taint the file system (i.e. fill / or something hard to recover from). No more chance than any python test really. There however is a *high* chance that a process survives the test because of a bug and the reboot can take care of that. The better solution would be to re-image every time but rebooting will allow the bot to resume service efficiently.
<loicd> alfredodeza: will jenkins automatically reconnect to the slave after a reboot ? or does it need some extra infrastructure work ?
<alfredodeza> automatic
<loicd> cool then :-)

Updated by Loïc Dachary over 8 years ago

Description updated

Updated by Loïc Dachary over 8 years ago

https://github.com/ceph/ceph-build/pull/284

Updated by Andrew Schoen over 8 years ago

Status changed from 12 to Resolved
Assignee set to Andrew Schoen

https://github.com/ceph/ceph-build/pull/284

https://github.com/ceph/ceph-build/commit/983647beb7d4fb0b40072aa0cc60bdd210cd0979

Updated by Andrew Schoen over 8 years ago

Status changed from Resolved to In Progress

The approach we took with two shell builders won't work: if the first builder (the make check) fails, the second (the reboot) won't run.
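
The problem can be illustrated with a small simulation, assuming build steps run in order and the job stops at the first failing step. This is not Jenkins code and not the job's configuration; run_builders is a hypothetical name used only for this sketch.

```shell
#!/bin/sh
# Simulation only: runs each step in order and stops at the first
# failure, so a later cleanup step is skipped when an earlier step fails.
run_builders() {
    for step in "$@"; do
        if ! sh -c "$step"; then
            echo "step failed, skipping remaining steps: $step"
            return 1
        fi
    done
    return 0
}
```

Calling run_builders with a failing first step and a cleanup command as the second step shows that the cleanup never runs, which is exactly the case where a reboot is most needed.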

Updated by Andrew Schoen over 8 years ago

If we move the reboot to a postbuildscript, we can always report the correct status to github and reboot the node only on failures.

https://github.com/ceph/ceph-build/pull/287
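
The logic of that proposal can be sketched in shell as follows. This is only an illustration of the intended behaviour, not the content of the pull request: build_with_post_step and REBOOT_CMD are hypothetical names, and REBOOT_CMD exists so the reboot can be replaced by a harmless command when trying the sketch.

```shell
#!/bin/sh
# Sketch: the build status is recorded first, then a post-build step
# runs only if the build failed. The original status is always returned.
# REBOOT_CMD is an assumption for illustration; it defaults to "sudo reboot".
build_with_post_step() {
    build_cmd=$1
    status=0
    sh -c "$build_cmd" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "status: success"
    else
        echo "status: failure"
        # Reboot only on failure, to clear processes left behind.
        ${REBOOT_CMD:-sudo reboot}
    fi
    return "$status"
}
```

Because the status is captured before the reboot decision, a failing check is still reported as a failure, and a passing check leaves the node untouched.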

Updated by Andrew Schoen over 8 years ago

Status changed from In Progress to Fix Under Review

Updated by Andrew Schoen over 8 years ago

Status changed from Fix Under Review to Resolved

https://github.com/ceph/ceph-build/commit/dcb9adf981ae8f69c63b933a6aaaffadf1d557ab

We also had to change the number of executors on the node that will run this job to 1 so that the node won't reboot while another job is running on the same node. That was done here: https://jenkins.ceph.com/computer/centos7+158.69.77.220/configure
