Bug #14172
closed
https://jenkins.ceph.com/job/ceph-pull-requests/ aka make check fixes
Added by Loïc Dachary over 8 years ago.
Updated over 8 years ago.
Description
Workaround
Run run-make-check.sh ; sudo reboot so that the reboot cleans up lingering processes. This should allow the bot to run for weeks before it needs to be re-imaged for some other reason.
Issues
- running on unsupported operating systems (CentOS 6, precise and maybe others)
- leftovers from a previous test (which should be removed when a new slave is provisioned for each test)
- keep the last 300 jobs for forensic analysis (about one week worth)
- re-enable the jenkins job
- disable reporting to GitHub pull requests so that the stability of the run can be verified without sending numerous false negatives while doing so
- Status changed from New to 12
- Priority changed from Normal to Urgent
Setting to urgent because the absence of an automated make check has a noticeable daily impact on the work of Ceph developers.
- Subject changed from https://jenkins.ceph.com/job/ceph-pull-requests/ fixes to https://jenkins.ceph.com/job/ceph-pull-requests/ aka make check fixes
<alfredodeza> loicd: CI is not meant to be openstack only, and it does involve a bit more work other than to just call an external API to terminate the instance
<alfredodeza> the current state of CI is that there is no way to be able to spin up/down nodes
<loicd> alfredodeza: it's using OpenStack but is not able to spin up/down nodes ? How did that happen ?
<loicd> isn't it what https://github.com/alfredodeza/mita/ is about ?
<alfredodeza> loicd: that is just one component of ci
<alfredodeza> which is meant to get nodes up and down
<alfredodeza> *but we are not there yet*
<alfredodeza> it needs to be worked on
<loicd> alfredodeza: in the meantime I believe it would be enough to just reboot after the test. There is a *very* small chance that the test will taint the file system (i.e. fill / or something hard to recover from). No more chance than any python test, really. There is, however, a *high* chance that a process survives the test because of a bug, and the reboot can take care of that. The better solution would be to re-image every time, but rebooting will allow the bot to resume service efficiently.
<loicd> alfredodeza: will jenkins automatically reconnect to the slave after a reboot ? or does it need some extra infrastructure work ?
<alfredodeza> automatic
<loicd> cool then :-)
- Description updated (diff)
- Status changed from 12 to Resolved
- Assignee set to Andrew Schoen
- Status changed from Resolved to In Progress
The approach we took with two shell builders won't work: if the first builder, which runs the make check, fails, the second won't run.
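One way around this, sketched here only as an illustration, is to run both steps from a single shell builder that records the make check exit status, always runs the cleanup, and then returns the recorded status. The helper name run_then_cleanup is hypothetical and not part of the Ceph or Jenkins tooling. How the job was actually fixed is not recorded in this ticket.

```shell
#!/bin/sh
# Sketch only: run a check command, always run a cleanup command afterwards,
# and hand the check's exit status back to the caller (e.g. the Jenkins step).
# run_then_cleanup is a hypothetical helper name.
run_then_cleanup() {
    check_cmd=$1
    cleanup_cmd=$2

    # Capture the status instead of letting a failure abort the script;
    # "cmd || status=$?" is also safe when the shell runs with -e.
    status=0
    sh -c "$check_cmd" || status=$?

    # The cleanup runs whether the check passed or failed.
    sh -c "$cleanup_cmd"

    return "$status"
}

# Intended use on the build slave (not executed here):
#   run_then_cleanup "./run-make-check.sh" "sudo reboot"
```

An immediate reboot may cut the connection before the status reaches Jenkins, so a delayed reboot such as sudo shutdown -r +1 might be needed in place of sudo reboot. That is an assumption, not something verified in this ticket.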
- Status changed from In Progress to Fix Under Review
- Status changed from Fix Under Review to Resolved