Bug #19402

closed

git.ceph.com instability

Added by David Galloway about 7 years ago. Updated about 7 years ago.

Status:
Resolved
Priority:
Normal
Category:
Infrastructure Service
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

git.ceph.com has failed to respond over HTTP a few times during early-morning US hours over the past few weeks. This may not matter much, since it's just the web UI, but we're unsure whether it's also affecting jobs.

I've gathered some stats on what's happening on the machine during these times. Here's a 10-minute snapshot taken during an HTTP outage: https://paste.fedoraproject.org/paste/pCTM-~6yo4LP5yg3MgjLdl5M1UNdIGYhyRLivL9gydE=

Compare that to a quiet/stable period: https://paste.fedoraproject.org/paste/H6ptu0dxeEXjXpmppzPuE15M1UNdIGYhyRLivL9gydE=

My current hypothesis is that the jobs added in OVH are more than the host can handle on its existing storage (see http://tracker.ceph.com/issues/17415).

See http://tracker.ceph.com/projects/ceph-releases/wiki/Sepia/diff?utf8=%E2%9C%93&version=131&version_from=129&commit=View+differences for when OVH jobs were added.

Dan's going to do some testing to see if he can manually reproduce the system load and HTTP failures.
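
For anyone who wants to line up HTTP failures against system load while testing, here is a rough sketch of a probe that could run on the host itself. It is not the tooling used to produce the pastes above. The function names, the 30-second interval, and the choice of the 1-minute load average are all assumptions made for illustration, and `os.getloadavg()` reports the load of whatever machine runs the script, so it only makes sense when run on the git host.

```python
import http.client
import os
import time
import urllib.request


def probe_http(url, timeout=10.0, fetch=urllib.request.urlopen):
    """Do one GET and return (ok, seconds). Any error or timeout counts as a failure."""
    start = time.monotonic()
    try:
        with fetch(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        ok = False
    return ok, time.monotonic() - start


def take_sample(url, fetch=urllib.request.urlopen):
    """Record one HTTP probe together with the local 1-minute load average (Unix only)."""
    ok, seconds = probe_http(url, fetch=fetch)
    load1, _load5, _load15 = os.getloadavg()
    return {"time": time.time(), "ok": ok, "seconds": seconds, "load1": load1}


def run(url, interval=30.0, count=20, fetch=urllib.request.urlopen):
    """Collect `count` samples, sleeping `interval` seconds between probes."""
    samples = []
    for i in range(count):
        samples.append(take_sample(url, fetch=fetch))
        if i + 1 < count:
            time.sleep(interval)
    return samples


def _mean(values):
    return sum(values) / len(values) if values else None


def summarize(samples):
    """Compare the mean load seen during failed probes with the mean during successful ones."""
    failed = [s["load1"] for s in samples if not s["ok"]]
    passed = [s["load1"] for s in samples if s["ok"]]
    return {
        "probes": len(samples),
        "failures": len(failed),
        "mean_load_ok": _mean(passed),
        "mean_load_failed": _mean(failed),
    }
```

Calling `summarize(run("http://git.ceph.com/"))` over a window that spans an outage would show whether failed probes coincide with noticeably higher load. That would be consistent with the hypothesis above but would not by itself confirm storage as the bottleneck.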
