Bug #19402

Updated by Dan Mick about 7 years ago

git.ceph.com has failed to respond over HTTP several times during early-morning US hours over the past few weeks. This may not matter much, since it's just the web UI, but we're unsure whether it's also affecting jobs.

I've gathered some stats on what's happening on the machine during these times. Here's a 10-minute snapshot during an HTTP outage: https://paste.fedoraproject.org/paste/pCTM-~6yo4LP5yg3MgjLdl5M1UNdIGYhyRLivL9gydE=

Compare with a quiet/stable period: https://paste.fedoraproject.org/paste/H6ptu0dxeEXjXpmppzPuE15M1UNdIGYhyRLivL9gydE=

My current hypothesis is that the jobs added in OVH generate more load than the host can handle on its existing storage (http://tracker.ceph.com/issues/17415).

See http://tracker.ceph.com/projects/ceph-releases/wiki/Sepia/diff?utf8=%E2%9C%93&version=131&version_from=129&commit=View+differences for when OVH jobs were added.

Dan's going to do some testing to see if he can manually reproduce the system load and HTTP failures.
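
While trying to reproduce this, it may help to record exactly when HTTP stops answering so the timestamps can be lined up with the load stats on the host. The following is only a rough sketch, not what was actually run: the function names `probe` and `watch`, the 30-second interval, and the 10-second timeout are all made up for illustration, and the URL scheme to poll is an assumption.

```python
import http.client
import time
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Issue one HTTP GET and return (ok, elapsed_seconds, detail).

    `opener` is injectable so the function can be exercised without
    touching the network; by default it is urllib.request.urlopen.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
        return 200 <= status < 400, time.monotonic() - start, str(status)
    except (OSError, http.client.HTTPException) as exc:
        # URLError, HTTPError, timeouts and connection resets are all
        # OSError subclasses; HTTPException covers malformed responses.
        return False, time.monotonic() - start, repr(exc)


def watch(url, interval=30.0, count=20,
          opener=urllib.request.urlopen, sleep=time.sleep):
    """Probe `url` `count` times, printing one UTC-stamped line per probe.

    Returns the list of ok/failed booleans, oldest first.
    """
    results = []
    for i in range(count):
        ok, elapsed, detail = probe(url, opener=opener)
        stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        print(f"{stamp} ok={ok} elapsed={elapsed:.2f}s detail={detail}")
        results.append(ok)
        if i < count - 1:
            sleep(interval)
    return results
```

Calling something like `watch("http://git.ceph.com/")` from a machine outside the lab during the early-morning window would give a simple timeline of failures to compare against the snapshots above.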
