Feature #4875

closed

gather logs on hung tasks

Added by Greg Farnum almost 11 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
% Done: 0%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:

Description

Right now, if a run hangs, we eventually erase all evidence of it. We should gather up any logs which might exist on the machines first!

Actions #1

Updated by Anonymous almost 11 years ago

  • Tracker changed from Bug to Feature

Actions #2

Updated by Greg Farnum over 10 years ago

This is causing me trouble again. Can we move it up the backlog, pretty please? :) I don't think a best-effort version should take too long.

Actions #3

Updated by Greg Farnum about 9 years ago

  • Priority changed from Normal to High

bump

Actions #4

Updated by Greg Farnum over 8 years ago

  • Priority changed from High to Urgent

bump

We had a long string of jobs hang on MDS crash #12711, which we managed to diagnose. But http://pulpito.ceph.com/teuthology-2015-08-17_23:04:01-fs-master---basic-multi/1020414/, for instance, also contains a monitor running out of memory, and it would be really nice if we could examine that.

Actions #5

Updated by Greg Farnum over 8 years ago

Also, in http://pulpito.ceph.com/teuthology-2015-08-17_23:04:01-fs-master---basic-multi/1020395/ secondary evidence suggests that a ceph-fuse process crashed, but there are no traces left of it. :(

Actions #6

Updated by Zack Cerza over 8 years ago

An idea that was floated just now by Greg and Sam was:

Make teuthology-kill attempt to gather logs when passed a certain flag; teuthology-worker, when killing a job that's taking too long, could use that flag. We'd have to make sure that any exceptions raised would not take down the worker process.
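
To make the idea concrete, here is a rough sketch of the "gather first, then kill, and never let gathering raise" pattern. It is not teuthology's code: `gather_logs`, `kill_job`, the `gather` flag, and the `kill` callback are all hypothetical names, and the log directories are assumed to be local paths (on a real cluster they would have to be pulled from the remote test nodes).

```python
import logging
import shutil
from pathlib import Path

log = logging.getLogger(__name__)


def gather_logs(source_dirs, archive_dir):
    """Best-effort copy of any existing log directories into archive_dir.

    Hypothetical helper: returns the list of source directories copied.
    """
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in map(Path, source_dirs):
        try:
            if not src.is_dir():
                continue  # nothing to gather from this location
            shutil.copytree(src, archive / src.name, dirs_exist_ok=True)
            copied.append(src)
        except Exception:
            # A failure on one source must not stop the others.
            log.exception("could not gather logs from %s", src)
    return copied


def kill_job(job, kill, gather=False, source_dirs=(), archive_dir=None):
    """Kill a job, optionally gathering logs first.

    `kill` is a caller-supplied callable (an assumption of this sketch).
    Any exception from the gather step is logged and swallowed, so the
    calling worker process keeps running and the job is still killed.
    """
    if gather and archive_dir is not None:
        try:
            gather_logs(source_dirs, archive_dir)
        except Exception:
            log.exception("log gathering failed for job %s", job)
    kill(job)
```

A worker that times out a job would then call something like `kill_job(job, kill, gather=True, ...)`, while a manual kill could leave the flag off. The sketch uses `dirs_exist_ok`, so it needs Python 3.8 or later.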

Actions #7

Updated by Kyrylo Shatskyy almost 4 years ago

Gregory, is this ticket still relevant, or can we close it?

Actions #8

Updated by Josh Durgin about 3 years ago

  • Status changed from New to Resolved

teuthology-dispatcher does this: https://github.com/ceph/teuthology/pull/1546
