Bug #17699

closed

teuthworker processes hung writing to cephfs, no evidence why

Added by Dan Mick over 7 years ago. Updated almost 7 years ago.

Status: Rejected
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Several times on teuthology.front, we've seen teuthworker processes grind to a halt in the following scenario:

- teuthworker opens a logfile in /a/worker_logs for all its stderr
- teuthworker grabs a lock on the teuthology shared repo to update it
- bootstrap runs to update the sources and rebuild the venv
- pip runs to update a Python package and needs to retry, so it writes to stderr and hangs
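
For anyone trying to reproduce this outside teuthology, the sequence above can be approximated with a small script. This is only a sketch under assumptions: `run_under_lock`, the lock-file path, and the timeout are hypothetical and are not how teuthworker or bootstrap are actually implemented. It opens a log file for the child's stderr, takes an exclusive lock, runs a command, and reports when the command does not finish in time.

```python
import fcntl
import subprocess


def run_under_lock(log_path, lock_path, cmd, timeout=60):
    """Run cmd with stderr appended to log_path while holding an exclusive lock.

    Hypothetical helper for reproducing the scenario; not teuthology code.
    Returns the child's exit code, or None if it did not finish within
    `timeout` seconds (the suspected-hang case).
    """
    with open(log_path, "ab") as log, open(lock_path, "w") as lock:
        # Stand-in for the lock teuthworker takes on the shared repo.
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            # The child inherits the log file as its stderr, so any retry
            # message it prints becomes a write to log_path.
            child = subprocess.Popen(cmd, stderr=log)
            try:
                return child.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                # Deliberately do not kill the child: a process stuck in an
                # uninterruptible write cannot be reaped anyway, and leaving
                # it in place allows inspection afterwards.
                return None
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

Pointing `log_path` at a file under /a/worker_logs and `cmd` at a command that writes to stderr would exercise the same write path; a return value of `None` marks a run worth inspecting.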

The one time I looked at this failure in depth (yesterday), I could find nothing in the kernel debug files indicating that a write was in progress, and the cluster was healthy.
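
To make that check repeatable the next time a worker hangs, the kernel client's debugfs files can be captured with a short script. This is a sketch that assumes "kernel debug files" means the per-client `osdc` and `mdsc` files under /sys/kernel/debug/ceph, which normally require root and a mounted debugfs; the function name is hypothetical. It only collects the non-blank lines and leaves interpretation to the reader, since the exact file format varies by kernel version.

```python
from pathlib import Path


def read_ceph_debug_files(debug_root="/sys/kernel/debug/ceph",
                          names=("osdc", "mdsc")):
    """Collect non-blank lines from each kernel client's debugfs files.

    Returns a dict mapping file path -> list of lines. Files that cannot
    be read (missing, or no permission) are skipped silently.
    """
    result = {}
    for client_dir in sorted(Path(debug_root).glob("*")):
        if not client_dir.is_dir():
            continue
        for name in names:
            debug_file = client_dir / name
            try:
                text = debug_file.read_text()
            except OSError:
                continue
            result[str(debug_file)] = [
                line for line in text.splitlines() if line.strip()
            ]
    return result
```

Running it from a root shell while a worker is hung, and again on a healthy host, gives two snapshots that can be compared.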

Enabling some kclient logging would probably help, but I need a recommendation for what level to use. Teuthology sees a whole lot of activity.
