Bug #17699
closed
teuthworker processes hung writing to cephfs, no evidence why
% Done:
0%
Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
Several times on teuthology.front, we've seen things grind to a halt in the following scenario:
- teuthworker opens a logfile in /a/worker_logs for all its stderr
- teuthworker grabs a lock on the teuthology shared repo to update it
- bootstrap runs to update the sources and rebuild the venv
- pip runs to update a Python package, needs to retry, and so writes a message to stderr; that write hangs
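For anyone trying to reproduce the pattern outside teuthology, the sequence above can be sketched roughly as follows. This is only an illustration, not the teuthworker code: the function name, the use of `flock` for the repo lock, and the paths passed in are all assumptions made for the example.

```python
import fcntl
import subprocess
import sys


def run_locked_with_stderr_log(log_path, lock_path, cmd):
    """Run cmd with its stderr appended to log_path while holding an
    exclusive lock on lock_path. Returns the command's exit code.

    Hypothetical helper: the lock type (flock) and the paths are
    assumptions, not what teuthology actually does.
    """
    with open(log_path, "ab") as log, open(lock_path, "w") as lock:
        # Step 2 of the scenario: take the lock before updating anything.
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            # Steps 3-4: the child (bootstrap/pip in the report) inherits the
            # logfile as stderr, so any retry message is written to it.
            return subprocess.run(cmd, stderr=log).returncode
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


if __name__ == "__main__":
    # Stand-in for pip printing a retry warning to stderr.
    child = [sys.executable, "-c", "import sys; sys.stderr.write('retrying\\n')"]
    print(run_locked_with_stderr_log("worker.log", "repo.lock", child))
```

If the logfile lives on cephfs and the write blocks, the child never exits and the lock is never released, which would match everything else grinding to a halt behind it.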
The one time I looked at this failure in depth (yesterday), I could find nothing in the kernel debug files that indicated a write was in progress, and the cluster was healthy.
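One extra data point that might be worth capturing next time is which processes are blocked and where. The sketch below assumes the hung writers sit in uninterruptible sleep (state D), which this report does not confirm; it reads only the standard Linux `/proc/<pid>/stat` and `/proc/<pid>/wchan` files, and the function names are made up for the example.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx].strip())
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm, wchan) for processes in state D under proc_root."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited, or its files were unreadable/malformed
        found.append((pid, comm, wchan))
    return found


if __name__ == "__main__":
    for pid, comm, wchan in blocked_processes():
        print(pid, comm, wchan)
```

An empty result while a worker is visibly stuck would itself be informative, since it would suggest the process is waiting somewhere other than in a blocked kernel write.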
Enabling some kclient logging would probably help, but I need a recommendation for what level to use, because teuthology sees a whole lot of activity.