Bug #17699
teuthworker processes hung writing to cephfs, no evidence why
Description
Several times on teuthology.front, we've seen work grind to a halt in the following scenario:
- teuthworker opens a logfile in /a/worker_logs and sends all of its stderr there
- teuthworker grabs a lock on the teuthology shared repo to update it
- bootstrap runs to update the sources and rebuild the venv
- pip runs to update a Python package, needs to retry, writes a message to stderr, and hangs
The one time I looked at this failure in depth (yesterday), I found nothing in the kernel debug files indicating that a write was in progress, and the cluster was healthy.
Enabling some kclient logging would probably help, but I need a recommendation for what level, since teuthology sees a lot of activity.
Updated by Dan Mick over 7 years ago
Actually, pondering this some more, I'm not 100% sure the write is blocking at cephfs at all: pip's output goes through a pipe to a pipe to a pipe before it reaches the process that actually does the write. Hmm.
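To illustrate the alternative hypothesis above: a process writing to a pipe blocks once the pipe's kernel buffer is full and the reader on the other end is not draining it, independent of what filesystem the final consumer writes to. The sketch below is a minimal Python demonstration, not teuthology code; `pipe_capacity` is a hypothetical helper. It sets the write end non-blocking so that, instead of hanging the way a blocking writer such as pip would, it stops at the point where the stall would occur.

```python
import os


def pipe_capacity(chunk_size: int = 1024) -> int:
    """Return how many bytes a fresh pipe accepts before a writer would block.

    Hypothetical helper for illustration only. Nothing ever reads from the
    pipe, mimicking a downstream process that has stopped draining it.
    """
    read_fd, write_fd = os.pipe()
    # Non-blocking so the demo raises BlockingIOError instead of hanging.
    os.set_blocking(write_fd, False)
    chunk = b"x" * chunk_size
    written = 0
    try:
        while True:
            try:
                written += os.write(write_fd, chunk)
            except BlockingIOError:
                # The buffer is full: a blocking write (e.g. pip logging a
                # retry to stderr) would sleep here until the reader drains it.
                return written
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(pipe_capacity())
```

On Linux with default settings the reported capacity is usually 64 KiB, so a stuck reader anywhere in the pipe chain would be enough to make pip hang on a stderr write even if cephfs itself is healthy.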
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to sepia
- Category deleted (26)
Dan, Zheng, did you ever get together on any of this? I suspect this was related to the nasty behavior on network drops and reconnects, which I think has changed somewhat since, though I forget how...
Updated by Dan Mick almost 7 years ago
No. We still see periodic failures of the same sort, but it's not yet clear exactly what is blocking the writes.
Updated by David Galloway almost 7 years ago
- Priority changed from Normal to High
Updated by David Galloway over 6 years ago
- Status changed from New to Rejected