give jobs requiring more nodes a higher priority?
The FS runs include several tests that require five machines instead of the default three. These get marked as hung a disproportionate amount of the time because the cluster never has enough machines free at once for them to take their locks: the three-machine tests always get to lock instead.
Either we need a way to order tests which are waiting on locks, or we need to do something like (ugh) increase the reserved number of machines (which teuthology isn't allowed to lock) to a constant value greater than the number of machines used by our largest tests.
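For illustration only, here is a rough sketch of what ordering the waiting tests could look like. None of these names (WaitingJob, next_job_to_lock, the reserved parameter) come from teuthology; they are hypothetical, and the policy shown (largest job first, and smaller jobs are held back until the largest waiting job fits) is just one possible approach, not how the lock server currently behaves.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WaitingJob:
    # Hypothetical record for a job waiting on machine locks.
    name: str
    machines_needed: int
    arrival: int  # lower value means the job has been waiting longer


def next_job_to_lock(waiting: List[WaitingJob],
                     free_machines: int,
                     reserved: int = 0) -> Optional[WaitingJob]:
    """Pick the job that should be allowed to lock next, or None.

    Assumed policy: the job needing the most machines goes first
    (ties broken by arrival order). If it does not fit yet, nobody
    locks, so freed machines accumulate instead of being taken by
    smaller jobs.
    """
    if not waiting:
        return None
    available = free_machines - reserved
    head = min(waiting, key=lambda j: (-j.machines_needed, j.arrival))
    if head.machines_needed <= available:
        return head
    return None


if __name__ == "__main__":
    queue = [WaitingJob("three-node", 3, arrival=0),
             WaitingJob("five-node", 5, arrival=1)]
    # Four machines free: the three-node job would fit, but it is held
    # back so the five-node job is not starved.
    print(next_job_to_lock(queue, free_machines=4))
    # Five machines free: the five-node job locks first.
    print(next_job_to_lock(queue, free_machines=5))
```

The obvious cost is that machines can sit idle while a large job waits, so a real implementation would probably want some limit on how long smaller jobs are held back.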
#2 Updated by Greg Farnum over 4 years ago
This struck with astounding clarity on http://pulpito.ceph.com/teuthology-2015-01-18_23:10:01-knfs-next-testing-basic-multi/
Six jobs waited long enough to be considered hung by the job emailer.