Feature #8012
closed
paddles: replace teuthology's lock server
Updated by Ian Colle almost 10 years ago
- Story points set to 13.0
Updated by Zack Cerza almost 10 years ago
Paddles and teuthology will need some API additions:
[x] paddles: add API for querying single node by name
[x] paddles: add API for querying nodes by locked, machine_type, locked_by
[x] paddles: add API for querying nodes with multiple machine_types
[x] paddles: add API for locking single node passing: name, locked_by, description
[x] paddles: add API for unlocking single node passing: name, locked_by
[x] paddles: update locked_since when nodes are locked or unlocked
[x] paddles: add API for locking multiple nodes passing: count, locked_by, machine_type, description
[ ] teuthology: refactor ssh-key-related functionality from teuthology.locker subpackage into teuthology.lock module
[ ] paddles: accept ssh_pub_key args for all locking methods
[x] teuthology: rewrite parts of teuthology.lockstatus and teuthology.lock to use paddles
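To make the locking items in this list concrete, here is a minimal sketch of the request bodies a client might build. It assumes JSON-style bodies whose field names simply mirror the parameters listed above; the helper names and the example values are hypothetical, and the real paddles API may be shaped differently.

```python
import json


def lock_one_payload(name, locked_by, description=None):
    """Body for locking a single node identified by name (hypothetical shape)."""
    body = {"name": name, "locked": True, "locked_by": locked_by}
    if description is not None:
        body["description"] = description
    return body


def lock_many_payload(count, locked_by, machine_type, description=None):
    """Body for locking `count` nodes of one machine type (hypothetical shape)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    body = {"count": count, "locked_by": locked_by, "machine_type": machine_type}
    if description is not None:
        body["description"] = description
    return body


if __name__ == "__main__":
    # Placeholder values, not real users or machine types.
    payload = lock_many_payload(3, "user@host", "some_type", "nightly run")
    print(json.dumps(payload, sort_keys=True))
```

Keeping the payload construction in small pure functions makes it easy to check what would be sent without a running server.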
Updated by Zack Cerza almost 10 years ago
paddles work is happening here:
https://github.com/ceph/paddles/commits/locking
Updated by Zack Cerza almost 10 years ago
- Status changed from New to In Progress
Updated by Zack Cerza almost 10 years ago
query multiple machine types:
https://github.com/ceph/paddles/commit/da308b6a748dd769e169ca206840974f82a67cdd
query locked_by:
https://github.com/ceph/paddles/commit/ac5eb5193a99c2592548d1a333e352cdb403c256
Updated by Zack Cerza almost 10 years ago
I just noticed that the lock server itself is designed in such a way that it depends on being able to run /usr/bin/ssh-keyscan within database transactions. This is pretty obviously uncool in my view.
It's going to take a bit to figure out how to sensibly implement updating ssh public keys now.
Updated by Zack Cerza almost 10 years ago
I think I'll make sure that the functions in teuthology.lock handle getting ssh keys; they can then include them in whichever requests they need to when they talk to paddles.
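A rough sketch of that idea: the client runs ssh-keyscan itself, outside of any database transaction, and attaches the result to the request body. The function names and the body layout are assumptions made for illustration; only the ssh_pub_key field name comes from the list above, and this is not the actual teuthology.lock code.

```python
import subprocess


def parse_keyscan_output(output, hostname):
    """Return '<type> <key>' from the first key line for hostname, else None."""
    for line in output.splitlines():
        line = line.strip()
        # ssh-keyscan may emit '#' banner lines alongside the key lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == hostname:
            return f"{parts[1]} {parts[2]}"
    return None


def scan_host_key(hostname, key_type="rsa"):
    """Run ssh-keyscan for one host; returns None if no key was found."""
    result = subprocess.run(
        ["ssh-keyscan", "-t", key_type, hostname],
        capture_output=True, text=True, check=False,
    )
    return parse_keyscan_output(result.stdout, hostname)


def build_lock_request(name, locked_by, scan=scan_host_key):
    """Gather the key first, then build the body to send (hypothetical shape)."""
    body = {"name": name, "locked": True, "locked_by": locked_by}
    key = scan(name)
    if key is not None:
        body["ssh_pub_key"] = key
    return body
```

The scan function is injectable so the request-building logic can be exercised without network access, for example `build_lock_request("node1", "user@host", scan=lambda host: "ssh-rsa KEY")`.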
Updated by Zack Cerza almost 10 years ago
update Node.locked_since when updating Node.locked:
https://github.com/ceph/paddles/commit/a9276c08833aa49495d1d1749a8b22fec1308ba4
Updated by Ian Colle almost 10 years ago
- Target version changed from sprint6 to sprint7
Updated by Zack Cerza almost 10 years ago
add API for locking multiple nodes passing: count, locked_by, machine_type, description
https://github.com/ceph/paddles/commit/a48ae55e3ea5cafaba68c03adba18ddffac12fad
Updated by Zack Cerza almost 10 years ago
I'm also now doing work in:
https://github.com/ceph/teuthology/commits/wip-locking
Updated by Zack Cerza almost 10 years ago
Rewrote teuthology.lock to use paddles:
https://github.com/ceph/teuthology/commit/f2e88bcc0537d74d0bfa26554bdc632d49656809
https://github.com/ceph/teuthology/commit/675a9e36aaeb4f5e62133cdef17e0d6f9daf1faa
https://github.com/ceph/teuthology/commit/9174aed781277a0dda9af6baf87958d11649d9a4
https://github.com/ceph/teuthology/commit/4bf4287e791ab11ab65019c7efd90c8dc18988b7
https://github.com/ceph/teuthology/commit/9ac7e26b6de9cddc68165660bf5d196ebb53ed91
Updated by Zack Cerza almost 10 years ago
We'll need #6978 to be finished before this can be merged.
Updated by Zack Cerza almost 10 years ago
One issue I'm not sure how to solve yet is that currently the lock server runs ssh-keyscan during a lock-many request; it's impossible to know on the client side which machines will need new keys.
But, why are we dancing around with ssh keys anyway?
Updated by Zack Cerza almost 10 years ago
I found the answer to my question; we depend on knowing keys for VMs and don't really have a separate codepath for bare metal; keeping the keys around holds enough merit. I rewrote the handling of the keys, though:
https://github.com/ceph/teuthology/commit/9264a740bd2de0c1cc65d4facb385ffa6990335a
Updated by Zack Cerza almost 10 years ago
paddles PR: https://github.com/ceph/paddles/pull/41
Updated by Zack Cerza almost 10 years ago
- Subject changed from "paddles: mirror teuthology's lock server" to "paddles: replace teuthology's lock server"
Updated by Ian Colle almost 10 years ago
- Target version changed from sprint7 to sprint8
Updated by Sage Weil almost 10 years ago
- Target version changed from sprint8 to sprint9
Updated by Zack Cerza almost 10 years ago
- Target version changed from sprint9 to sprint10
Updated by Ian Colle over 9 years ago
- Target version changed from sprint10 to sprint11
Updated by Zack Cerza over 9 years ago
Amazingly, I just rebased my wip branch after all the #6978 work, and it went without a hitch:
https://github.com/ceph/teuthology/tree/wip-locking
Updated by Zack Cerza over 9 years ago
For some reason I forgot to update the list in comment #4 last time I really worked on this.
Here is what is done:
[x] paddles: add API for querying single node by name
[x] paddles: add API for querying nodes by locked, machine_type, locked_by
[x] paddles: add API for querying nodes with multiple machine_types
[x] paddles: add API for locking single node passing: name, locked_by, description
[x] paddles: add API for unlocking single node passing: name, locked_by
[x] paddles: update locked_since when nodes are locked or unlocked
[x] paddles: add API for locking multiple nodes passing: count, locked_by, machine_type, description
[x] teuthology: refactor ssh-key-related functionality from teuthology.locker subpackage into teuthology.lock module
[x] paddles: accept ssh_pub_key args for all locking methods
[x] teuthology: rewrite parts of teuthology.lockstatus and teuthology.lock to use paddles
[x] paddles: accept multiple machine_type values in /nodes/lock_many/
Here is what needs to be done:
[x] teuthology: ensure teuthology.task.internal will work, and work efficiently
[x] teuthology: ensure all users of teuthology.lock.list_locks() are using args to narrow queries and thus avoid dumping the entire db on each call (there are performance gains to be had over the previous implementation)
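As an illustration of narrowing a query instead of dumping the whole db, here is a small sketch that turns optional filters into a query string. The base URL, the bare /nodes/ path, and the "true"/"false" encoding are assumptions; the filter names mirror the ones listed above, and node_query_url is a hypothetical helper rather than part of teuthology.

```python
from urllib.parse import urlencode


def node_query_url(base_url, locked=None, machine_type=None, locked_by=None):
    """Build a URL that asks the server for only the matching nodes."""
    params = {}
    if locked is not None:
        params["locked"] = "true" if locked else "false"
    if machine_type is not None:
        params["machine_type"] = machine_type
    if locked_by is not None:
        params["locked_by"] = locked_by
    url = base_url.rstrip("/") + "/nodes/"
    if params:
        # Only filters that were actually given end up in the query string.
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    print(node_query_url("http://paddles.example", locked=True, locked_by="user@host"))
```

A caller that only cares about its own locks then transfers a handful of rows rather than every node record.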
Updated by Ian Colle over 9 years ago
- Target version changed from sprint11 to sprint12
Updated by Zack Cerza over 9 years ago
Updated task.internal and the list_locks() callers. Also did:
[x] Speed up lock_many() drastically
Next up:
[x] Speed up unlocking of multiple nodes similarly
Updated by Zack Cerza over 9 years ago
Added unlock_many methods to paddles and teuthology. That speeds up unlocking a lot too, as long as the nodes being unlocked aren't vpms.
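A toy comparison of why batching helps: one request per node versus a single request for the whole set. The paths, body fields, and the recording transport are all hypothetical stand-ins, not the real paddles endpoints or teuthology code.

```python
def unlock_each(names, locked_by, send):
    """One request per node, so len(names) round trips."""
    for name in names:
        send("/nodes/" + name + "/", {"locked": False, "locked_by": locked_by})


def unlock_many(names, locked_by, send):
    """One request for the whole batch (hypothetical path and body)."""
    names = list(names)
    if names:
        send("/nodes/unlock_many/", {"names": names, "locked_by": locked_by})


class RecordingTransport:
    """Stand-in for an HTTP client; it only records what would be sent."""

    def __init__(self):
        self.calls = []

    def __call__(self, path, body):
        self.calls.append((path, body))


if __name__ == "__main__":
    per_node, batched = RecordingTransport(), RecordingTransport()
    unlock_each(["a", "b", "c"], "user@host", per_node)
    unlock_many(["a", "b", "c"], "user@host", batched)
    print(len(per_node.calls), len(batched.calls))
```

The batched version keeps the number of round trips constant no matter how many nodes a job held.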
Updated by Zack Cerza over 9 years ago
I've done lots of manual testing and everything is looking pretty good. The queue is currently empty and there are only a few jobs running. Looks like I might deploy today. I'll send a warning email first.
Updated by Zack Cerza over 9 years ago
- Status changed from In Progress to Resolved
Deployment's done! (yesterday)