Feature #8012

paddles: replace teuthology's lock server

Added by Zack Cerza about 10 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
other
Tags:
Backport:
Reviewed:
Affected Versions:

Related issues: 1 (0 open, 1 closed)

Blocked by teuthology - Feature #6978: teuthology: split out tasks (Resolved, Zack Cerza, 12/10/2013)

Actions #1

Updated by Ian Colle almost 10 years ago

  • Target version set to sprint6
Actions #2

Updated by Ian Colle almost 10 years ago

  • Assignee set to Zack Cerza
Actions #3

Updated by Ian Colle almost 10 years ago

  • Story points set to 13.0
Actions #4

Updated by Zack Cerza almost 10 years ago

Paddles and teuthology will need some API additions:
[x] paddles: add API for querying single node by name
[x] paddles: add API for querying nodes by locked, machine_type, locked_by
[x] paddles: add API for querying nodes with multiple machine_types
[x] paddles: add API for locking single node passing: name, locked_by, description
[x] paddles: add API for unlocking single node passing: name, locked_by
[x] paddles: update locked_since when nodes are locked or unlocked
[x] paddles: add API for locking multiple nodes passing: count, locked_by, machine_type, description
[ ] teuthology: refactor ssh-key-related functionality from teuthology.locker subpackage into teuthology.lock module
[ ] paddles: accept ssh_pub_key args for all locking methods
[x] teuthology: rewrite parts of teuthology.lockstatus and teuthology.lock to use paddles
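To make the checklist above concrete, here is a minimal sketch of how a client might build the URLs and request bodies for these locking calls. The `/nodes/lock_many/` route is named later in this ticket; every other path, field name, and function here is an illustrative assumption, not paddles' actual API.

```python
# Hypothetical sketch of the paddles locking calls listed above.
# Only /nodes/lock_many/ appears in this ticket; other paths and
# field names are assumptions for illustration.

BASE = "http://paddles.example.com"


def node_url(name):
    """URL for querying, locking, or unlocking a single node by name."""
    return f"{BASE}/nodes/{name}/"


def lock_payload(locked_by, description=None):
    """Request body for locking a single node."""
    payload = {"locked": True, "locked_by": locked_by}
    if description:
        payload["description"] = description
    return payload


def lock_many_payload(count, machine_type, locked_by, description=None):
    """Request body for POST /nodes/lock_many/."""
    payload = {
        "count": count,
        "machine_type": machine_type,
        "locked_by": locked_by,
    }
    if description:
        payload["description"] = description
    return payload
```

The server would be expected to update `locked_since` itself whenever a lock or unlock succeeds, so the client never sends a timestamp.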

Actions #5

Updated by Zack Cerza almost 10 years ago

Actions #6

Updated by Zack Cerza almost 10 years ago

  • Status changed from New to In Progress
Actions #8

Updated by Zack Cerza almost 10 years ago

I just noticed that the lock server itself is designed in a way that depends on being able to run /usr/bin/ssh-keyscan within database transactions. This is pretty obviously uncool in my view.

It's going to take a bit to figure out how to sensibly implement updating ssh public keys now.

Actions #9

Updated by Zack Cerza almost 10 years ago

I think I'll make sure that the functions in teuthology.lock handle getting ssh keys; they can then include them in whichever requests need them when they talk to paddles.
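The client-side approach described here could look roughly like the sketch below: run ssh-keyscan outside any database transaction, parse its output, and attach the key to the lock request as `ssh_pub_key`. The function names are assumptions, not teuthology's actual API.

```python
# Illustrative sketch only: teuthology.lock would run ssh-keyscan
# itself (outside any database transaction) and pass the public key
# to paddles as ssh_pub_key. Names here are hypothetical.


def parse_keyscan_line(line):
    """Parse one line of `ssh-keyscan` output into (host, keytype, key).

    ssh-keyscan emits lines of the form:
        hostname keytype base64-key
    Comment lines start with '#'.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    host, keytype, key = line.split(None, 2)
    return host, keytype, key


def add_ssh_pub_key(payload, keyscan_output, keytype="ssh-rsa"):
    """Attach the first key of the wanted type to a lock-request body."""
    for line in keyscan_output.splitlines():
        parsed = parse_keyscan_line(line)
        if parsed and parsed[1] == keytype:
            payload["ssh_pub_key"] = f"{parsed[1]} {parsed[2]}"
            break
    return payload
```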

Actions #12

Updated by Zack Cerza almost 10 years ago

Actions #13

Updated by Ian Colle almost 10 years ago

  • Target version changed from sprint6 to sprint7
Actions #14

Updated by Zack Cerza almost 10 years ago

add API for locking multiple nodes passing: count, locked_by, machine_type, description
https://github.com/ceph/paddles/commit/a48ae55e3ea5cafaba68c03adba18ddffac12fad

Actions #17

Updated by Zack Cerza almost 10 years ago

We'll need #6978 to be finished before this can be merged.

Actions #18

Updated by Zack Cerza almost 10 years ago

One issue I'm not sure how to solve yet is that currently the lock server runs ssh-keyscan during a lock-many request; it's impossible for the client side to know which machines will need new keys.

But, why are we dancing around with ssh keys anyway?

Actions #19

Updated by Zack Cerza almost 10 years ago

I found the answer to my question: we depend on knowing keys for VMs and don't really have a separate codepath for bare metal, so keeping the keys around holds enough merit. I rewrote the handling of the keys, though:
https://github.com/ceph/teuthology/commit/9264a740bd2de0c1cc65d4facb385ffa6990335a

Actions #21

Updated by Zack Cerza almost 10 years ago

  • Subject changed from paddles: mirror teuthology's lock server to paddles: replace teuthology's lock server
Actions #22

Updated by Ian Colle almost 10 years ago

  • Target version changed from sprint7 to sprint8
Actions #23

Updated by Sage Weil almost 10 years ago

  • Target version changed from sprint8 to sprint9
Actions #24

Updated by Zack Cerza almost 10 years ago

  • Target version changed from sprint9 to sprint10
Actions #25

Updated by Ian Colle over 9 years ago

  • Target version changed from sprint10 to sprint11
Actions #26

Updated by Zack Cerza over 9 years ago

Amazingly, I just rebased my wip branch after all the #6978 work, and it went without a hitch:
https://github.com/ceph/teuthology/tree/wip-locking

Actions #27

Updated by Zack Cerza over 9 years ago

For some reason I forgot to update the list in comment #4 last time I really worked on this.

Here is what is done:
[x] paddles: add API for querying single node by name
[x] paddles: add API for querying nodes by locked, machine_type, locked_by
[x] paddles: add API for querying nodes with multiple machine_types
[x] paddles: add API for locking single node passing: name, locked_by, description
[x] paddles: add API for unlocking single node passing: name, locked_by
[x] paddles: update locked_since when nodes are locked or unlocked
[x] paddles: add API for locking multiple nodes passing: count, locked_by, machine_type, description
[x] teuthology: refactor ssh-key-related functionality from teuthology.locker subpackage into teuthology.lock module
[x] paddles: accept ssh_pub_key args for all locking methods
[x] teuthology: rewrite parts of teuthology.lockstatus and teuthology.lock to use paddles
[x] paddles: accept multiple machine_type values in /nodes/lock_many/

Here is what needs to be done:
[x] teuthology: ensure teuthology.task.internal will work, and work efficiently
[x] teuthology: ensure all users of teuthology.lock.list_locks() are using args to narrow queries and thus avoid dumping the entire db on each call

(there are performance gains to be had over the previous implementation)
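The narrowing described in that last item could be sketched as building a filtered query URL from the fields this ticket mentions (locked, machine_type, locked_by), so callers fetch only matching nodes instead of the entire db. The URL shape is an assumption for illustration.

```python
# Hedged sketch of narrowing a list_locks()-style query with filter
# args. Field names mirror those mentioned in this ticket; the URL
# shape is an assumption.
from urllib.parse import urlencode


def nodes_query_url(base, locked=None, machine_type=None, locked_by=None):
    """Build a filtered node-listing URL, omitting unset filters."""
    params = {}
    if locked is not None:
        params["locked"] = str(locked)
    if machine_type:
        params["machine_type"] = machine_type
    if locked_by:
        params["locked_by"] = locked_by
    query = urlencode(params)
    return f"{base}/nodes/?{query}" if query else f"{base}/nodes/"
```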

Actions #28

Updated by Ian Colle over 9 years ago

  • Target version changed from sprint11 to sprint12
Actions #29

Updated by Zack Cerza over 9 years ago

Updated task.internal and the list_locks() callers. Also did:
[x] Speed up lock_many() drastically

Next up:
[x] Speed up unlocking of multiple nodes similarly

Actions #30

Updated by Zack Cerza over 9 years ago

Added unlock_many methods to paddles and teuthology. This speeds up unlocking a lot too, as long as we're not unlocking vpms.

Actions #31

Updated by Zack Cerza over 9 years ago

I've done lots of manual testing and everything is looking pretty good. The queue is currently empty and there are only a few jobs running. Looks like I might deploy today. I'll send a warning email first.

Actions #32

Updated by Zack Cerza over 9 years ago

  • Status changed from In Progress to Resolved

Deployment's done! (yesterday)
