Ceph CLI Experience » History » Version 1
Jessica Mack, 06/22/2015 02:11 AM
h1. Ceph CLI Experience

h3. Summary

Enhance CLI features to make shell-based troubleshooting easier.

h3. Owners

* Kyle Bader (kyle.bader@dreamhost.com)

h3. Interested Parties

* Mike Dawson <mike.dawson@cloudapt.com>
* Joao Eduardo Luis <joao.luis@inktank.com>
* Name (Affiliation)

h3. Current Status

h3. Detailed Description

There are a number of things that could be done to improve the CLI experience for Ceph operators.

*Parallel Distributed Shell Integration*
The parallel distributed shell (pdsh) is a useful tool for managing a Ceph cluster; it would be even more useful if genders and machine information could be derived from Ceph monitor state. For example, it would be great if something like this were supported:
<pre>
$ cephsh osd 1 uptime

< runs uptime on the osd.1 host >

$ cephsh rack irv-n1 status ceph-osd-all

< checks upstart status on all OSDs on all hosts in rack irv-n1 >

$ cephsh pg 5.123 uptime

< runs uptime on hosts with OSDs in placement group 5.123 >
</pre>

This could be implemented as a pdsh plugin, or as a wrapper around pdsh that collects host information from the Ceph monitors and passes the hosts as a comma-separated list along with the command: "pdsh -w <hosts> <command>".
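
The wrapper approach could be sketched roughly as below, in Python: ask the monitors where an entity lives, then build the pdsh invocation. The JSON shape mirrors what "ceph osd find <id>" emits, but treat the exact field names as an assumption here, and names like @host_of_osd@/@pdsh_command@ are hypothetical helpers, not existing tools.

```python
# Sketch of the cephsh wrapper idea: resolve a Ceph entity to hosts, then
# shell out to "pdsh -w <hosts> <command>". Uses canned monitor output
# instead of a live cluster; field names are assumptions.
import json
import shlex


def host_of_osd(osd_find_json):
    """Pull the hostname out of parsed 'ceph osd find <id>' output."""
    return osd_find_json["crush_location"]["host"]


def pdsh_command(hosts, command):
    """Build the pdsh invocation: hosts as a comma-separated -w list."""
    return "pdsh -w {} {}".format(",".join(hosts), shlex.quote(command))


# Example with canned "ceph osd find 1" output:
osd_find = json.loads(
    '{"osd": 1, "ip": "10.0.0.12:6800/1234",'
    ' "crush_location": {"host": "cephstore1234", "rack": "irv-n1"}}'
)
print(pdsh_command([host_of_osd(osd_find)], "uptime"))
# pdsh -w cephstore1234 uptime
```

A real wrapper would run "ceph osd find 1 --format json" (or walk the OSD map for rack/pg targets) instead of using canned JSON, then exec the resulting pdsh command.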

*Ability to set up/down/in/out or noout/nodown/noup/noin based on CRUSH hierarchy*
Another feature that would be incredibly useful would be the ability to set up/down/in/out based on the CRUSH hierarchy.
<pre>
$ sudo ceph osd down rack lax-a1
$ sudo ceph osd out host cephstore1234
$ sudo ceph osd set noout rack lax-a1
</pre>

This is useful when you are performing maintenance operations on an entire node/rack/etc. The noout/noin/nodown/noup part would be nice when you're dealing with a large cluster and you don't want to stop those operations from taking place on the rest of the cluster.
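
One way a command like "ceph osd down rack lax-a1" could be prototyped client-side is to expand the named CRUSH bucket to its OSDs and issue the existing per-OSD commands. The sketch below, in Python, mimics the "nodes" layout of "ceph osd tree --format json" (id/name/type/children); those field names are an assumption, and @osds_under@ is a hypothetical helper.

```python
# Sketch: expand a named CRUSH bucket to the osd ids beneath it, using a
# node list shaped like "ceph osd tree --format json" output (assumed).

def osds_under(nodes, bucket_name):
    """Return the osd ids in the subtree rooted at the named CRUSH bucket."""
    by_id = {n["id"]: n for n in nodes}
    stack = [n for n in nodes if n["name"] == bucket_name]
    osds = []
    while stack:
        node = stack.pop()
        if node["type"] == "osd":
            osds.append(node["id"])
        for child in node.get("children", []):
            stack.append(by_id[child])
    return sorted(osds)


nodes = [
    {"id": -3, "name": "lax-a1", "type": "rack", "children": [-2]},
    {"id": -2, "name": "cephstore1234", "type": "host", "children": [0, 1]},
    {"id": 0, "name": "osd.0", "type": "osd"},
    {"id": 1, "name": "osd.1", "type": "osd"},
]
# One existing per-OSD command per expanded osd:
for osd_id in osds_under(nodes, "lax-a1"):
    print("ceph osd down {}".format(osd_id))
```

Doing the expansion in the monitors instead would keep it atomic with respect to CRUSH map changes, but a client-side wrapper like this needs no new mon commands.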

*Show running Ceph version in "ceph osd tree"*
<pre>
# id	weight	type name	up/down	reweight
-1	879.4	pool default
-4	451.4		row lax-a
-3	117			rack lax-a1
-2	7				host cephstore1234
48	1					osd.0	up	1	0.67.4-1precise
65	1					osd.1	up	1	0.67.4-1precise
86	1					osd.2	up	1	0.67.4-1precise
116	1					osd.3	up	1	0.67.4-1precise
184	1					osd.4	up	1	0.67.4-1precise
711	1					osd.5	up	1	0.67.4-1precise
777	1					osd.6	up	1	0.67.4-1precise
-5	6				host cephstore1235
</pre>

*Add drain action to OSD command*
It would be really nice to add a drain command that slowly lowers the CRUSH weight of an OSD, or a hierarchy of OSDs, until it reaches a weight of 0.
<pre>
$ sudo ceph osd drain osd.1 0.1
$ sudo ceph osd drain cephstore1234
</pre>

The drain command would lower the CRUSH weight of all members under the given subtree by a default decrement, or by a decrement passed as a second argument. The cluster would wait until all backfills are complete before decrementing further, ad infinitum until the weight is 0.
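
The stepping part of that loop could look like the Python sketch below; @drain_steps@ is a hypothetical helper, and in a real implementation each step would call "ceph osd crush reweight" and poll PG state until backfill finishes before taking the next step.

```python
# Sketch of the proposed drain loop's weight schedule: step the CRUSH
# weight down toward 0 by a fixed decrement. Waiting for backfill between
# steps (polling "ceph pg stat") is omitted; only the stepping is shown.

def drain_steps(weight, decrement=0.1):
    """Return the successive CRUSH weights a drain would set, ending at 0."""
    steps = []
    while weight > 0:
        weight = round(max(weight - decrement, 0.0), 4)
        steps.append(weight)
    return steps


print(drain_steps(0.3))        # [0.2, 0.1, 0.0]
print(drain_steps(1.0, 0.25))  # [0.75, 0.5, 0.25, 0.0]
```

Clamping at 0 and rounding keeps the schedule finite even when the decrement does not divide the starting weight evenly.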

h3. Work items

h4. Coding tasks

# Task 1
# #6687
# #6506

h4. Build / release tasks

# Task 1
# Task 2
# Task 3

h4. Documentation tasks

# Task 1
# Task 2
# Task 3

h4. Deprecation tasks

# Task 1
# Task 2
# Task 3