h1. Ceph CLI Experience

h3. Summary

Enhance CLI features to make shell-based troubleshooting easier.

h3. Owners

* Kyle Bader (kyle.bader@dreamhost.com)

h3. Interested Parties

* Mike Dawson <mike.dawson@cloudapt.com>
* Joao Eduardo Luis <joao.luis@inktank.com>
* Name (Affiliation)

h3. Current Status

h3. Detailed Description

There are a number of things that could be done to improve the CLI experience for Ceph operators.

*Parallel Distributed Shell Integration*
The parallel distributed shell (pdsh) is a useful tool for managing a Ceph cluster. It would be even more useful if genders and machine information could be derived from Ceph monitor state. For example, it would be great if something like this were supported:
<pre>
$ cephsh osd 1 uptime

< runs uptime on osd.1 host >

$ cephsh rack irv-n1 status ceph-osd-all

< checks upstart status on all osds on all hosts in rack irv-n1 >

$ cephsh pg 5.123 uptime

< runs uptime on hosts with osds in placement group 5.123 >
</pre>

This could be implemented as a pdsh plugin, or as a wrapper around pdsh that collects information from the Ceph monitors and passes the hosts as a comma-separated list along with the command: "pdsh -w <hosts> <command>"
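
As a rough illustration of the wrapper route, the sketch below resolves an OSD id to its host by asking the monitors with "ceph osd find" and then hands the result to pdsh. It is a minimal sketch, not a design: the argument handling and the JSON fields it reads (crush_location, ip) are assumptions, and only the first form above (osd id + command) is covered.

<pre>
#!/usr/bin/env python
# Illustrative cephsh wrapper: resolve a Ceph entity to hosts, then run pdsh.
# Assumes `ceph osd find <id> --format json` reports a crush_location / ip.
import json
import subprocess
import sys


def hosts_for_osd(osd_id):
    """Ask the monitors where an OSD lives via `ceph osd find`."""
    out = subprocess.check_output(
        ["ceph", "osd", "find", str(osd_id), "--format", "json"]).decode()
    info = json.loads(out)
    # crush_location usually carries a "host" entry; fall back to the OSD's IP.
    host = info.get("crush_location", {}).get("host")
    if not host:
        host = info["ip"].split(":")[0]
    return [host]


def main():
    if len(sys.argv) < 4 or sys.argv[1] != "osd":
        sys.exit("usage: cephsh osd <id> <command...>")
    hosts = hosts_for_osd(sys.argv[2])
    command = " ".join(sys.argv[3:])
    # Hand the resolved host list to pdsh: pdsh -w <hosts> <command>
    subprocess.check_call(["pdsh", "-w", ",".join(hosts), command])


if __name__ == "__main__":
    main()
</pre>

Invoked as "cephsh osd 1 uptime" it behaves like the first example above; rack and pg targets would follow the same pattern with other monitor queries.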
 
*Ability to set up/down/in/out or noout/nodown/noup/noin based on CRUSH hierarchy*
Another feature that would be incredibly useful is the ability to set up/down/in/out based on the CRUSH hierarchy.
<pre>
$ sudo ceph osd down rack lax-a1
$ sudo ceph osd out host cephstore1234
$ sudo ceph osd set noout rack lax-a1
</pre>

This is useful when you are performing maintenance on an entire node/rack/etc. The noout/noin/nodown/noup part would be nice when you are dealing with a large cluster and do not want to stop those operations from taking place on the rest of the cluster.
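
Until that syntax exists, a similar effect can be approximated in user space by walking the CRUSH tree and applying the existing per-OSD commands to everything below a bucket. A minimal sketch, assuming the "nodes" layout of "ceph osd tree --format json" (only "out" is shown; "in" and "down" would be analogous, while a subtree-scoped noout is exactly what is missing today, which is the motivation for this item):

<pre>
#!/usr/bin/env python
# Mark every OSD under a named CRUSH bucket (host, rack, row, ...) out.
# Sketch only: assumes `ceph osd tree --format json` exposes a flat "nodes"
# list in which buckets reference their children by id.
import json
import subprocess
import sys


def osds_under(bucket_name):
    """Collect the osd ids below the named CRUSH bucket."""
    tree = json.loads(subprocess.check_output(
        ["ceph", "osd", "tree", "--format", "json"]).decode())
    nodes = {n["id"]: n for n in tree["nodes"]}
    stack = [n for n in tree["nodes"] if n["name"] == bucket_name]
    osds = []
    while stack:
        node = stack.pop()
        if node["type"] == "osd":
            osds.append(node["id"])
        for child_id in node.get("children", []):
            stack.append(nodes[child_id])
    return osds


def main():
    bucket = sys.argv[1]  # e.g. "lax-a1" or "cephstore1234"
    for osd_id in osds_under(bucket):
        # Same as running `ceph osd out <id>` by hand for each OSD.
        subprocess.check_call(["ceph", "osd", "out", str(osd_id)])


if __name__ == "__main__":
    main()
</pre>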
 
*Show running Ceph version in "ceph osd tree"*
<pre>
# id    weight  type name       up/down reweight        version
-1      879.4   pool default
-4      451.4           row lax-a
-3      117                     rack lax-a1
-2      7                               host cephstore1234
48      1                                       osd.0 up      1 0.67.4-1precise
65      1                                       osd.1 up      1 0.67.4-1precise
86      1                                       osd.2 up      1 0.67.4-1precise
116     1                                       osd.3 up      1 0.67.4-1precise
184     1                                       osd.4 up      1 0.67.4-1precise
711     1                                       osd.5 up      1 0.67.4-1precise
777     1                                       osd.6 up      1 0.67.4-1precise
-5      6                               host cephstore1235
</pre>
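
The per-daemon information is already reachable from the admin host, so a helper can stitch an approximation of this view together today. A sketch that pairs the OSD list from "ceph osd tree" with "ceph tell osd.N version"; the reply from the tell is treated as opaque text, and down OSDs simply will not answer:

<pre>
#!/usr/bin/env python
# Print each OSD together with the version its daemon reports.
# Sketch only: pairs `ceph osd tree --format json` with `ceph tell osd.N version`.
import json
import subprocess


def main():
    tree = json.loads(subprocess.check_output(
        ["ceph", "osd", "tree", "--format", "json"]).decode())
    for node in tree["nodes"]:
        if node["type"] != "osd":
            continue
        try:
            version = subprocess.check_output(
                ["ceph", "tell", node["name"], "version"]).decode().strip()
        except subprocess.CalledProcessError:
            version = "unreachable"  # down OSDs do not answer a tell
        print("%-10s %s" % (node["name"], version))


if __name__ == "__main__":
    main()
</pre>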
 
*Add drain action to OSD command*
It would be really nice to add a drain command that slowly lowers the CRUSH weight of an OSD, or of a hierarchy of OSDs, until the weight reaches 0.
<pre>
$ sudo ceph osd drain osd.1 0.1
$ sudo ceph osd drain cephstore1234
</pre>

The drain command would lower the CRUSH weight of all members under that subtree by a default decrement, or by a decrement passed as a second argument. The cluster would wait until all backfills are complete before decrementing further, ad infinitum, until the weight reaches 0.
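
A user-space version of that loop is easy to picture: lower the weight one step, wait for backfill to finish, repeat. The sketch below handles a single OSD with the existing "ceph osd crush reweight" command; the "crush_weight" field name and the plain-text matching on "ceph pg stat" output are assumptions, and the real feature would live in the monitors and cover whole subtrees.

<pre>
#!/usr/bin/env python
# Slowly drain one OSD by stepping its CRUSH weight down to zero.
# Sketch of the proposed behaviour in user space: reweight, wait until
# `ceph pg stat` stops mentioning backfill/recovery, then reweight again.
import json
import subprocess
import sys
import time


def crush_weight(osd_name):
    """Read the current CRUSH weight of an OSD from `ceph osd tree`."""
    tree = json.loads(subprocess.check_output(
        ["ceph", "osd", "tree", "--format", "json"]).decode())
    for node in tree["nodes"]:
        if node["name"] == osd_name:
            return float(node["crush_weight"])
    raise SystemExit("unknown OSD: %s" % osd_name)


def wait_for_clean():
    """Poll until no PGs report backfill or recovery activity."""
    while True:
        stat = subprocess.check_output(["ceph", "pg", "stat"]).decode()
        if "backfill" not in stat and "recover" not in stat:
            return
        time.sleep(30)


def main():
    osd = sys.argv[1]                                   # e.g. osd.1
    step = float(sys.argv[2]) if len(sys.argv) > 2 else 0.1
    weight = crush_weight(osd)
    while weight > 0:
        weight = max(0.0, round(weight - step, 4))
        subprocess.check_call(
            ["ceph", "osd", "crush", "reweight", osd, str(weight)])
        wait_for_clean()  # let backfill complete before the next decrement


if __name__ == "__main__":
    main()
</pre>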
 
h3. Work items

h4. Coding tasks

# Task 1
# #6687
# #6506

h4. Build / release tasks

# Task 1
# Task 2
# Task 3

h4. Documentation tasks

# Task 1
# Task 2
# Task 3

h4. Deprecation tasks

# Task 1
# Task 2
# Task 3