Project

General

Profile

Ceph CLI Experience

Summary

Enhance CLI features to make shell based troubleshooting easier.

Owners

Interested Parties

Current Status

Detailed Description

There are a number of things that could be done to improve the CLI experience for Ceph operators.

Parallel Distributed Shell Integration
The parallel distributed shell (pdsh) is a useful tool for managing a Ceph cluster. It would be even more useful if genders and machine information could be derived from Ceph monitor state. For example, it would be great if something like this were supported:

$ cephsh osd 1 uptime 

< runs uptime on osd.1 host >

$ cephsh rack lax-n1 status ceph-osd-all 

< checks upstart status on all OSDs on all hosts in rack lax-n1 >

$ cephsh pg 5.123 uptime

< runs uptime on hosts with osds in placement group 5.123 >

This could be implemented as a pdsh plugin, or as a wrapper around pdsh that collects information from the Ceph monitors and passes the hosts as a comma-separated list along with the command: "pdsh -w <hosts> <command>"
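As a rough illustration, the wrapper approach could resolve a target to hosts via the CLI's JSON output and then exec pdsh. The sketch below only handles the "osd" target; the JSON layout (a "crush_location" carrying a "host" entry, as `ceph osd find <id> --format json` reports) should be treated as an assumption, and "cephsh" itself is hypothetical.

```python
#!/usr/bin/env python
# Hypothetical sketch of a "cephsh" wrapper: resolve a target to hosts
# via the Ceph CLI's JSON output, then run "pdsh -w <hosts> <command>".
import json
import subprocess
import sys

def hosts_from_osd_find(find_json):
    """Extract the host name from `ceph osd find <id> -f json` output.
    Assumes the crush_location carries a "host" entry."""
    info = json.loads(find_json)
    return info.get("crush_location", {}).get("host")

def resolve_osd_host(osd_id):
    # Ask the monitors where this OSD lives (JSON layout assumed above).
    out = subprocess.check_output(
        ["ceph", "osd", "find", str(osd_id), "--format", "json"])
    return hosts_from_osd_find(out)

def main():
    # cephsh osd 1 uptime  ->  pdsh -w <host-of-osd.1> uptime
    kind, target, command = sys.argv[1], sys.argv[2], sys.argv[3:]
    if kind == "osd":
        hosts = [resolve_osd_host(target)]
    else:
        raise SystemExit("only the 'osd' target is sketched here")
    subprocess.call(["pdsh", "-w", ",".join(hosts)] + command)

if __name__ == "__main__":
    main()
```

The rack and pg targets would work the same way, just with a different monitor query feeding the host list.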

Ability to set up/down/in/out or noout/nodown/noup/noin based on CRUSH hierarchy
Another feature that would be incredibly useful would be the ability to set up/down/in/out (and the noout/nodown/noup/noin flags) based on the CRUSH hierarchy.

$ sudo ceph osd down rack lax-a1
$ sudo ceph osd out host cephstore1234
$ sudo ceph osd set noout rack lax-a1

This is useful when you are performing maintenance operations on an entire node, rack, etc. The noout/noin/nodown/noup part would be nice when you're dealing with a large cluster and don't want to prevent those operations from taking place on the rest of the cluster.
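Until such a command exists, the same effect can be approximated by expanding a CRUSH bucket into its member OSDs and applying the action to each one. The sketch below assumes the flat "nodes"/"children" layout that `ceph osd tree --format json` produces; the bucket names are just examples.

```python
#!/usr/bin/env python
# Sketch: expand a CRUSH bucket into its member OSDs so an action
# (out/down/in/up) can be applied per OSD with today's commands.
# The JSON layout (a flat "nodes" list where buckets carry "children"
# id lists) follows `ceph osd tree --format json`; details assumed.
import json
import subprocess

def osds_under(tree_json, bucket_name):
    """Return the osd ids (>= 0) in the subtree rooted at bucket_name."""
    nodes = {n["id"]: n for n in json.loads(tree_json)["nodes"]}
    start = next(n for n in nodes.values() if n["name"] == bucket_name)
    stack, osds = [start], []
    while stack:
        node = stack.pop()
        if node["id"] >= 0:          # leaves (OSDs) have non-negative ids
            osds.append(node["id"])
        for child in node.get("children", []):
            stack.append(nodes[child])
    return sorted(osds)

def apply_action(action, bucket_name):
    tree = subprocess.check_output(["ceph", "osd", "tree", "--format", "json"])
    for osd in osds_under(tree, bucket_name):
        # e.g. "ceph osd out 3" for every OSD under the bucket
        subprocess.call(["ceph", "osd", action, str(osd)])
```

Building this into the monitor would avoid the round trip and let the noout/nodown family be scoped the same way.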

Show running Ceph version in "ceph osd tree"

# id    weight  type name       up/down reweight        version
-1      879.4   pool default
-4      451.4           row lax-a
-3      117                     rack lax-a1
-2      7                               host cephstore1234
48      1                                       osd.0 up      1 0.67.4-1precise
65      1                                       osd.1 up      1 0.67.4-1precise
86      1                                       osd.2 up      1 0.67.4-1precise
116     1                                       osd.3 up      1 0.67.4-1precise
184     1                                       osd.4 up      1 0.67.4-1precise
711     1                                       osd.5 up      1 0.67.4-1precise
777     1                                       osd.6 up      1 0.67.4-1precise
-5      6                               host cephstore1235
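One way to prototype this outside the monitor is to ask each daemon its version (e.g. with `ceph tell osd.N version`) and splice the answers into the tree output. The merge step can be sketched as a pure function; the tab-separated formatting and the shape of the versions map are assumptions.

```python
# Sketch: append a running-version column to "ceph osd tree" lines.
# versions maps an osd name (e.g. "osd.0") to a version string, as
# could be collected with `ceph tell osd.N version` for each N (an
# assumption about gathering, not a proposed implementation).

def annotate_tree(tree_lines, versions):
    out = []
    for line in tree_lines:
        fields = line.split()
        # OSD rows name the daemon as "osd.N"; bucket rows do not.
        osd = next((f for f in fields if f.startswith("osd.")), None)
        if osd and osd in versions:
            line = line + "\t" + versions[osd]
        out.append(line)
    return out
```

Having the monitor track and report this directly would be cleaner, since it already hears from every OSD.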

Add drain action to OSD command
It would be really nice to add a drain command that slowly lowers the CRUSH weight of an OSD or hierarchy of OSDs until it reaches a weight of 0.

$ sudo ceph osd drain osd.1 0.1
$ sudo ceph osd drain cephstore1234

The drain command would lower the CRUSH weight of all members under that subtree by a default decrement, or by a decrement passed as a second argument. The cluster would wait until all backfills are complete before decrementing further, ad infinitum until the weight reaches 0.
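The loop described above can be sketched with the existing `ceph osd crush reweight` command. The wait-for-clean step is only stubbed here (polling health output is one option; its exact text is an assumption), and drain_schedule is a hypothetical helper:

```python
# Sketch of the proposed drain loop: step the CRUSH weight down to 0,
# waiting for backfill to finish between steps. Uses the existing
# "ceph osd crush reweight" command; the wait step is an assumption.
import subprocess
import time

def drain_schedule(weight, step):
    """Weights to pass at each step, ending at 0.0."""
    steps = []
    while weight > 0:
        weight = max(0.0, weight - step)
        steps.append(round(weight, 6))
    return steps

def wait_until_clean():
    # Placeholder: poll until no PGs are backfilling (details assumed).
    while b"backfill" in subprocess.check_output(["ceph", "health"]):
        time.sleep(30)

def drain(name, current_weight, step=0.1):
    for w in drain_schedule(current_weight, step):
        subprocess.call(["ceph", "osd", "crush", "reweight", name, str(w)])
        wait_until_clean()
```

A built-in drain could do the same thing server-side, and for a whole subtree it would iterate this over every OSD under the named bucket.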

Work items

Coding tasks

  1. Task 1
  2. #6687
  3. #6506

Build / release tasks

  1. Task 1
  2. Task 2
  3. Task 3

Documentation tasks

  1. Task 1
  2. Task 2
  3. Task 3

Deprecation tasks

  1. Task 1
  2. Task 2
  3. Task 3