h1. Calamari/api/hardware/storage
Summary

In a large distributed system, things are always happening. We care more about the causes and implications of these events than about the constant stream of them.

This is a design to help connect events happening to hardware with the Ceph constructs they affect.

Owners

Gregory Meno (Red Hat) Pacific

Joe Handzik (HP) Central

Interested Parties

If you are interested in contributing to this blueprint, or want to be a "speaker" during the Summit session, list your name here.

Name (Affiliation)

Name (Affiliation)

Name

Current Status

Planning and implementation
Detailed Description

First, I'd like to make an exciting announcement:

Packages for Calamari and Romana are available on download.ceph.com.

What packages?
calamari-server, calamari-clients (romana), and diamond

OK, I've got these packages; what do I do with them?
http://calamari.readthedocs.org/en/latest/operations/server_install.html

What is the plan going forward?
* get nightly test suites running
* public-facing build infrastructure

What distributions are supported?
* CentOS
* Ubuntu

When will packages for distribution XYZ be provided?
When volunteers emerge to lead the effort.
Fedora 21+ is planned.

Now let's talk about hardware and Ceph.

In a large distributed system, things are always happening. We care more about the causes and implications of these events than about the constant stream of them.

[JH] - Cause is key, definitely. We may want to consider how best to store a stream of events, though, for post-event trend analysis. At large scale, a bad batch of drives can be identified early via I/O trends, drive health, and failure identification (for example). Definitely not our first priority here, though; I agree.

This is a design to help connect events happening to hardware with the Ceph constructs they affect.

OSD.128 is down? That's in host foo_bar_5.... but which drive is that? Is the failure software or hardware? What do I replace it with? How long has it been failing?
These questions probably sound familiar if you are an operator of a Ceph cluster. We want to improve the facility to answer them by implementing a new storage hardware API.

OSDs have storage hardware.
Storage hardware has events.
Events can inform proper corrective action.
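To make that relationship concrete, here is a minimal sketch of how the mapping could be modelled. It is illustrative only; the class and field names (StorageDevice, HardwareEvent, and so on) are hypothetical and not part of any existing Calamari module.

<pre><code class="python">
# Hypothetical sketch of the OSD -> storage hardware -> event relationship.
# None of these names exist in Calamari today; they only illustrate the model.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HardwareEvent:
    timestamp: float          # seconds since epoch
    status: str               # e.g. "failed", "predictive_failure", "ok"
    message: str              # human-readable description


@dataclass
class StorageDevice:
    wwid: str                 # persistent identifier, e.g. a SCSI WWID
    host: str                 # e.g. "foo_bar_5"
    drive: str                # e.g. "sdb"
    services: List[str] = field(default_factory=list)        # e.g. ["osd.128"]
    events: List[HardwareEvent] = field(default_factory=list)

    def last_event(self) -> Optional[HardwareEvent]:
        """Most recent event; this is what the API would surface first."""
        return max(self.events, key=lambda e: e.timestamp, default=None)
</code></pre>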
Example:

api/v2/hardware/storage
# provides a list of all known storage

Thoughts:
This data should be paginated.

Questions:
What are the ways we'd like to filter this data? By host, by manufacturer, by service, by has_error?
by_service filtering would be an indirect way to learn about all the hardware that backs a pool. Should we just filter by_pool?
How do we apply SES commands to this endpoint?
Not just SES commands, but other CLI commands too. Like I mentioned in my email, I'd like some direction from users here if we can get it, but it's not essential for the first wave of things I'd expect us to implement.
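To ground the discussion of pagination and filtering, here is a hedged sketch of how a client might call the proposed list endpoint. The query parameter names (page, page_size, host, has_error), the hostname, and the response envelope are assumptions for illustration, not an existing Calamari API.

<pre><code class="python">
# Hypothetical usage of the proposed list endpoint; filter names and the
# pagination envelope are assumptions for discussion, not an existing API.
import requests

base = "http://calamari.example.com/api/v2/hardware/storage"

# Page through all known storage devices.
resp = requests.get(base, params={"page": 1, "page_size": 50})
resp.raise_for_status()
page = resp.json()
# Expected shape (illustrative only):
# {"count": 1432, "next": ".../?page=2", "results": [{"wwid": "...", ...}]}

# Possible filters discussed above: by host, manufacturer, service, has_error.
failing = requests.get(base, params={"host": "foo_bar_5", "has_error": True}).json()
for device in failing.get("results", []):
    print(device["wwid"], device.get("last_event", {}).get("event_message"))
</code></pre>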
71 | |||
72 | api/v2/hardware/storage/SCSI_WWID |
||
73 | host # foo_bar_5 |
||
74 | drive # sdb |
||
75 | type # something that makes sense to a human |
||
76 | capacity |
||
77 | usage |
||
78 | manufacturer |
||
79 | FW version |
||
80 | serial # |
||
81 | services |
||
82 | [osd.128, mon1] |
||
83 | last_event |
||
84 | status |
||
85 | full_status |
||
86 | timestamp |
||
87 | event_message # human readable |
||
88 | error_rate ? |
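For discussion, a response from that endpoint might look roughly like the following. The field names mirror the list above; the example values, the JSON nesting, and the WWID format are assumptions.

<pre><code class="python">
# Illustrative response body for api/v2/hardware/storage/<SCSI_WWID>.
# Field names follow the list above; values and nesting are assumptions.
example_device = {
    "wwid": "0x600508b1001c7d2a4a9f6c2b3e1d0f55",   # hypothetical SCSI WWID
    "host": "foo_bar_5",
    "drive": "sdb",
    "type": "7200 RPM SATA HDD",      # something that makes sense to a human
    "capacity": 4000787030016,        # bytes
    "usage": 0.62,                    # fraction of capacity in use
    "manufacturer": "ACME",
    "fw_version": "AB12",
    "serial": "WD-WX11A12345678",
    "services": ["osd.128", "mon1"],
    "last_event": {
        "status": "predictive_failure",
        "full_status": {"smart_5_reallocated_sectors": 24},
        "timestamp": "2015-07-01T18:09:00Z",
        "event_message": "SMART reports reallocated sector count increasing",
    },
    "error_rate": None,   # open question: is a single number meaningful here?
}
</code></pre>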
Questions:
Calamari currently stores all state in memory and serializes it to persistent storage for crash recovery. Will raw SMART data for the most recent status on the largest clusters overwhelm a typical Calamari installation?
Fair question; I'm not sure. Probably worth pushing this down into a log file of some sort regardless.
Would error_rate be an effective way to work around this limitation?
I don't know that we should squash SMART into a single number; I think John mentioned that in his feedback and I tend to agree. I'd prefer the workaround I listed under #1.
How do we provide meaningful baseline data for identifying outliers? Is that part of this API?
At first, no. It might be interesting to consider a world where all Ceph users can contribute to a database of sorts, where we globalize trend analysis of drive failures. Pie-in-the-sky, yes. But it'd make for a heck of a demo :)
Is there any additional metadata that should be collected / presented?
Is SCSI_WWID the correct persistent identifier for storage?
Resources:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Online_Storage_Reconfiguration_Guide/persistent_naming.html

Relevant thoughts on sysfs:

https://www.kernel.org/pub/linux/kernel/people/mochel/doc/papers/ols-2005/mochel.pdf

Relevant components of /sys:

/sys/block/sd<letter>
/sys/class/block
/sys/block/sd<letter>/device (symlink to actual device)
/sys/class/enclosure

http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/drivers/misc/enclosure.c
http://lxr.free-electrons.com/source/include/scsi/scsi_device.h
(vpd_pg83 stuff is relevant)
http://lxr.free-electrons.com/source/drivers/scsi/scsi.c
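As a rough illustration of how an agent could walk these /sys paths and map kernel device names to persistent identifiers, here is a minimal sketch. It only touches paths that commonly exist on Linux (/sys/block/sd*, /dev/disk/by-id); exact availability depends on kernel, driver, and hardware, so treat it as a starting point rather than a finished collector.

<pre><code class="python">
# Minimal sketch: enumerate block devices via sysfs and map them to
# persistent identifiers. Paths used here commonly exist on Linux, but
# availability varies by kernel, driver, and hardware.
import glob
import os


def read_sysfs(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def enumerate_disks():
    disks = []
    for dev_path in glob.glob("/sys/block/sd*"):
        name = os.path.basename(dev_path)             # e.g. "sdb"
        disks.append({
            "drive": name,
            "model": read_sysfs(os.path.join(dev_path, "device/model")),
            "vendor": read_sysfs(os.path.join(dev_path, "device/vendor")),
            # 'size' is in 512-byte sectors; multiply to get bytes.
            "capacity_bytes": int(read_sysfs(os.path.join(dev_path, "size")) or 0) * 512,
        })
    return disks


def persistent_ids(drive):
    """Map a kernel name like 'sdb' to its /dev/disk/by-id symlinks (wwn-*, scsi-*)."""
    return [os.path.basename(link)
            for link in glob.glob("/dev/disk/by-id/*")
            if os.path.realpath(link) == "/dev/" + drive]


if __name__ == "__main__":
    for disk in enumerate_disks():
        print(disk["drive"], persistent_ids(disk["drive"]), disk)
</code></pre>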
Tools:

* sg3_utils
* sgpio: http://linux.die.net/man/1/sgpio
* sg_ses: http://sg.danny.cz/sg/sg_ses.html
* sg_vpd: http://linux.die.net/man/8/sg_vpd
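For the SES/VPD side, here is a hedged sketch of driving sg_vpd and sg_ses from Python to recover a device's designators (Device Identification VPD page, 0x83) and an enclosure status page. Output formats differ between sg3_utils versions, so the parsing is deliberately naive and only meant to show the shape of the integration.

<pre><code class="python">
# Sketch of shelling out to sg3_utils to fetch a persistent identifier and
# enclosure information. Output formats vary across sg3_utils versions, so
# the parsing here is intentionally naive and for illustration only.
import subprocess


def device_wwid(dev):
    """Read the Device Identification VPD page (0x83) via sg_vpd."""
    out = subprocess.check_output(
        ["sg_vpd", "--page=di", dev], universal_newlines=True
    )
    # Look for a designator value line such as "0x600508b1001c...".
    for line in out.splitlines():
        line = line.strip()
        if line.startswith("0x"):
            return line
    return None


def enclosure_status(ses_dev):
    """Dump diagnostic page 2 (Enclosure Status) via sg_ses for an SES device."""
    return subprocess.check_output(
        ["sg_ses", "--page=2", ses_dev], universal_newlines=True
    )


if __name__ == "__main__":
    print(device_wwid("/dev/sdb"))
</code></pre>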

Work items

This section should contain a list of work tasks created by this blueprint. Please include engineering tasks as well as related build/release and documentation work. If this blueprint requires cleanup of deprecated features, please list those tasks as well.

Coding tasks
* https://github.com/ceph/ceph/pull/4699
* pull OSD hardware info into Calamari
* write checks for the storage hardware
* Task 3

Build / release tasks
Task 1
Task 2
Task 3

Documentation tasks
Task 1
Task 2
Task 3

Deprecation tasks
Task 1
Task 2
Task 3