h1. Calamari/api/hardware/storage

h3. Summary

In a large distributed system, things are always happening. We care more about the causes and implications of these events than about the constant stream itself.
This is a design to connect the events happening to hardware with the Ceph constructs they affect.

h3. Owners

* Gregory Meno (Red Hat), Pacific
* Joe Handzik (HP), Central

h3. Interested Parties

If you are interested in contributing to this blueprint, or want to be a "speaker" during the Summit session, list your name here.

* Name (Affiliation)
* Name (Affiliation)
* Name

h3. Current Status

planning and implementation

h3. Detailed Description

First I’d like to make an exciting announcement:

Packages for Calamari and Romana are available on download.ceph.com.

What packages?
calamari-server, calamari-clients (romana), and diamond

OK, I’ve got these packages; what do I do with them?
http://calamari.readthedocs.org/en/latest/operations/server_install.html

What is the plan going forward?
* get nightly test suites running
* public-facing build infrastructure

What distributions are supported?
* CentOS
* Ubuntu

When will packages for distribution XYZ be provided?
* when volunteers emerge to lead the effort
* Fedora 21+ is planned

Now let’s talk about hardware and Ceph.

In a large distributed system, things are always happening. We care more about the causes and implications of these events than about the constant stream itself.
[JH] - Cause is key, definitely. We may want to consider how best to store a stream of events, though, for post-event trend analysis. At large scales, a bad batch of drives can be identified early via I/O trends, drive health, and failure identification (for example). Definitely not our first priority here though, I agree.
This is a design to connect the events happening to hardware with the Ceph constructs they affect.

OSD.128 is down? That’s on host foo_bar_5... but which drive is that? Is the failure software or hardware? What do I replace it with? How long has it been failing?
These questions probably sound familiar if you operate a Ceph cluster. We want to improve our ability to answer them by implementing a new storage hardware API.

* OSDs have storage hardware
* Storage hardware has events
* Events can inform proper corrective action.

Example:

<pre>
api/v2/hardware/storage    # provides a list of all known storage
</pre>

Thoughts:
* This data should be paginated (see the example request after the questions below)

Questions:
* What are the ways we’d like to filter this data? By host, by manufacturer, by service, by has_error?
* by_service filtering would be an indirect way to learn about all the hardware that backs a pool. Should we just filter by_pool?
* How do we apply SES commands to this endpoint?
** Not just SES commands, but other CLI commands too. Like I mentioned in my email, I’d like some direction from users here if we can get it, but it’s not essential for the first wave of things I’d expect us to implement.
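
To make the pagination and filtering questions concrete, here is a minimal sketch of what a client query could look like. Nothing in it is settled: the base URL, credentials, the filter parameter names (host, has_error), the page parameter, and the paginated "results" envelope are all placeholders for discussion.

<pre><code class="python">
import requests

# Hypothetical query against the proposed list endpoint. The base URL,
# credentials, and filter/pagination parameter names are placeholders.
API = "http://calamari.example.com/api/v2"

resp = requests.get(
    API + "/hardware/storage",
    params={"host": "foo_bar_5", "has_error": "true", "page": 1},
    auth=("admin", "admin"),
)
resp.raise_for_status()

# Assuming a paginated envelope with a "results" list, print one short line
# per device so an operator can spot a failing drive quickly.
for device in resp.json()["results"]:
    print("%s/%s: %s" % (device["host"], device["drive"], device["status"]))
</code></pre>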
71
72
api/v2/hardware/storage/SCSI_WWID
73
host # foo_bar_5
74
drive # sdb
75
type # something that makes sense to a human
76
capacity
77
usage
78
manufacturer
79
FW version
80
serial #
81
services
82
[osd.128, mon1]
83
last_event
84
status
85
full_status
86
timestamp
87
event_message # human readable
88
error_rate ?
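
Purely for illustration, the fields above might serialize into a document like the one below; every value is invented, and the exact shape (flat fields vs. a nested last_event) is still open.

<pre><code class="python">
# Invented example of one drive's representation; field names follow the
# outline above, values and nesting are assumptions for discussion only.
example_device = {
    "host": "foo_bar_5",
    "drive": "sdb",
    "type": "7200 RPM SAS HDD",          # something that makes sense to a human
    "capacity": 4000787030016,           # bytes
    "usage": 0.62,                       # fraction of capacity in use
    "manufacturer": "ExampleCorp",
    "fw_version": "ABC1",
    "serial": "S0EXAMPLE123",
    "services": ["osd.128", "mon1"],
    "last_event": {
        "status": "warning",
        "full_status": {},               # e.g. raw SMART attributes
        "timestamp": "2015-07-01T18:09:00Z",
        "event_message": "Reallocated sector count increased",
    },
    "error_rate": None,                  # open question, see below
}
</code></pre>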

Questions:
* Calamari currently stores all state in memory and serializes it to persistent storage for crash recovery. Will raw SMART data for the most recent status on the largest clusters overwhelm a typical Calamari installation?
** Fair question, I’m not sure. Probably worth pushing this down into a log file of some sort regardless (see the sketch after this list).
* Would error-rate be an effective way to work around this limitation?
** I don’t know that we should squash SMART into a single number; I think John mentioned that in his feedback and I tend to agree. I’d prefer the workaround I listed under #1.
* How do we provide meaningful baseline data for identifying outliers? Is that part of this API?
** At first, no. It might be interesting to consider a world where all Ceph users can contribute to a database of sorts, where we globalize trend analysis of drive failures. Pie-in-the-sky, yes. But it’d make for a heck of a demo :)
* Is there any additional metadata that should be collected / presented?
* Is SCSI_WWID the correct persistent identifier for storage?
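
As a sketch of the "push it down into a log file" idea: keep only a small per-drive summary in memory to back the API, and append the raw SMART report to an on-disk log for later trend analysis. The smartctl flags, log path, and health check below are assumptions, and the real version would live in the agent (diamond / salt minion) rather than a standalone script.

<pre><code class="python">
import json
import subprocess
import time

# Assumed log location; the real integration point is an open question.
LOG_PATH = "/var/log/calamari/smart-%s.log"

latest_status = {}  # drive name -> small summary kept in memory for the API

def poll_drive(dev):
    # Grab health and attribute data; flags are illustrative. smartctl uses a
    # bitmask exit status, so read stdout directly instead of check_output.
    proc = subprocess.Popen(
        ["smartctl", "-H", "-A", "/dev/%s" % dev],
        stdout=subprocess.PIPE, universal_newlines=True,
    )
    raw, _ = proc.communicate()
    # Append the full raw report to a per-drive log instead of holding it in RAM.
    with open(LOG_PATH % dev, "a") as log:
        log.write(json.dumps({"ts": time.time(), "raw": raw}) + "\n")
    # Keep only a tiny summary in memory to back status/last_event.
    healthy = "PASSED" in raw or "OK" in raw   # ATA vs. SAS wording, roughly
    latest_status[dev] = {"ts": time.time(),
                          "status": "ok" if healthy else "warning"}

poll_drive("sdb")
</code></pre>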

Resources:

* https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Online_Storage_Reconfiguration_Guide/persistent_naming.html

Relevant thoughts on sysfs:

* https://www.kernel.org/pub/linux/kernel/people/mochel/doc/papers/ols-2005/mochel.pdf

Relevant components of /sys (a short sketch of reading these attributes follows the links below):

* /sys/block/sd<letter>
* /sys/class/block
* /sys/block/sd<letter>/device (symlink to the actual device)
* /sys/class/enclosure

* http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/drivers/misc/enclosure.c
* http://lxr.free-electrons.com/source/include/scsi/scsi_device.h (the vpd_pg83 stuff is relevant)
* http://lxr.free-electrons.com/source/drivers/scsi/scsi.c
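
A small sketch of reading those /sys attributes, to show where the identifying data could come from. Attribute availability varies by kernel and transport; the wwid file (derived from the VPD page 0x83 data referenced above) is not present on every setup, so everything here is best-effort.

<pre><code class="python">
import glob
import os

def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if absent."""
    try:
        with open(path) as f:
            return f.read().strip()
    except IOError:
        return None

# Walk SCSI-style block devices and collect a few identifying attributes.
for blockdir in sorted(glob.glob("/sys/block/sd*")):
    name = os.path.basename(blockdir)
    dev = os.path.join(blockdir, "device")   # symlink to the actual device
    info = {
        "drive": name,
        "wwid": read_attr(os.path.join(dev, "wwid")),       # from VPD page 0x83, kernel-dependent
        "vendor": read_attr(os.path.join(dev, "vendor")),
        "model": read_attr(os.path.join(dev, "model")),
        "sectors": read_attr(os.path.join(blockdir, "size")),  # 512-byte sectors
    }
    print(info)
</code></pre>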

Tools:

* sg3_utils
* sgpio: http://linux.die.net/man/1/sgpio
* sg_ses: http://sg.danny.cz/sg/sg_ses.html
* sg_vpd: http://linux.die.net/man/8/sg_vpd

h3. Work items

This section should contain a list of work tasks created by this blueprint. Please include engineering tasks as well as related build/release and documentation work. If this blueprint requires cleanup of deprecated features, please list those tasks as well.

h3. Coding tasks

* https://github.com/ceph/ceph/pull/4699
* pull OSD hardware info into Calamari
* write checks for the storage hardware
* Task 3

h3. Build / release tasks

* Task 1
* Task 2
* Task 3

h3. Documentation tasks

* Task 1
* Task 2
* Task 3

h3. Deprecation tasks

* Task 1
* Task 2
* Task 3