Calamari/api/hardware/storage

Summary
In a large distributed system, things are always happening. We care more about the causes and implications of these events than about the constant stream of them.
This is a design to help connect the events happening to hardware with the Ceph constructs they affect.

Owners
Gregory Meno (Red Hat), Pacific
Joe Handzik (HP), Central

Interested Parties
If you are interested in contributing to this blueprint, or want to be a "speaker" during the Summit session, list your name here.
Name (Affiliation)
Name (Affiliation)
Name

Current Status
planning and implementation

Detailed Description
First I’d like to make an exciting announcement:

packages for calamari and romana are available on download.ceph.com

What packages?
calamari-server, calamari-clients (romana), and diamond

OK, I’ve got these packages; what do I do with them?
http://calamari.readthedocs.org/en/latest/operations/server_install.html

What is the plan going forward?
Get nightly test suites running
Stand up public-facing build infrastructure

What distributions are supported?
CentOS
Ubuntu

When will packages for distribution XYZ be provided?
When volunteers emerge to lead the effort
Fedora 21+ is planned

Now let’s talk about hardware and Ceph.

In a large distributed system, things are always happening. We care more about the causes and implications of these events than about the constant stream of them.
[JH] - Cause is a key, definitely. We may want to consider how best to store a stream of events though, for post-event trend analysis. At large scales, a bad batch of drives can be identified early via IO trends, drive health, and failure identification (for example). Definitely not our first priority here though, I agree.
This is a design to help connect the events happening to hardware with the Ceph constructs they affect.

OSD.128 is down? That's in host foo_bar_5... but which drive is that? Is the failure software or hardware? What do I replace it with? How long has it been failing?
These questions probably sound familiar if you operate a Ceph cluster. We want to make them easier to answer by implementing a new storage hardware API.
OSDs have storage hardware
Storage hardware has events
Events can inform proper corrective action.

example:

api/v2/hardware/storage
  1. provides a list of all known storage

Thoughts:
This data should be paginated
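
To make the pagination thought concrete, here is a minimal sketch of how a client might page through this collection. The endpoint path comes from this design; the page/page_size parameters, the filter names (host, has_error), and the response envelope (results/next) are assumptions for illustration only, not a committed interface.

    import requests

    BASE = "http://calamari.example.com/api/v2"   # hypothetical Calamari server

    def iter_storage(session, page_size=50, **filters):
        # Yield storage records one page at a time. The 'page'/'page_size'
        # names and the filter keywords are assumed, not part of the design yet.
        page = 1
        while True:
            params = dict(filters, page=page, page_size=page_size)
            resp = session.get(BASE + "/hardware/storage", params=params)
            resp.raise_for_status()
            body = resp.json()
            for record in body.get("results", []):
                yield record
            if not body.get("next"):          # assumed pagination envelope
                break
            page += 1

    # e.g. every known drive on host foo_bar_5 that has reported an error:
    session = requests.Session()
    for drive in iter_storage(session, host="foo_bar_5", has_error=True):
        print(drive)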

Questions:
What are the ways we’d like to filter this data? by host, by manufacturer, by service, by has_error
by_service filtering would be an indirect way to learn about all the hardware that backs a pool. Should we just filter by_pool?
How do we apply SES commands to this endpoint?
Not just SES commands, but other CLI commands too. Like I mentioned in my email, I’d like some direction from users here if we can get it, but it’s not essential for the first wave of things I’d expect us to implement.
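
As a rough illustration of how an SES command could sit behind this endpoint, the snippet below shells out to sg_ses (listed under Tools below) to toggle the locate/ident LED for one enclosure slot. The hypothetical POST action in the comment, and the way the enclosure device and element index are obtained, are assumptions; the exact sg_ses arguments also vary by version and enclosure.

    import subprocess

    def set_locate_led(sg_device, element_index, on=True):
        # sg_device is the enclosure's SCSI generic node (e.g. /dev/sg3) and
        # element_index is the element number reported by 'sg_ses --page=aes'.
        # Discovering both per host is out of scope for this sketch.
        action = "--set=ident" if on else "--clear=ident"   # ident == locate LED
        subprocess.check_call(
            ["sg_ses", "--index=%d" % element_index, action, sg_device])

    # Hypothetical wiring: POST api/v2/hardware/storage/<SCSI_WWID>/locate could
    # resolve the WWID to (sg_device, element_index) and call set_locate_led().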

api/v2/hardware/storage/SCSI_WWID
host # foo_bar_5
drive # sdb
type # something that makes sense to a human
capacity
usage
manufacturer
FW version
serial #
services
[osd.128, mon1]
last_event
status
full_status
timestamp
event_message # human readable
error_rate ?
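
As an illustration only, a record returned by this endpoint might look like the sketch below. The field names come from the list above; the nesting (grouping status/full_status/timestamp/event_message under last_event), the units, and every value shown are assumptions, not a committed schema.

    # Hypothetical response body for api/v2/hardware/storage/<SCSI_WWID>,
    # shown as a Python dict; all values are invented for illustration.
    example_record = {
        "scsi_wwid": "0x5000c500a1b2c3d4",
        "host": "foo_bar_5",
        "drive": "sdb",
        "type": "7200 rpm SAS HDD",        # something that makes sense to a human
        "capacity_bytes": 4000787030016,
        "usage_bytes": 2892374016000,
        "manufacturer": "ACME",
        "fw_version": "N004",
        "serial": "Z1X2C3V4",
        "services": ["osd.128", "mon1"],
        "last_event": {                    # grouping these fields is an assumption
            "status": "failed",
            "full_status": "SMART: reallocated sector count above threshold",
            "timestamp": "2015-10-07T18:22:31Z",
            "event_message": "sdb on foo_bar_5 is predicted to fail",
        },
        "error_rate": 0.002,               # still an open question (see below)
    }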

Questions:
Calamari currently stores all state in memory and serializes it to persistent storage for crash recovery. Will raw SMART data for the most recent status on the largest clusters overwhelm a typical Calamari installation?
Fair question, I’m not sure. Probably worth pushing this down into a log file of some sort regardless (a sketch of that follows after these questions).
Would error-rate be an effective way to work around this limitation?
I don’t know that we should squash SMART into a single number; I think John mentioned that in his feedback, and I tend to agree. I’d prefer the workaround I listed under #1.
How do we provide meaningful base-line data for identifying outliers? Is that part of this API?
At first, no. It might be interesting to consider a world where all Ceph users can contribute to a database of sorts, where we globalize trend analysis of drive failures. Pie-in-the-sky, yes. But it’d make for a heck of a demo :)
Is there any additional metadata that should be collected / presented?
Is SCSI_WWID the correct persistent identifier for storage?
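
Coming back to the first question above: the "push it down into a log file" suggestion could be as simple as appending one JSON line per poll, so raw SMART output never has to live in Calamari's in-memory state. smartctl is not named in this blueprint and is used here only as a stand-in for whatever collects the raw data; the log path and wiring are made up for illustration.

    import json, subprocess, time

    LOG_PATH = "/var/log/calamari/smart-%s.jsonl"     # hypothetical location

    def append_smart_sample(host, dev):
        # Append one raw SMART reading for 'dev' as a single JSON line.
        # smartctl signals failing drives through non-zero exit bits, so the
        # return code is deliberately not treated as an error here.
        proc = subprocess.run(["smartctl", "-a", dev],
                              stdout=subprocess.PIPE, universal_newlines=True)
        sample = {"ts": time.time(), "host": host, "dev": dev,
                  "raw": proc.stdout}
        with open(LOG_PATH % host, "a") as f:
            f.write(json.dumps(sample) + "\n")

    # e.g. a cron job or the diamond collector on each host could call:
    #   append_smart_sample("foo_bar_5", "/dev/sdb")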

Resources:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Online_Storage_Reconfiguration_Guide/persistent_naming.html

Relevant thoughts on sysfs:

https://www.kernel.org/pub/linux/kernel/people/mochel/doc/papers/ols-2005/mochel.pdf

Relevant components of /sys:

/sys/block/sd<letter>
/sys/class/block
/sys/block/sd<letter>/device (symlink to actual device)
/sys/class/enclosure

http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/drivers/misc/enclosure.c
http://lxr.free-electrons.com/source/include/scsi/scsi_device.h
(vpd_pg83 stuff is relevant)
http://lxr.free-electrons.com/source/drivers/scsi/scsi.c
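
Tying these references back to the SCSI_WWID question above: the sketch below resolves a block device to the persistent identifiers udev derives from VPD page 0x83 (the /dev/disk/by-id symlinks) and finds the /sys/class/enclosure slot that holds it. Directory layouts vary by kernel and enclosure, so treat this as a sketch rather than a reference implementation.

    import glob, os

    def persistent_ids(dev_name):
        # Return the /dev/disk/by-id names (wwn-..., scsi-...) for e.g. "sdb".
        return [os.path.basename(link)
                for link in glob.glob("/dev/disk/by-id/*")
                if os.path.realpath(link) == "/dev/" + dev_name]

    def enclosure_slot(dev_name):
        # Find the /sys/class/enclosure/<encl>/<slot> directory whose 'device'
        # symlink points at the same SCSI device as /sys/block/<dev>/device.
        target = os.path.realpath("/sys/block/%s/device" % dev_name)
        for slot_dev in glob.glob("/sys/class/enclosure/*/*/device"):
            if os.path.realpath(slot_dev) == target:
                return os.path.dirname(slot_dev)
        return None

    print(persistent_ids("sdb"))
    print(enclosure_slot("sdb"))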

Tools:

sg3_utils
sgpio: http://linux.die.net/man/1/sgpio
sg_ses: http://sg.danny.cz/sg/sg_ses.html
sg_vpd: http://linux.die.net/man/8/sg_vpd

Work items
This section should contain a list of work tasks created by this blueprint. Please include engineering tasks as well as related build/release and documentation work. If this blueprint requires cleanup of deprecated features, please list those tasks as well.

Coding tasks

Build / release tasks
Task 1
Task 2
Task 3

Documentation tasks
Task 1
Task 2
Task 3

Deprecation tasks
Task 1
Task 2
Task 3
