Documentation #2274

Basic Availability Model

Added by Anonymous about 12 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
common
Target version:
-
% Done:
0%
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

(1) construct a continuous-time Markov availability model for a basic cluster (3 mons, 4 OSDs, 2 copies)
(Petri nets are actually better suited, but few people understand them, and tools for them are harder to find)
(2) plug in standard FIT rates (failures per 10^9 device-hours) for nodes, controllers, disks, NICs, switches, fans, and power supplies
(3) plug in measured reboot and recovery times
(4) use embedded Linux software FIT rates until we have data from nightlies and long-running clusters
(5) estimate the percentage of coupled software failures based on known anecdotes
(6) publish the model and open it for critique by the community
(7) maintain and publish internal and field failure-rate data
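Steps (1) through (5) could be sketched roughly as follows. This is a minimal Python illustration under simplifying assumptions, not the model this issue asks for. It reduces the cluster to a single placement group with two copies and lumps each copy's host into one node whose failure rate is the sum of its component FITs (a series system). A coupled software failure is treated as a common-mode event that takes both copies down at once, with an assumed fraction `beta`. Monitors and switches are left out for brevity. The function names `fit_to_rate`, `node_rate`, `steady_state`, and `pg_availability` are hypothetical, and every numeric rate in the example is a placeholder, not measured or published data.

```python
# Minimal CTMC availability sketch for ONE placement group with 2 copies.
# All numeric rates are hypothetical placeholders, not measured or published data.

FIT_HOURS = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def fit_to_rate(fit):
    """Convert a FIT value to failures per hour."""
    return fit / FIT_HOURS


def node_rate(component_fits):
    """Series system: the node fails if any component fails, so FITs add."""
    return fit_to_rate(sum(component_fits.values()))


def steady_state(q):
    """Solve pi Q = 0 with sum(pi) = 1 via Gauss-Jordan elimination."""
    n = len(q)
    # Transpose Q so each row is a balance equation, then replace the
    # last (redundant) equation with the normalization constraint.
    a = [[q[j][i] for j in range(n)] for i in range(n)]
    a[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                f = a[r][col] / a[col][col]
                for c in range(col, n):
                    a[r][c] -= f * a[col][c]
                b[r] -= f * b[col]
    return [b[i] / a[i][i] for i in range(n)]


def pg_availability(lam_hw, lam_sw, beta, mu, nu):
    """States: 0 = both copies up, 1 = one copy up, 2 = no copy up.

    lam_hw, lam_sw: per-node hardware / software failure rates (1/h)
    beta: assumed fraction of software failures that hit both copies at once
    mu:   recovery rate back to 2 copies (1 / recovery time in hours)
    nu:   restore rate after losing both copies (1 / reboot time in hours)
    """
    lam_ind = lam_hw + (1.0 - beta) * lam_sw  # independent per-node rate
    lam_cm = beta * lam_sw                    # coupled (common-mode) rate
    lam_one = lam_hw + lam_sw                 # the surviving copy fails
    q = [
        [0.0, 2.0 * lam_ind, lam_cm],
        [mu, 0.0, lam_one],
        [0.0, nu, 0.0],
    ]
    for i, row in enumerate(q):
        row[i] = -sum(row)  # diagonal makes each row of Q sum to zero
    pi = steady_state(q)
    return pi, pi[0] + pi[1]  # data is available while >= 1 copy is up


if __name__ == "__main__":
    # Placeholder component FITs per OSD host; replace with real figures.
    hw = node_rate({"disk": 1000.0, "controller": 500.0, "nic": 200.0,
                    "fan": 200.0, "power_supply": 300.0})
    sw = fit_to_rate(5000.0)  # placeholder software FIT
    pi, avail = pg_availability(hw, sw, beta=0.1, mu=1 / 1.0, nu=1 / 0.25)
    print(f"availability = {avail:.12f}")
```

Because the chain has only three states, the steady state is found by solving pi Q = 0 together with the normalization sum(pi) = 1 directly. With `beta = 0` the chain reduces to a birth-death process whose state-probability ratios are 2λ/μ and λ/ν, which makes a convenient sanity check. Plugging in measured FIT, reboot, and recovery figures only changes the arguments, while a fuller cluster model would add states rather than change the solver.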
