Diagnosability

Summary

Make it easier to diagnose problems such as inconsistent placement groups (PGs) in a cluster. Ideas include, but are not limited to:

  • Store the state of scrub errors for later querying by CLI/HTTP
  • Explain pg repair and osd repair
  • Maybe parameterize pg repair and osd repair to address arbitrary choices for resolution
  • Central audit log containing "admin operations that change the cluster" (much smaller than ceph.log, for manual audit)
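
As a rough illustration of the first idea, scrub errors could be recorded in a small store keyed by PG, so that they can be queried later rather than searched for in the central log. The sketch below is a minimal in-memory model only: `ScrubError`, `ScrubErrorStore`, the field names, and the error label are all hypothetical and do not correspond to existing Ceph code.

```python
from collections import defaultdict
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class ScrubError:
    """One scrub finding (hypothetical record layout)."""
    pgid: str   # placement group id, e.g. "2.1f"
    osd: int    # OSD on which the problem was seen
    obj: str    # name of the affected object
    kind: str   # hypothetical error label, e.g. "size_mismatch"


class ScrubErrorStore:
    """Keeps scrub errors per PG so they can be queried after the fact."""

    def __init__(self):
        self._by_pg = defaultdict(list)

    def record(self, err):
        # Called whenever a scrub reports a problem.
        self._by_pg[err.pgid].append(err)

    def clear(self, pgid):
        # Called when a later scrub of the PG comes back clean.
        self._by_pg.pop(pgid, None)

    def query(self, pgid):
        # Return a copy so callers cannot mutate the stored list.
        return list(self._by_pg.get(pgid, []))

    def inconsistent_pgs(self):
        return sorted(pg for pg, errs in self._by_pg.items() if errs)

    def to_json(self, pgid):
        # A form that a CLI or HTTP endpoint could return as-is.
        return json.dumps([asdict(e) for e in self.query(pgid)], sort_keys=True)
```

A CLI or HTTP handler could then answer "which PGs are inconsistent, and which objects are affected?" by calling `inconsistent_pgs()` and `to_json(pgid)`. A real implementation would also need persistence across daemon restarts, which this sketch leaves out.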

Owners

  • Dan Mick (Inktank)

Interested Parties

  • Dan Mick (Inktank)
  • Danny Al-Gaaf (Deutsche Telekom AG)
  • Loic Dachary

Current Status

Today, details of scrub errors, including the names of the affected objects, are available only in the central log; a query interface would be a good first step. In addition, what pg repair and osd repair actually do is unclear and possibly suboptimal. Their behavior should first be clearly documented, and then perhaps improved or parameterized, since automatic repair ideally needs policy input from the administrator.
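
One way to picture the parameterization is that the administrator names a policy, and repair uses it to decide which replica to treat as authoritative. The sketch below is purely illustrative: the policy names, the `Replica` fields, and `choose_authoritative` are assumptions made for this example, and it does not describe what pg repair currently does.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One copy of an object (hypothetical fields)."""
    osd: int
    is_primary: bool
    digest: str  # checksum of the object's content on this OSD


def choose_authoritative(replicas, policy="primary"):
    """Pick the replica to repair from, according to an admin-chosen policy.

    Returns None when the policy cannot decide, so that the case is left
    to the administrator instead of being resolved arbitrarily.
    """
    if not replicas:
        raise ValueError("no replicas given")

    if policy == "primary":
        # Trust whatever the primary holds.
        for r in replicas:
            if r.is_primary:
                return r
        raise ValueError("no primary among replicas")

    if policy == "majority":
        # Trust the content held by a strict majority of replicas.
        counts = Counter(r.digest for r in replicas)
        digest, votes = counts.most_common(1)[0]
        if votes * 2 <= len(replicas):
            return None  # tie or no strict majority
        return next(r for r in replicas if r.digest == digest)

    raise ValueError(f"unknown policy: {policy}")
```

With replicas whose digests are "aaa" (primary), "bbb", and "bbb", the "primary" policy picks the primary's copy while "majority" picks one of the two matching copies. That difference is exactly the kind of choice the administrator would want to control.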

Detailed Description

Work items

Coding tasks

  1. Task 1
  2. Task 2
  3. Task 3

Build / release tasks

  1. Task 1
  2. Task 2
  3. Task 3

Documentation tasks

  1. Task 1
  2. Task 2
  3. Task 3

Deprecation tasks

  1. Task 1
  2. Task 2
  3. Task 3