Feature #1885

closed

identify top 10 expected failures and process to diagnose

Added by Sage Weil over 12 years ago. Updated about 12 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%


Description

- peering failures
- unfound objects

Actions #2

Updated by Sage Weil over 12 years ago

  • Assignee set to Anonymous
Actions #4

Updated by Sage Weil over 12 years ago

  • Target version changed from v0.41 to v0.42
Actions #5

Updated by Anonymous about 12 years ago

OSD:
  • cascading failures
  • single OSD failure
  • failure to complete peering/recovery
  • unfound objects after recovery
  • full
  • slow
  • fails to respond to some requests
Monitors:
  • failure
RGW:
  • failure
Load Balancer:
  • stops forwarding requests
Actions #6

Updated by Anonymous about 12 years ago

Additional issues from Carl's list:
  • RGW request timeouts
  • OSD file system timeouts
  • OSD that is "down" but still "in"
  • degraded placement groups
Actions #7

Updated by Greg Farnum about 12 years ago

Mark Kampe wrote:

Additional issues from Carl's list:
  • RGW request timeouts

That's a symptom, not a cause...

  • OSD file system timeouts

What timeouts? We have a few that cause suicides but I suspect he just means OSDs being slow in the filesystem.

  • OSD that is "down" but still "in"
  • degraded placement groups

I'm not sure what either of these is about. Both are revealed by "ceph -s" (with more detail under "ceph osd dump" and "ceph pg dump"), and neither is a problem in and of itself.
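As a rough illustration of the point above, here is a minimal Python sketch that collects the output of the three commands named in this comment and picks out OSDs that are "down" but still "in". The helper names (`run_diagnostics`, `down_but_in`) are hypothetical, and the dictionary shape with `osd`, `up`, and `in` keys is an assumption about what a parsed OSD map might look like, not something confirmed in this ticket.

```python
import subprocess

# The three commands named in the comment above, as argv lists.
DIAGNOSTIC_COMMANDS = [
    ["ceph", "-s"],
    ["ceph", "osd", "dump"],
    ["ceph", "pg", "dump"],
]


def run_diagnostics(run=subprocess.run):
    """Run each command and return {command string: captured stdout}.

    `run` is injectable so the function can be exercised without a cluster.
    """
    results = {}
    for argv in DIAGNOSTIC_COMMANDS:
        proc = run(argv, capture_output=True, text=True, check=False)
        results[" ".join(argv)] = proc.stdout
    return results


def down_but_in(osd_map):
    """Return the ids of OSDs that are down (up == 0) yet still in (in == 1).

    ASSUMPTION: osd_map is a dict shaped like
    {"osds": [{"osd": 0, "up": 1, "in": 1}, ...]}; the real dump may differ.
    """
    return [o["osd"] for o in osd_map.get("osds", []) if not o["up"] and o["in"]]
```

On a live cluster, `run_diagnostics()` would simply capture the text that an operator reads by hand. `down_but_in` needs structured input, which could come from a machine-readable dump if the installed version offers one. As noted above, a non-empty result is a state to look at rather than a failure by itself.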

Actions #8

Updated by Sage Weil about 12 years ago

  • Status changed from New to Resolved