Bug #11798: upstart: configuration is too generous on restarts - devops - Ceph

Added by Greg Farnum almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source: Development
Backport: hammer, firefly
Regression: No
Severity: 3 - minor

Description

See https://bugzilla.redhat.com/show_bug.cgi?id=1210871 for the investigation that prompted this.

Our current upstart scripts are probably too generous about restarting processes. At the moment each daemon is configured to respawn unless it exceeds 5 crashes in 30 seconds. Restarting some of them can take more than 6 seconds (at least some of the time), in which case 5 crashes never fit inside the 30-second window and the daemon keeps being respawned indefinitely. Any of our daemons that crash that frequently are probably stuck on a disk state issue.

We need to run some tests to figure out more reasonable values and change them.
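
To make the arithmetic above concrete, here is a minimal sketch of a respawn limit, assuming a simplified sliding-window model: the job is stopped once more than `limit` respawns fall within `interval` seconds. This is not Upstart's actual implementation, and the function name, the 6.5-second cycle, and the 2-second cycle are hypothetical values chosen only for illustration.

```python
from collections import deque


def respawns_until_stop(limit, interval, cycle_seconds, max_respawns=1000):
    """Simulate a daemon that crashes right after every start.

    Each crash-and-restart cycle takes `cycle_seconds`. Returns the number
    of the respawn at which the limit trips, or None if it never trips
    within `max_respawns` respawns. Simplified model, not Upstart's code.
    """
    recent = deque()  # timestamps of respawns still inside the window
    now = 0.0
    for n in range(1, max_respawns + 1):
        now += cycle_seconds
        recent.append(now)
        # Forget respawns that are `interval` seconds old or older.
        while recent and now - recent[0] >= interval:
            recent.popleft()
        if len(recent) > limit:
            return n
    return None


# 5 respawns in 30 s, with a cycle slower than 6 s: the limit never trips.
print(respawns_until_stop(limit=5, interval=30, cycle_seconds=6.5))
# Same limit with a fast 2 s cycle: the job is stopped.
print(respawns_until_stop(limit=5, interval=30, cycle_seconds=2))
```

In this model a daemon whose cycle takes longer than `interval / limit` seconds is never stopped, which is why a 6-second restart defeats a limit of 5 in 30 seconds.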

Related issues 2 (0 open — 2 closed)

#1

Updated by Sage Weil almost 9 years ago

How about 5 restarts in 10 minutes?

#2

Updated by Sage Weil almost 9 years ago

Status changed from New to Fix Under Review

https://github.com/ceph/ceph/pull/4828

#3

Updated by Sage Weil almost 9 years ago

Assignee set to Sage Weil

#4

Updated by Greg Farnum almost 9 years ago

Status changed from Fix Under Review to Resolved

Merged in commit:172d3ac8744c876a0f6ed99f4d63d95ea899cf85. We now allow 3 restarts in 30 minutes on OSD, Mon, and MDS.
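
For comparison, the three limits discussed in this ticket can be put side by side. The sketch below assumes the job is stopped once more than `limit` respawns land within `interval` seconds, so a crash loop is only caught when its cycle is shorter than `interval / limit`. The helper name and the policy labels are hypothetical; only the numbers come from the ticket.

```python
# (limit, interval in seconds) for each setting mentioned in this ticket.
POLICIES = {
    "original, 5 in 30 seconds": (5, 30),
    "suggested in comment #1, 5 in 10 minutes": (5, 10 * 60),
    "merged, 3 in 30 minutes": (3, 30 * 60),
}


def slowest_caught_cycle(limit, interval):
    """Crash-restart cycles shorter than this many seconds trip the limit.

    Assumes the limit trips when limit + 1 respawns fit in one window,
    i.e. when limit * cycle < interval.
    """
    return interval / limit


for name, (limit, interval) in POLICIES.items():
    bound = slowest_caught_cycle(limit, interval)
    print(f"{name}: stops daemons crashing more often than every {bound:.0f} s")
```

Under that assumption the merged setting stops any daemon that crashes more often than roughly once every ten minutes, compared with once every six seconds before the change.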

#5

Updated by Sage Weil over 8 years ago

https://github.com/ceph/ceph/pull/5930 (hammer backport)

#6

Updated by Loïc Dachary over 8 years ago

Status changed from Resolved to Pending Backport
Backport set to hammer

#7

Updated by Loïc Dachary over 8 years ago

Status changed from Pending Backport to Resolved

#8

Updated by Ken Dreyer over 8 years ago

Backport changed from hammer to hammer, firefly

We're planning to ship this fix downstream in the RHCS 1.2 series - we might as well get it upstream in Firefly too.

#9

Updated by Ken Dreyer over 8 years ago

Status changed from Resolved to Pending Backport

#10

Updated by Nathan Cutler over 8 years ago

Project changed from Ceph to devops

#11

Updated by Loïc Dachary over 8 years ago

Status changed from Pending Backport to Resolved
