Bug #58241: Systemd unit file TimeoutStartSec of 120 seconds is too low - Orchestrator - Ceph

Bug #58241

closed

Systemd unit file TimeoutStartSec of 120 seconds is too low

Added by Voja Molani over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee: Redouane Kachach Elhichou
Category: cephadm
Severity: 1 - critical
Affected Versions: Ceph - v17.2.5
Pull request ID: 49728

Description

The systemd unit file that cephadm uses for all managed containers currently sets TimeoutStartSec=120, which is too low.
When a busy host restarts and all containers start at the same time, startup can take longer than that, for example 140 seconds on a host with dual E5-2630 v3 CPUs, 64 GB RAM, and slow system disks. The timeout should be raised substantially, perhaps to 180-200 seconds.

120 seconds is not enough for a host running all services, including RGW, plus 12 OSDs on slow system disks.

What's more, if a container exceeds the 120-second start timeout and systemd terminates it (to retry the start later), cephadm will not detect the container as running even if a later restart by systemd succeeds within the 120-second limit (a separate issue will be created).
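
For anyone who wants to experiment with a larger start timeout before a fix lands, the following is a minimal sketch of rewriting the TimeoutStartSec value in the text of a unit file. It is not the change made in the pull request, whose contents are not shown here. The function name set_timeout_start_sec and the sample unit text are hypothetical and exist only for illustration; real cephadm unit files will look different.

```python
import re


def set_timeout_start_sec(unit_text: str, seconds: int) -> str:
    """Return unit_text with TimeoutStartSec in [Service] set to `seconds`.

    Hypothetical helper for illustration only. If the key is missing from
    the [Service] section it is appended to that section. If there is no
    [Service] section the text is returned unchanged (apart from a
    normalized trailing newline).
    """
    new_line = f"TimeoutStartSec={seconds}"
    out = []
    in_service = False
    replaced = False
    for line in unit_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [Service] without having seen the key: add it first.
            if in_service and not replaced:
                out.append(new_line)
                replaced = True
            in_service = stripped == "[Service]"
        elif in_service and re.match(r"TimeoutStartSec\s*=", stripped):
            line = new_line
            replaced = True
        out.append(line)
    # [Service] was the last section and never contained the key.
    if in_service and not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


# Made-up unit text, not a real cephadm unit file.
SAMPLE_UNIT = """[Unit]
Description=Example container unit

[Service]
ExecStart=/bin/true
TimeoutStartSec=120
Restart=on-failure

[Install]
WantedBy=multi-user.target
"""

if __name__ == "__main__":
    print(set_timeout_start_sec(SAMPLE_UNIT, 200))
```

Editing a generated unit file in place may be overwritten when the file is regenerated, so a systemd drop-in override is probably the safer place for such a change. The sketch only shows which setting is involved.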

Related issues 1 (0 open — 1 closed)

Updated by Redouane Kachach Elhichou over 1 year ago

Assignee set to Redouane Kachach Elhichou

Updated by Redouane Kachach Elhichou over 1 year ago

Status changed from New to In Progress

Updated by Redouane Kachach Elhichou over 1 year ago

Status changed from In Progress to Fix Under Review
Pull request ID set to 49728

Updated by Redouane Kachach Elhichou over 1 year ago

Related to Bug #58242: cephadm doesn't communicate with containers that failed initial start but were successfully restarted later by systemd added

Updated by Adam King about 1 year ago

Status changed from Fix Under Review to Resolved

Since this shares a PR with https://tracker.ceph.com/issues/58242, I am marking this one resolved and will track backports in https://tracker.ceph.com/issues/58242.
