Bug #58241

closed

Systemd unit file TimeoutStartSec of 120 seconds is too low

Added by Voja Molani over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Currently, the systemd unit file used for all cephadm-managed containers sets TimeoutStartSec=120, which is too low.
When a busy host restarts and all containers are started at the same time, startup can take, for example, 140 seconds (dual E5-2630 v3, 64 GB RAM, slow system disks). The timeout should be increased substantially, perhaps to 180-200 seconds.

120 seconds is not enough for a host running all services, including RGW, plus 12 OSDs on slow system disks.
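Until the default changes, one possible per-host workaround is a systemd drop-in that raises TimeoutStartSec for every instance of the cephadm unit template. The Python sketch below assumes the template is named `ceph-<fsid>@.service` under `/etc/systemd/system`; the 200-second default and the helper names (`dropin_path`, `render_dropin`, `write_dropin`) are hypothetical, and this is not cephadm's own mechanism.

```python
from pathlib import Path

# Hypothetical value; the issue suggests something in the 180-200 second range.
DEFAULT_TIMEOUT_SEC = 200
SYSTEMD_DIR = Path("/etc/systemd/system")


def dropin_path(fsid: str, base: Path = SYSTEMD_DIR) -> Path:
    """Drop-in path for the (assumed) template unit ceph-<fsid>@.service.

    A drop-in next to a template unit applies to every instance of it.
    """
    return base / f"ceph-{fsid}@.service.d" / "override.conf"


def render_dropin(timeout_sec: int = DEFAULT_TIMEOUT_SEC) -> str:
    """Render a [Service] override that only changes TimeoutStartSec."""
    if timeout_sec <= 0:
        raise ValueError("timeout_sec must be positive")
    return f"[Service]\nTimeoutStartSec={timeout_sec}\n"


def write_dropin(fsid: str, timeout_sec: int = DEFAULT_TIMEOUT_SEC,
                 base: Path = SYSTEMD_DIR) -> Path:
    """Create the drop-in directory if needed and write the override file."""
    path = dropin_path(fsid, base)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_dropin(timeout_sec))
    return path
```

After writing the file as root, `systemctl daemon-reload` makes systemd pick it up. A drop-in is used here instead of editing the generated unit because cephadm may rewrite the unit file when it redeploys daemons; whether the drop-in survives every cephadm operation is an assumption worth verifying.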

Moreover, if a container exceeds the 120-second start timeout and systemd terminates it (so that the start can be retried later), cephadm will not detect that container as running, even if a later restart by systemd succeeds within the 120-second limit (a separate issue will be created).
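To spot the situation described above on a host, one could ask systemd whether a daemon's unit is active but had to be restarted, for example with `systemctl show --property=ActiveState,NRestarts <unit>` (`NRestarts` requires a reasonably recent systemd). The sketch below only parses that KEY=VALUE output; the function names are hypothetical, and treating `NRestarts > 0` as a sign of this case is an assumption, since that counter also includes restarts with other causes.

```python
def parse_systemctl_show(output: str) -> dict[str, str]:
    """Parse `systemctl show --property=...` output (one KEY=VALUE per line)."""
    props: dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            props[key] = value
    return props


def recovered_after_restart(output: str) -> bool:
    """True if the unit is active now but systemd restarted it at least once."""
    props = parse_systemctl_show(output)
    restarts = int(props.get("NRestarts", "0") or "0")
    return props.get("ActiveState") == "active" and restarts > 0
```

A daemon for which this returns True is a candidate for the case where systemd eventually started the container but cephadm still does not see it as running.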


Related issues: 1 (0 open, 1 closed)

Related to Orchestrator - Bug #58242: cephadm doesn't communicate with containers that failed initial start but were successfully restarted later by systemd (Resolved, Redouane Kachach Elhichou)
