Bug #58241

closed

Systemd unit file TimeoutStartSec of 120 seconds is too low

Added by Voja Molani over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Currently the systemd unit file used for all cephadm-managed containers sets TimeoutStartSec=120, which is too low.
When a busy host restarts and all the containers are started at the same time, startup can take, for example, 140 seconds (dual E5-2630 v3, 64 GB RAM, slow system disks). The timeout should be increased substantially, perhaps to 180-200 seconds.

120 seconds is not enough for a host running all services, including RGW, and 12 OSDs when using slow system disks.

What's more, if a container exceeds the 120-second start timeout and systemd terminates it (to try starting it again later), cephadm will not detect that container as running, even if a later restart by systemd succeeded in starting it within the 120-second limit (a separate issue will be created).
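
To illustrate the mismatch described above, here is a minimal Python sketch that reads the TimeoutStartSec= setting from the text of a unit file and compares it with an observed start time. The function names, the sample unit text, and the small set of supported time suffixes are assumptions made for illustration; this is not cephadm code, and systemd accepts more time formats than the sketch handles.

```python
import re

# Time suffixes handled by this sketch (an assumption for brevity);
# systemd itself accepts more formats, see systemd.time(7).
_UNIT_SECONDS = {"": 1, "s": 1, "sec": 1, "min": 60}


def parse_timeout_start_sec(unit_text):
    """Return the last TimeoutStartSec= value found in [Service], in seconds.

    Returns None when the setting is absent.
    """
    section = None
    value = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            key, _, val = line.partition("=")
            if key.strip() == "TimeoutStartSec":
                value = val.strip()  # a later assignment replaces an earlier one
    if value is None:
        return None
    if value == "infinity":
        return float("inf")
    match = re.fullmatch(r"(\d+)\s*([a-z]*)", value)
    if match is None or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unsupported time value: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def is_timeout_sufficient(unit_text, observed_start_seconds):
    """Return True if the configured timeout exceeds the observed start time."""
    timeout = parse_timeout_start_sec(unit_text)
    if timeout is None:
        return None  # no explicit value; systemd's default is not modeled here
    return timeout > observed_start_seconds


# Hypothetical unit text; only TimeoutStartSec=120 is taken from the report.
SAMPLE_UNIT = """\
[Unit]
Description=hypothetical cephadm-managed container

[Service]
TimeoutStartSec=120
"""

# 140 seconds is the example start time given in the description.
print(is_timeout_sufficient(SAMPLE_UNIT, 140))  # prints False
```

With the reported value of 120 and the example start time of 140 seconds the check fails, whereas a value in the suggested 180-200 second range would pass.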


Related issues 1 (0 open, 1 closed)

Related to Orchestrator - Bug #58242: cephadm doesn't communicate with containers that failed initial start but were successfully restarted later by systemd (Resolved, Redouane Kachach Elhichou)

Actions #1

Updated by Redouane Kachach Elhichou over 1 year ago

  • Assignee set to Redouane Kachach Elhichou
Actions #2

Updated by Redouane Kachach Elhichou over 1 year ago

  • Status changed from New to In Progress
Actions #3

Updated by Redouane Kachach Elhichou over 1 year ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 49728
Actions #4

Updated by Redouane Kachach Elhichou over 1 year ago

  • Related to Bug #58242: cephadm doesn't communicate with containers that failed initial start but were successfully restarted later by systemd added
Actions #5

Updated by Adam King about 1 year ago

  • Status changed from Fix Under Review to Resolved

Since this shares a PR with https://tracker.ceph.com/issues/58242, I am going to mark this one resolved and will track backports in https://tracker.ceph.com/issues/58242.
