Bug #54024

closed

mgr/cephadm: timeouts for ssh/binary commands

Added by Adam King about 2 years ago. Updated 11 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
backport_processed
Backport:
reef, quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Some thoughts from orch weekly

Timeouts (ssh commands in mgr module, commands in binary)
* how do we gracefully recover when an operation is blocked on a host
  * https://tracker.ceph.com/issues/53846
* ssh has a 15 min timeout: https://tracker.ceph.com/issues/51733
* asyncssh: connection.run has a timeout:
  * https://github.com/ronf/asyncssh/blob/215dbf63fd82270716814de63e045c512d0e5b72/asyncssh/connection.py#L4014 
* how long should we wait?
  * ceph-volume ls on dense nodes
    * done from the cephadm agent
  * downloading container images through slow internet connections
    * can we avoid that? https://tracker.ceph.com/issues/53276
* reproduce: artificially create a stale global cephadm lock
* make ssh run command timeout configurable in case a cluster actually runs into those timeouts?
  * make timeout 15 mins? or 5 mins?

The decision there was ultimately to try passing the --timeout flag that the cephadm binary offers, to see whether it would cause the commands to eventually return, and then to raise a health warning if we see the timeout happen.
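
A minimal sketch of that decision, under stated assumptions: the --timeout flag comes from the description above, but its placement before the subcommand, the helper names, and the health check key EXAMPLE_COMMAND_TIMEOUT are all hypothetical and are not taken from the actual pull request. The outer subprocess timeout is only a stand-in for whatever bounds the call on the mgr side.

```python
import subprocess


def with_timeout_flag(binary_argv, subcommand_argv, timeout_secs):
    """Build an argv that passes --timeout to the binary (flag placement is an assumption)."""
    return [*binary_argv, "--timeout", str(timeout_secs), *subcommand_argv]


def run_and_track(argv, host, outer_timeout, health_checks):
    """Run argv; on timeout, record a warning in health_checks instead of blocking forever.

    health_checks is a plain dict standing in for the real health warning mechanism.
    Returns the exit code, or None if the command timed out.
    """
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=outer_timeout
        )
        return proc.returncode
    except subprocess.TimeoutExpired:
        # Hypothetical check name, used for illustration only.
        health_checks.setdefault("EXAMPLE_COMMAND_TIMEOUT", []).append(
            f"command on {host} timed out after {outer_timeout}s"
        )
        return None
```

For example, with_timeout_flag(["cephadm"], ["ls"], 300) yields ["cephadm", "--timeout", "300", "ls"], and a command that outlives outer_timeout leaves one entry under the hypothetical check name in health_checks.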


Related issues 2 (0 open, 2 closed)

Copied to Orchestrator - Backport #59549: quincy: mgr/cephadm: timeouts for ssh/binary commands (Resolved, Adam King)
Copied to Orchestrator - Backport #59550: reef: mgr/cephadm: timeouts for ssh/binary commands (Resolved, Adam King)
Actions #1

Updated by Adam King about 2 years ago

  • Assignee set to Adam King
Actions #2

Updated by Adam King about 1 year ago

  • Status changed from New to In Progress
  • Pull request ID set to 50722
Actions #3

Updated by Adam King about 1 year ago

  • Status changed from In Progress to Pending Backport
  • Backport set to reef, quincy
Actions #4

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59549: quincy: mgr/cephadm: timeouts for ssh/binary commands added
Actions #5

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59550: reef: mgr/cephadm: timeouts for ssh/binary commands added
Actions #6

Updated by Backport Bot about 1 year ago

  • Tags set to backport_processed
Actions #7

Updated by Adam King 11 months ago

  • Status changed from Pending Backport to Resolved