Bug #54024
mgr/cephadm: timeouts for ssh/binary commands
Status: Closed
% Done: 0%
Source:
Tags: backport_processed
Backport: reef, quincy
Regression: No
Severity: 3 - minor
Reviewed:
Description
Some thoughts from orch weekly
Timeouts (ssh commands in mgr module, commands in binary):
* how do we gracefully recover when an operation is blocked on a host?
* https://tracker.ceph.com/issues/53846
* ssh has a 15 min timeout: https://tracker.ceph.com/issues/51733
* asyncssh: connection.run has a timeout:
* https://github.com/ronf/asyncssh/blob/215dbf63fd82270716814de63e045c512d0e5b72/asyncssh/connection.py#L4014
* how long should we wait?
* ceph-volume ls on dense nodes
* done from the cephadm agent
* downloading container images through slow internet connections
* can we avoid that? https://tracker.ceph.com/issues/53276
* reproduce: artificially create a stale global cephadm lock
* make the ssh run command timeout configurable in case a cluster actually runs into those timeouts?
* make the timeout 15 mins? or 5 mins?
The decision there was ultimately to try passing the --timeout flag that the cephadm binary offers, to see whether it causes blocked commands to eventually return, and then to raise a health warning whenever we see the timeout happen.
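The general pattern described here (bound each command with a timeout, then surface a warning when the bound is hit) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code from the actual fix. The names `run_with_timeout`, `run_on_host`, `record_timeout`, `CommandTimedOut`, the `EXAMPLE_COMMAND_TIMEOUT` check name, and the 15-minute default are all assumptions made for the example. The command runs locally, and `host` is only a label.

```python
import asyncio
import sys
from typing import Dict, List, Optional, Tuple

# Hypothetical default; the ticket leaves the real value (5 or 15 minutes) open.
DEFAULT_COMMAND_TIMEOUT = 15 * 60


class CommandTimedOut(Exception):
    """Raised when a command does not finish within its timeout."""


async def run_with_timeout(
    cmd: List[str], timeout: float = DEFAULT_COMMAND_TIMEOUT
) -> Tuple[int, str]:
    """Run cmd locally and return (exit code, combined output)."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    try:
        out, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Do not leave the blocked process behind: kill it and reap it.
        proc.kill()
        await proc.wait()
        raise CommandTimedOut(f"{' '.join(cmd)} exceeded {timeout}s")
    return proc.returncode, out.decode()


def record_timeout(
    health_checks: Dict[str, List[str]], host: str, message: str
) -> None:
    """Append a warning to a hypothetical in-memory health-check table."""
    health_checks.setdefault("EXAMPLE_COMMAND_TIMEOUT", []).append(
        f"host {host}: {message}"
    )


async def run_on_host(
    host: str,
    cmd: List[str],
    timeout: float,
    health_checks: Dict[str, List[str]],
) -> Optional[Tuple[int, str]]:
    """Return the command result, or None (plus a warning) on timeout."""
    try:
        return await run_with_timeout(cmd, timeout)
    except CommandTimedOut as e:
        record_timeout(health_checks, host, str(e))
        return None


if __name__ == "__main__":
    checks: Dict[str, List[str]] = {}
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(asyncio.run(run_on_host("host1", slow, 1.0, checks)))
    print(checks)
```

Keeping the timeout as a parameter rather than a constant reflects the question raised above about making the value configurable for clusters that legitimately need longer, such as dense nodes or slow image pulls.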
Updated by Adam King about 1 year ago
- Status changed from New to In Progress
- Pull request ID set to 50722
Updated by Adam King about 1 year ago
- Status changed from In Progress to Pending Backport
- Backport set to reef, quincy
Updated by Backport Bot about 1 year ago
- Copied to Backport #59549: quincy: mgr/cephadm: timeouts for ssh/binary commands added
Updated by Backport Bot about 1 year ago
- Copied to Backport #59550: reef: mgr/cephadm: timeouts for ssh/binary commands added