Bug #64473

closed

cephadm: asyncio timeout handler can't handle concurrent.futures.CancelledError, causing the module to crash

Added by Adam King 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: squid, reef, quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Certain failures of commands that we run asynchronously with asyncio (to provide a timeout feature) can raise a concurrent.futures.CancelledError. The handler we have for asyncio timeouts doesn't catch this type of exception, so the cephadm module ends up crashing. This might only happen on builds using a Python version later than 3.6, which would explain why we haven't seen it reported much. We have seen it on Python 3.9 builds.

2024-02-15T14:14:00.147593+0000 mgr.cephnode1.rfvatv (mgr.16779) 138077 : cephadm [ERR] executing refresh((['cephnode1', 'cephnode2', 'cephnode3', 'cephnode4'],)) failed.
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/utils.py", line 94, in do_work
    return f(*arg)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 267, in refresh
    r = self._refresh_facts(host)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 370, in _refresh_facts
    val = self.mgr.wait_async(self._run_cephadm_json(
  File "/usr/share/ceph/mgr/cephadm/module.py", line 671, in wait_async
    return self.event_loop.get_result(coro, timeout)
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 64, in get_result
    return future.result(timeout)
  File "/lib64/python3.9/concurrent/futures/_base.py", line 444, in result
    raise CancelledError()
concurrent.futures._base.CancelledError
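
For illustration, the failure mode in the traceback above can be reproduced outside of cephadm. The following is a minimal sketch, not the change made in the actual pull request: it assumes a background event loop thread, and the names CommandError, start_background_loop, self_cancelling_command and quick_command are hypothetical. Only the name get_result is borrowed from the traceback, and its body here is an assumption. The sketch shows that when the task running a coroutine is cancelled, future.result() raises concurrent.futures.CancelledError rather than a timeout error, so a handler that only catches the timeout lets it escape.

```python
import asyncio
import concurrent.futures
import threading
from typing import Any, Coroutine, Optional


class CommandError(Exception):
    """Hypothetical error type that callers are assumed to handle already."""


def start_background_loop() -> asyncio.AbstractEventLoop:
    # Hypothetical stand-in for the event loop thread that commands are submitted to.
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


def get_result(loop: asyncio.AbstractEventLoop,
               coro: Coroutine[Any, Any, Any],
               timeout: Optional[float] = None) -> Any:
    # Submit the coroutine to the loop thread and block until it finishes.
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    try:
        return future.result(timeout)
    except concurrent.futures.TimeoutError:
        # The command took too long: cancel it and report a handled error.
        future.cancel()
        raise CommandError(f"command timed out after {timeout}s") from None
    except concurrent.futures.CancelledError:
        # The task was cancelled underneath us. Without this branch the
        # exception would propagate to the caller unhandled.
        raise CommandError("command was cancelled before it finished") from None


async def self_cancelling_command() -> str:
    # Simulates a command whose task is cancelled while it is running.
    asyncio.current_task().cancel()
    await asyncio.sleep(0)  # the cancellation is delivered at this await
    return "unreachable"


async def quick_command() -> str:
    return "ok"


if __name__ == "__main__":
    loop = start_background_loop()
    print(get_result(loop, quick_command(), timeout=5))
    try:
        get_result(loop, self_cancelling_command(), timeout=5)
    except CommandError as e:
        print(f"handled: {e}")
    loop.call_soon_threadsafe(loop.stop)
```

Translating the cancellation into an error type the caller already handles is one possible design; the point is only that the cancelled case needs its own except branch next to the timeout case.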

Related issues 3 (0 open, 3 closed)

Copied to Orchestrator - Backport #64628: squid: cephadm: asyncio timeout handler can't handle concurrent.futures.CancelledError, causing the module to crash (Resolved, Adam King)
Copied to Orchestrator - Backport #64629: reef: cephadm: asyncio timeout handler can't handle concurrent.futures.CancelledError, causing the module to crash (Resolved, Adam King)
Copied to Orchestrator - Backport #64630: quincy: cephadm: asyncio timeout handler can't handle concurrent.futures.CancelledError, causing the module to crash (Resolved, Adam King)
#1

Updated by Adam King 2 months ago

  • Backport set to reef, quincy
#2

Updated by Adam King 2 months ago

  • Pull request ID set to 55620
#3

Updated by Adam King about 2 months ago

  • Status changed from In Progress to Pending Backport
  • Backport changed from reef, quincy to squid, reef, quincy
#4

Updated by Backport Bot about 2 months ago

  • Copied to Backport #64628: squid: cephadm: asyncio timeout handler can't handle concurrent.futures.CancelledError, causing the module to crash added
#5

Updated by Backport Bot about 2 months ago

  • Copied to Backport #64629: reef: cephadm: asyncio timeout handler can't handle concurrent.futures.CancelledError, causing the module to crash added
#6

Updated by Backport Bot about 2 months ago

  • Copied to Backport #64630: quincy: cephadm: asyncio timeout handler can't handle concurrent.futures.CancelledError, causing the module to crash added
#7

Updated by Backport Bot about 2 months ago

  • Tags set to backport_processed
#8

Updated by Adam King about 1 month ago

  • Status changed from Pending Backport to Resolved