Bug #51736

closed

mgr occasionally hangs forever when executing multiprocessing.pool.ThreadPool

Added by hongloumeng a almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Low
Assignee: -
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Environment:
We have a Ceph cluster with 30+ hosts: 3 mons, 3 mgrs, and 330 OSDs.

Description:
After running for approximately one day, the mgr gets stuck and stops writing to its log.
We analysed the mgr log and found that it is stuck in the function refresh(host), which is decorated by forall_hosts.
'forall_hosts' is implemented with 'multiprocessing.pool.ThreadPool'.
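To make that call path concrete, here is a minimal, hypothetical sketch of a forall_hosts-style decorator built on multiprocessing.pool.ThreadPool. It is not the cephadm source: the module-level `_worker_pool`, the pool size of 10, and the stand-in `refresh` body are assumptions for illustration. What it shows is that `Pool.map` blocks the caller until every per-host call has returned, so a single call that never returns keeps the caller waiting indefinitely.

```python
import functools
from multiprocessing.pool import ThreadPool
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")

# Hypothetical shared worker pool; 10 threads mirrors the ThreadPool(10) in this report.
_worker_pool = ThreadPool(10)


def forall_hosts(f: Callable[[str], T]) -> Callable[[Iterable[str]], List[T]]:
    """Turn a per-host function f(host) into one that takes a list of hosts
    and runs f for each host on the shared thread pool."""
    @functools.wraps(f)
    def wrapper(hosts: Iterable[str]) -> List[T]:
        # Pool.map blocks the caller until every f(host) call has returned,
        # and returns the results in the same order as `hosts`. If any single
        # call never returns, the caller waits forever.
        return _worker_pool.map(f, hosts)

    return wrapper


@forall_hosts
def refresh(host: str) -> str:
    # Stand-in for the real per-host refresh work (hypothetical).
    return f"refreshed {host}"


print(refresh(["host1", "host2", "host3"]))
# → ['refreshed host1', 'refreshed host2', 'refreshed host3']
```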

Test:
We changed multiprocessing.pool.ThreadPool(10) to multiprocessing.pool.ThreadPool(1), and the mgr no longer gets stuck.

Guess:
It may be a deadlock in multiprocessing.
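One way to test the deadlock guess is to replace the unbounded wait with a bounded one and capture every thread's stack when it expires. The stacks show whether the pool workers are blocked inside refresh(host) itself (for example on a network call) or on a lock inside multiprocessing. This is a hypothetical diagnostic sketch, not part of cephadm: `map_with_deadline`, `dump_all_thread_stacks`, the simulated `offline-host`, and the timeout values are assumptions. It relies only on standard-library calls (`Pool.map_async`, `AsyncResult.get(timeout=...)`, `sys._current_frames`).

```python
import logging
import multiprocessing
import sys
import threading
import traceback
from multiprocessing.pool import ThreadPool
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def dump_all_thread_stacks() -> str:
    """Return the current Python stack of every live thread as text."""
    frames = sys._current_frames()
    parts = []
    for thread in threading.enumerate():
        frame = frames.get(thread.ident)
        if frame is not None:
            parts.append(f"--- {thread.name} ---\n" + "".join(traceback.format_stack(frame)))
    return "\n".join(parts)


def map_with_deadline(pool: ThreadPool, fn: Callable[[str], T],
                      hosts: Iterable[str], timeout: float) -> List[T]:
    """Like pool.map(fn, hosts), but give up after `timeout` seconds and log
    every thread's stack so the blocked call can be located."""
    result = pool.map_async(fn, hosts)
    try:
        return result.get(timeout=timeout)
    except multiprocessing.TimeoutError:
        log.error("map did not finish within %.1fs; thread stacks:\n%s",
                  timeout, dump_all_thread_stacks())
        raise


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    release = threading.Event()

    def refresh(host: str) -> str:
        if host == "offline-host":
            release.wait()  # simulate a per-host call that never returns by itself
        return f"refreshed {host}"

    with ThreadPool(10) as pool:
        try:
            map_with_deadline(pool, refresh, ["host1", "offline-host"], timeout=2.0)
        except multiprocessing.TimeoutError:
            print("refresh timed out; the log above shows where each thread was blocked")
        finally:
            release.set()  # unblock the worker so the pool can shut down
```

A timeout only makes the hang visible: Python threads cannot be killed, so the blocked worker keeps occupying a pool slot until its call returns.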


Related issues 1 (0 open, 1 closed)

Related to Orchestrator - Bug #51733: offline host hangs serve loop for 15 mins (Resolved, Adam King)
