Bug #51736
mgr occasionally hangs forever when executing multiprocessing.pool.ThreadPool
Status: Resolved
Priority: Low
Assignee: -
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Description
Environment:
We have a Ceph cluster with 30+ hosts: 3 mons, 3 mgrs, and 330 OSDs.
Description:
After running for approximately one day, the mgr gets stuck and stops writing to its log.
We analysed the mgr log and found that it was stuck in the function refresh(host), which is decorated with forall_hosts.
'forall_hosts' is implemented with 'multiprocessing.pool.ThreadPool'.
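For readers unfamiliar with the pattern, the following is a minimal sketch of a forall_hosts-style decorator built on ThreadPool. `forall_hosts_sketch` is a hypothetical stand-in, not cephadm's actual code; it only assumes what the report states: a per-host function fanned out over a pool of 10 threads.

```python
import functools
from multiprocessing.pool import ThreadPool
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def forall_hosts_sketch(func: Callable[[T], R]) -> Callable[[List[T]], List[R]]:
    """Hypothetical stand-in for cephadm's forall_hosts decorator.

    The decorated function handles a single host; the wrapper takes a list
    of hosts and runs func on each of them in a pool of 10 threads.
    """
    @functools.wraps(func)
    def wrapper(hosts: List[T]) -> List[R]:
        # Same pool size as reported in the bug: ThreadPool(10).
        with ThreadPool(10) as pool:
            # map() blocks until every host has been processed, so a single
            # call that never returns stalls the caller indefinitely.
            return pool.map(func, hosts)
    return wrapper


@forall_hosts_sketch
def refresh(host: str) -> str:
    # Placeholder for the real per-host refresh work (hypothetical).
    return f"refreshed {host}"
```

Calling `refresh(["host1", "host2"])` returns the per-host results in input order. Because `pool.map` waits for all of them, one hung `refresh(host)` call is enough to block the whole refresh loop, which matches the symptom described above.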
Test:
We changed multiprocessing.pool.ThreadPool(10) to multiprocessing.pool.ThreadPool(1), and the mgr no longer gets stuck.
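The effect of this change can be illustrated with a small sketch that measures how many tasks a ThreadPool actually runs at once. `peak_concurrency` is a hypothetical helper, not part of Ceph; the `time.sleep` merely simulates per-host work.

```python
import threading
import time
from multiprocessing.pool import ThreadPool


def peak_concurrency(processes: int, n_tasks: int = 20) -> int:
    """Return the largest number of tasks observed running at the same time."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def task(_: int) -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # simulate per-host work
        with lock:
            running -= 1

    with ThreadPool(processes) as pool:
        pool.map(task, range(n_tasks))
    return peak
```

With `processes=1` the peak is always 1, so per-host calls never overlap; with 10 workers they can. This is consistent with (but does not prove) the guess that the hang depends on several `refresh(host)` calls running concurrently. The trade-off is that a single worker refreshes hosts one after another, which is slower on a 30+ host cluster.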
Guess:
It may be a deadlock in multiprocessing.
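While the root cause is being investigated, one way to make such a hang visible instead of silent is to bound the wait with `map_async(...).get(timeout=...)`, which raises `multiprocessing.TimeoutError` if the results are not ready in time. This is a diagnostic sketch, not a fix; `map_with_timeout` is a hypothetical helper and the default timeout is an arbitrary assumption.

```python
import multiprocessing
import threading
from multiprocessing.pool import ThreadPool


def map_with_timeout(func, items, processes=10, timeout=60.0):
    """Run func over items in a ThreadPool, failing loudly instead of hanging.

    Hypothetical helper: raises multiprocessing.TimeoutError if the whole
    batch does not finish within `timeout` seconds.
    """
    pool = ThreadPool(processes)
    try:
        # map_async() returns immediately; get(timeout=...) bounds the wait.
        return pool.map_async(func, items).get(timeout=timeout)
    finally:
        # terminate() shuts down the pool's handler threads, but it cannot
        # forcibly stop a worker thread that is still blocked inside func.
        pool.terminate()
```

If the timeout fires, the caller can log the error (and, for example, dump thread stacks) rather than leaving the mgr silently stuck; the blocked worker thread itself keeps running in the background until its call returns.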