Bug #42452
msg/async: the event center is blocked by rdma construct connection for transport ib sync msg
% Done: 0%
Source: Development
Tags:
Backport: nautilus, mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-deploy
Component(RADOS): Messenger
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
In msg/async/rdma, we construct a TCP connection to transport the IB sync message. If the
remote node is shut down (for example, by accident), net.connect blocks until the timeout
is reached, which in turn blocks the event center.
This bug may cause mon probe timeouts, OSDs failing to reply, and so on.
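
For illustration only, here is a minimal sketch of the general technique that avoids this kind of stall: start the TCP connect in non-blocking mode and wait for completion with a bounded timeout. It is not the Ceph code and not necessarily what the fix does. The helper names start_connect and finish_connect are hypothetical, and plain POSIX poll() stands in for registering the fd with the event center.

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <cstdint>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not Ceph code): begin a TCP connect without blocking.
// Returns a socket fd whose connect is in flight (or already complete),
// or -1 if the attempt failed immediately.
int start_connect(const char* ip, uint16_t port) {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;

  // Put the socket in non-blocking mode so connect() returns at once.
  int flags = ::fcntl(fd, F_GETFL, 0);
  if (flags < 0 || ::fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
    ::close(fd);
    return -1;
  }

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  if (::inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
    ::close(fd);
    return -1;
  }

  int r = ::connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
  if (r < 0 && errno != EINPROGRESS) {
    ::close(fd);  // immediate failure, e.g. connection refused
    return -1;
  }
  return fd;
}

// Hypothetical helper: wait at most timeout_ms for the connect to finish.
// Returns true only if the connection was established.
bool finish_connect(int fd, int timeout_ms) {
  pollfd pfd{};
  pfd.fd = fd;
  pfd.events = POLLOUT;  // the socket becomes writable when connect completes

  int n;
  do {
    n = ::poll(&pfd, 1, timeout_ms);
  } while (n < 0 && errno == EINTR);
  if (n <= 0) return false;  // timed out or poll failed

  // Writable does not imply success; SO_ERROR holds the connect result.
  int err = 0;
  socklen_t len = sizeof(err);
  if (::getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) return false;
  return err == 0;
}
```

In an event-driven messenger the fd returned by start_connect would normally be handed to the event loop and checked from a writable callback, so the thread is never parked in connect(). The poll() call here only keeps the sketch self-contained.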
Related issues
History
#1 Updated by Peng Liu almost 4 years ago
How to trigger this bug:
1. Use async+rdma.
2. Reboot a server.
3. Observe the cluster recovery time.
4. Observe whether any healthy OSDs are marked down.
#2 Updated by Kefu Chai almost 4 years ago
- Status changed from New to Fix Under Review
- Assignee set to Peng Liu
- Pull request ID set to 31109
#3 Updated by Nathan Cutler almost 4 years ago
- Backport set to nautilus, mimic
#4 Updated by Kefu Chai almost 4 years ago
- Project changed from mgr to RADOS
- Component(RADOS) Messenger added
#5 Updated by Kefu Chai over 3 years ago
- Status changed from Fix Under Review to Pending Backport
#6 Updated by Nathan Cutler over 3 years ago
- Copied to Backport #44369: mimic: msg/async: the event center is blocked by rdma construct conection for transport ib sync msg added
#7 Updated by Nathan Cutler over 3 years ago
- Copied to Backport #44370: nautilus: msg/async: the event center is blocked by rdma construct conection for transport ib sync msg added
#8 Updated by Nathan Cutler over 2 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".