Bug #1100
osd: marking peers down
Status: closed
% Done: 0%
Description
On benjamin, I'm reliably seeing peers mark each other down when they shouldn't. There are ~21 osds across 3 nodes, and simply restarting them all starts a storm. Something is broken in the heartbeat exchanges.
The workaround is to temporarily increase osd heartbeat grace until everything is up and then lower it again.
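To illustrate why raising the grace helps, here is a deliberately simplified model of a grace-based failure check. It is not Ceph's actual heartbeat code; the function name, peer names, and timestamps are all hypothetical, and the only assumption is that a peer is reported down once its last heartbeat reply is older than the grace period.

```python
def peers_marked_down(last_heard, now, grace):
    """Return the peers whose last heartbeat reply is older than `grace` seconds.

    Simplified model only: `last_heard` maps a peer name to the time (in
    seconds) its last heartbeat reply arrived. All names are hypothetical.
    """
    return sorted(peer for peer, t in last_heard.items() if now - t > grace)


# Hypothetical timestamps during a mass restart: osd.1 is alive but slow to reply.
last_heard = {"osd.0": 100.0, "osd.1": 75.0, "osd.2": 98.0}
now = 100.0

print(peers_marked_down(last_heard, now, grace=20))  # ['osd.1']
print(peers_marked_down(last_heard, now, grace=60))  # []
```

With the shorter grace, a peer that is merely slow while everything restarts is reported down; with the longer grace it is not. That matches the workaround, although it does not explain the underlying heartbeat bug.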