Bug #1100
osd: marking peers down
% Done: 0%
Regression: No
Severity: 3 - minor
Description
On benjamin, I'm reliably seeing OSD peers mark each other down when they shouldn't. The cluster has ~21 OSDs across 3 nodes, and simply restarting all of them sets off a storm of down markings. Something is broken in the heartbeat exchange.
The workaround is to temporarily increase osd heartbeat grace until every OSD is up, and then lower it again.
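As a rough illustration of the workaround, the grace period could be raised in the [osd] section of ceph.conf before the mass restart and reverted afterwards. The value 60 below is an assumption chosen only for illustration; the ticket does not say what value was used. Recent Ceph releases document a default of 20 seconds, and the default in the version affected here may differ.

```ini
[osd]
    ; Illustrative value only, not taken from this ticket.
    ; Raise it while all OSDs are restarting, then restore the previous setting.
    osd heartbeat grace = 60
```

A longer grace period only hides the spurious down markings during startup. It does not fix the underlying heartbeat problem, and it also delays detection of OSDs that have genuinely failed, which is why the setting should be lowered again once the cluster is up.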
History
#1 Updated by Sage Weil over 9 years ago
- Status changed from New to Resolved