Bug #1841
OSDs should disconnect from Monitor before their MOSDPGStat timeouts happen
Description
Right now, OSDs don't notice that their monitor connection has dropped until the SimpleMessenger's TCP connection timeout expires, which is 15 minutes by default. This is doubly unfortunate because 15 minutes is also the MOSDPGStat timeout: the longest an OSD may go without sending MOSDPGStat messages to the monitor cluster.
Either the OSDs need to use a different TCP timeout, they need to notice independently that they're not talking to the monitor, or we need to extend the MOSDPGStat timeouts. I'm leaning toward option 2 but haven't thought enough about the implications of each option.
History
#1 Updated by Sage Weil almost 12 years ago
My memory is a bit fuzzy, but I think they're waiting on acks for the MOSDPGStat messages they send. Checking for a timeout on those acks is probably the simplest way to go. (So yeah, option 2 gets my vote too!)
#2 Updated by Greg Farnum almost 12 years ago
- Status changed from New to In Progress
- Assignee set to Greg Farnum
Yep; it is easy enough to add a check in tick based on how long it's been since we sent a PGStat without getting an ack. Looks like that will be safe, too!
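The tick-based check described above might look roughly like the following C++17 sketch. It is not the code from the wip-osd-mon-communication branch. The class PGStatAckTracker, its methods, and the ack_timeout value are hypothetical. The sketch also simplifies by assuming that any ack from the monitor covers every stats message sent before it.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical sketch: remembers when the oldest still-unacked MOSDPGStat
// was sent, so a periodic tick can decide to drop the monitor session.
class PGStatAckTracker {
 public:
  using Clock = std::chrono::steady_clock;
  using Duration = Clock::duration;
  using TimePoint = Clock::time_point;

  // `ack_timeout` is an assumed tunable. It should be well below the
  // monitor's MOSDPGStat timeout so the OSD reconnects in time.
  explicit PGStatAckTracker(Duration ack_timeout) : ack_timeout_(ack_timeout) {
    assert(ack_timeout_ > Duration::zero());
  }

  // Call on every stats send. Only the first unacked send is recorded,
  // so a steady stream of sends cannot keep resetting the timer.
  void on_pgstat_sent(TimePoint now) {
    if (!oldest_unacked_) oldest_unacked_ = now;
  }

  // Call when the monitor acks (assumption: one ack covers all prior sends).
  void on_pgstat_ack() { oldest_unacked_.reset(); }

  // Called from the periodic tick: true means the monitor session looks
  // dead and the OSD should reconnect instead of waiting on TCP.
  bool should_reconnect(TimePoint now) const {
    return oldest_unacked_ && (now - *oldest_unacked_) > ack_timeout_;
  }

  // After reconnecting, forget the stale send so the timer starts fresh.
  void on_reconnect() { oldest_unacked_.reset(); }

 private:
  Duration ack_timeout_;
  std::optional<TimePoint> oldest_unacked_;
};
```

Measuring from the oldest unacked send, rather than the most recent one, is the design choice that matters here: if the monitor connection is silently dead, the OSD keeps sending stats, and a timer that resets on every send would never fire.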
#3 Updated by Greg Farnum almost 12 years ago
Pushed a wip-osd-mon-communication branch that implements this. It's untested, though!
#4 Updated by Greg Farnum almost 12 years ago
- Status changed from In Progress to Resolved
Sage merged this into master.