Bug #1841

closed

OSDs should disconnect from Monitor before their MOSDPGStat timeouts happen

Added by Greg Farnum over 12 years ago. Updated over 12 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
OSD
Target version:
% Done:

0%


Description

Right now OSDs don't notice that their monitor connection has dropped until the (by default) 15-minute TCP connection timeout that the SimpleMessenger provides has expired. This is doubly unfortunate because 15 minutes is also how long they can go without sending MOSDPGStat messages to the monitor cluster before that timeout fires.

Either the OSDs need to set a different TCP timeout, they need to independently notice they're not talking to the monitor, or we need to extend the MOSDPGStat timeouts. I'm leaning toward option 2 but haven't thought enough about the implications of each option.

Actions #1

Updated by Sage Weil over 12 years ago

My memory is a bit fuzzy, but I think they're waiting on acks for the MOSDPGStat messages they're sending. Checking for a timeout on that is probably the simplest way to go. (So yeah, #2 gets my vote too!)

Actions #2

Updated by Greg Farnum over 12 years ago

  • Status changed from New to In Progress
  • Assignee set to Greg Farnum

Yep; it is easy enough to add a check in tick based on how long it's been since we sent a PGStat without getting an ack. Looks like that will be safe, too!
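For readers following along, the idea of a tick-time check on unacked PGStat messages might look roughly like the sketch below. This is not the code from the branch: the class name `PGStatAckWatchdog`, its methods, and the 30-second timeout in the usage note are all hypothetical, chosen only to illustrate tracking the oldest unacked send and comparing it against a timeout on each tick.

```cpp
#include <chrono>

// Hypothetical sketch; none of these names come from the Ceph source tree.
using Clock = std::chrono::steady_clock;

class PGStatAckWatchdog {
public:
  explicit PGStatAckWatchdog(std::chrono::seconds ack_timeout)
    : ack_timeout_(ack_timeout), waiting_(false) {}

  // Call whenever a PGStat message is sent to the monitor.
  // Only the oldest unacked send is remembered, so repeated
  // sends do not keep pushing the deadline forward.
  void on_stats_sent(Clock::time_point now) {
    if (!waiting_) {
      waiting_ = true;
      oldest_unacked_ = now;
    }
  }

  // Call when the monitor acks our stats.
  void on_stats_ack() { waiting_ = false; }

  // Call from the periodic tick. Returns true when the caller
  // should drop the monitor session and reconnect.
  bool tick(Clock::time_point now) {
    if (!waiting_)
      return false;
    if (now - oldest_unacked_ < ack_timeout_)
      return false;
    waiting_ = false;  // start fresh after the reconnect
    return true;
  }

private:
  std::chrono::seconds ack_timeout_;
  bool waiting_;
  Clock::time_point oldest_unacked_;
};
```

With an assumed timeout of 30 seconds, a send at time t followed by no ack would make `tick` return true once at t + 30s, well before a 15-minute TCP timeout would have noticed anything.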

Actions #3

Updated by Greg Farnum over 12 years ago

Pushed a wip-osd-mon-communication branch that implements this. It's untested, though!

Actions #4

Updated by Greg Farnum over 12 years ago

  • Status changed from In Progress to Resolved

Sage merged this into master.
