Feature #18438

open

Configurable OSD Heartbeat packet size (MTU)

Added by Ross Martyn over 7 years ago. Updated over 7 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%


Description

Hi,

Firstly, apologies as I am fairly high level in my understanding here.

During maintenance of our Ceph cluster, we moved one of our OSD nodes to a new rack. When this node came back online, the whole cluster came to a halt.

This turned out to be due to Jumbo frames not being enabled along part of the path to the previous rack (and other infrastructure), essentially a networking issue. (We only run Jumbo frames on our replication network for the moment due to client side issues with the public network.)

However, I would have thought that if the OSDs were unable to communicate properly, they would take themselves out of the CRUSH map or stop the service.

This appears to be due to the heartbeat packets being under 1500 bytes, so they are not fragmented or dropped as a larger frame would be, and the OSDs stay online. This means that the rest of the cluster, still attempting to sync, grinds to a halt trying to talk to the node with large frames (which are dropped).
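To illustrate the failure mode described above, here is a minimal sketch in Python. It is not Ceph code. It assumes that oversized frames are silently dropped rather than fragmented, and the link MTUs and packet sizes are made-up example values (the only figure taken from this report is that heartbeats are under 1500 bytes).

```python
def path_mtu(link_mtus):
    """The usable MTU of a path is the smallest MTU of any link on it."""
    return min(link_mtus)


def is_delivered(packet_size, link_mtus):
    """Assumption: a packet larger than the path MTU is dropped, not fragmented."""
    return packet_size <= path_mtu(link_mtus)


# Hypothetical path: two jumbo-frame links with one standard-MTU hop in between.
links = [9000, 1500, 9000]

heartbeat_size = 1400    # illustrative: any size under 1500 bytes
replication_size = 9000  # illustrative: a full jumbo frame

print(is_delivered(heartbeat_size, links))    # small heartbeat gets through
print(is_delivered(replication_size, links))  # large replication frame does not
```

With these example values the heartbeat is delivered while the jumbo frame is not, which matches the behaviour we saw: the OSDs look healthy to each other even though bulk traffic cannot pass.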

It's fair to say that this isn't really a bug but more of a feature request: making the heartbeat packet size configurable, so that larger heartbeats can be sent, would further increase Ceph's resilience to network issues.

To reiterate, a single misconfigured switch port could cause a very big Ceph outage!

We used 'ping ip -M do -s 1450' (and then 1650) to diagnose the MTU fault.
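For anyone repeating this test, the arithmetic behind those two sizes can be sketched as below. This assumes IPv4 without options (20-byte IP header plus 8-byte ICMP header) and the Linux iputils ping, where -s sets the ICMP payload size and -M do forbids fragmentation. The helper names and the 192.0.2.10 address are placeholders for illustration, and the command is only built, not run.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def ping_payload_for_mtu(mtu):
    """Largest 'ping -s' payload that still fits in one packet of this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def fits(payload, mtu):
    """True if a ping with this payload fits in the MTU without fragmenting."""
    return payload + IPV4_HEADER + ICMP_HEADER <= mtu


def ping_command(host, payload):
    """Build (but do not run) a single don't-fragment ping of the given size."""
    return ["ping", "-M", "do", "-c", "1", "-s", str(payload), host]


print(fits(1450, 1500))  # 1450 + 28 = 1478 bytes, fits a standard MTU
print(fits(1650, 1500))  # 1650 + 28 = 1678 bytes, too big unless jumbo frames work
print(" ".join(ping_command("192.0.2.10", ping_payload_for_mtu(9000))))
```

So the 1450-byte ping succeeds on any standard path, while the 1650-byte ping only succeeds if every hop accepts frames larger than 1500 bytes. To test a full 9000-byte MTU end to end, the payload would be 8972 bytes.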

Appreciate any feedback!

Thanks.

Ross


Related issues 1 (0 open, 1 closed)

Has duplicate Ceph - Feature #20087: OSD: Add heartbeat message for Jumbo Frames(MTU 9000) - Resolved - Greg Farnum - 05/25/2017

Actions #1

Updated by Ross Martyn over 7 years ago

Tried to remove the 'Target Version'... Not able to!

Actions #2

Updated by Nathan Cutler over 7 years ago

  • Target version deleted (v10.2.6)
Actions #3

Updated by Greg Farnum almost 7 years ago

  • Has duplicate Feature #20087: OSD: Add heartbeat message for Jumbo Frames(MTU 9000) added