Feature #20087

closed

OSD: Add heartbeat message for Jumbo Frames(MTU 9000)

Added by Vikhyat Umrao almost 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Support
Tags:
Backport: jewel
Reviewed:
Affected Versions:
Pull request ID:

Description

- OSD: Add heartbeat message for Jumbo Frames(MTU 9000)

- When jumbo frames are enabled on the cluster network, all interconnecting network gear must also have jumbo frames enabled. If any device is misconfigured for jumbo frames, we see a lot of issues, such as stuck peering, slow requests, and backfilling that does not progress.

- The problem is that we do not see heartbeat timeout messages in the OSD logs, because the heartbeat packets are smaller than 1500 bytes.

- We checked the communication issue with the command below:

# ping -W 2 -I <interface> -M do -s <pkt size> <IP address>
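For reference, a minimal sketch of how the `<pkt size>` for this check can be derived. With `-M do` (don't fragment), the largest ICMP payload that fits in one frame is the MTU minus the 20-byte IPv4 header (assuming no IP options) and the 8-byte ICMP header, so 8972 for MTU 9000. The helper names, the interface name, the address, and the added `-c 3` count below are illustrative assumptions, not part of the original report.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header, assuming no IP options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ping payload (-s) that fits in one unfragmented IPv4 packet."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError(f"MTU {mtu} is too small for an ICMP echo over IPv4")
    return mtu - overhead


def ping_command(interface: str, address: str, mtu: int, timeout_s: int = 2) -> list:
    """Build the argument list for the don't-fragment ping shown above."""
    return [
        "ping",
        "-c", "3",               # stop after three probes (added for convenience)
        "-W", str(timeout_s),    # per-reply timeout in seconds
        "-I", interface,         # source interface
        "-M", "do",              # set DF: never fragment
        "-s", str(icmp_payload_for_mtu(mtu)),
        address,
    ]


if __name__ == "__main__":
    # Hypothetical interface and address, for illustration only.
    print(" ".join(ping_command("eth0", "192.0.2.10", 9000)))
```

If this ping fails while the same command with `-s 1472` (the MTU 1500 equivalent) succeeds, some device on the path is probably not passing jumbo frames.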

Downstream feature request: https://bugzilla.redhat.com/show_bug.cgi?id=1455711


Related issues 2 (1 open, 1 closed)

Is duplicate of Ceph - Feature #18438: Configurable OSD Heartbeat packet size (MTU) (New, 01/06/2017)

Copied to Ceph - Backport #20353: jewel: OSD: Add heartbeat message for Jumbo Frames(MTU 9000) (Resolved, Vikhyat Umrao)
Actions #1

Updated by Vikhyat Umrao almost 7 years ago

  • Description updated (diff)
Actions #2

Updated by Vikhyat Umrao almost 7 years ago

We have another feature request for the same issue: http://tracker.ceph.com/issues/18438 (Configurable OSD Heartbeat packet size (MTU)).

Actions #3

Updated by Vikhyat Umrao almost 7 years ago

  • Subject changed from OSD: Add heartbeat message for Jumbo Frames(MTU 900) to OSD: Add heartbeat message for Jumbo Frames(MTU 9000)
  • Description updated (diff)
Actions #4

Updated by Greg Farnum almost 7 years ago

  • Is duplicate of Feature #18438: Configurable OSD Heartbeat packet size (MTU) added
Actions #5

Updated by Greg Farnum almost 7 years ago

I've seen stuff about this before but not been entirely clear on what's happening. Is the issue that the local box is configured for jumbo frames but the switch silently drops them? I'm wondering if there's something Ceph can query to know if it needs to do this validation.

I suppose we can inflate the heartbeat packets with a zero-filled bufferlist or something. Should we do that for every heartbeat? I suppose a 9KB packet that gets thrown away isn't that much wasted network bandwidth...

Actions #6

Updated by Vikhyat Umrao almost 7 years ago

  • Description updated (diff)

Greg Farnum wrote:

Thanks Greg for your inputs.

I've seen stuff about this before but not been entirely clear on what's happening. Is the issue that the local box is configured for jumbo frames but the switch silently drops them? I'm wondering if there's something Ceph can query to know if it needs to do this validation.

Yes, this was the case. The local host had its MTU configured as 9000, but there was an issue with the MTU 9000 configuration at the switch layer, and the OSD does not log anything about heartbeat failures.

I suppose we can inflate the heartbeat packets with a zero-filled bufferlist or something. Should we do that for every heartbeat? I suppose a 9KB packet that gets thrown away isn't that much wasted network bandwidth...

Yep. Yesterday I had a quick discussion with Josh before creating this feature request, and we agreed that a feature that periodically sends a larger request to detect this MTU issue would be great. Maybe we do not need to do it for every heartbeat? The `osd_heartbeat_interval` default is 6 seconds, so maybe we could pad only the even-numbered packets?
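To make the idea concrete, here is a rough sketch of the padding scheme discussed above: a heartbeat payload is zero-filled up to a minimum size, but only on every second message. This is an illustration of the proposal, not Ceph's actual implementation. The message layout, the function name `build_heartbeat`, and the parameters `min_size` and `pad_every` are all hypothetical, and 2000 is an arbitrary size chosen only because it exceeds a 1500-byte MTU.

```python
def build_heartbeat(seq: int, min_size: int = 0, pad_every: int = 2) -> bytes:
    """Return a toy heartbeat payload for sequence number `seq`.

    The real message format is not modeled here: the payload is just a
    2-byte tag plus an 8-byte sequence number. On every `pad_every`-th
    message the payload is zero-filled up to `min_size` bytes, so that it
    no longer fits in a standard 1500-byte frame.
    """
    body = b"HB" + seq.to_bytes(8, "big")
    should_pad = pad_every > 0 and seq % pad_every == 0
    if should_pad and min_size > len(body):
        body += bytes(min_size - len(body))  # zero-filled padding
    return body


if __name__ == "__main__":
    for seq in range(1, 5):
        print(seq, len(build_heartbeat(seq, min_size=2000)))
```

With `pad_every=2`, odd-numbered heartbeats stay small and even-numbered ones carry the padding, which matches the "even number packets" suggestion. Whether a padded message actually exercises the jumbo-frame path also depends on how the transport segments it, which this sketch does not model.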

Actions #8

Updated by Vikhyat Umrao almost 7 years ago

Actions #10

Updated by Vikhyat Umrao almost 7 years ago

  • Backport set to jewel
Actions #11

Updated by Greg Farnum almost 7 years ago

  • Status changed from 7 to Pending Backport

Note to backporters: consider whatever happens with https://github.com/ceph/ceph/pull/15727 !

Actions #12

Updated by Nathan Cutler almost 7 years ago

  • Copied to Backport #20353: jewel: OSD: Add heartbeat message for Jumbo Frames(MTU 9000) added
Actions #13

Updated by Vikhyat Umrao almost 7 years ago

Greg Farnum wrote:

Note to backporters: consider whatever happens with https://github.com/ceph/ceph/pull/15727 !

Thanks Greg. I have assigned the backport to myself. I will keep track of 15727 and act according to its outcome.

Actions #14

Updated by Nathan Cutler over 6 years ago

  • Status changed from Pending Backport to Resolved