Feature #40640

closed

Network ping monitoring

Added by David Zafman almost 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous, mimic, nautilus
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:

Description

The simplest version of this would be to see warnings if heartbeat ping response time exceeds certain thresholds.
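The description does not say how the thresholds are applied. A minimal sketch, assuming each heartbeat ping is recorded as a (from OSD, to OSD, interface, response time) sample and compared against one fixed threshold in milliseconds; `PingSample`, `slow_pings`, and `longest_by_interface` are hypothetical names for illustration, not Ceph code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PingSample:
    # Hypothetical record of one heartbeat ping response time between two OSDs.
    from_osd: int
    to_osd: int
    interface: str  # "back" or "front"
    msec: float


def slow_pings(samples, threshold_msec):
    """Group the samples whose response time exceeds threshold_msec by interface."""
    slow = {}
    for s in samples:
        if s.msec > threshold_msec:
            slow.setdefault(s.interface, []).append(s)
    return slow


def longest_by_interface(slow):
    """Return the longest slow ping per interface, as reported in a summary warning."""
    return {iface: max(s.msec for s in group) for iface, group in slow.items()}


samples = [
    PingSample(1, 2, "back", 1488),
    PingSample(0, 2, "back", 1412),
    PingSample(2, 1, "front", 1805),
    PingSample(0, 1, "front", 0.4),  # fast ping, stays below the threshold
]
print(longest_by_interface(slow_pings(samples, threshold_msec=1000)))
# prints {'back': 1488, 'front': 1805}
```

Grouping by interface mirrors the separate back and front warnings shown in the examples in this issue.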


Subtasks 1 (1 open, 0 closed)

Feature #41563: Add connection reset tracking to Network ping monitoring (New, David Zafman, 07/02/2019)

Related issues 6 (0 open, 6 closed)

Related to RADOS - Bug #41689: Network ping test fails in TEST_network_ping_test2 (Resolved, David Zafman, 09/06/2019)
Related to RADOS - Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW_PING_TIME_FRONT) (Resolved, David Zafman, 09/10/2019)
Related to RADOS - Bug #42570: mgr: qa: upgrade mimic-master "src/osd/osd_types.h: 2313: FAILED ceph_assert(pos <= end)" (Resolved, David Zafman)
Copied to RADOS - Backport #41695: nautilus: Network ping monitoring (Resolved, David Zafman)
Copied to RADOS - Backport #41696: mimic: Network ping monitoring (Resolved, David Zafman)
Copied to RADOS - Backport #41697: luminous: Network ping monitoring (Resolved, David Zafman)
#1

Updated by David Zafman almost 5 years ago

  • Description updated
#2

Updated by David Zafman almost 5 years ago

See also https://pad.ceph.com/p/Network_ping_monitoring

Examples, with warning threshold set to 1 microsecond.

Summary status example
SLOW_PING_TIME_BACK Long heartbeat ping times on back interface seen, longest is 1488 msec, SLOW_PING_TIME_FRONT Long heartbeat ping times on front interface seen, longest is 1805 msec

Detail status example
SLOW_PING_TIME_BACK Long heartbeat ping times on back interface seen, longest is 1488 msec
Slow heartbeat ping on back interface from osd.1 to osd.2 1488 msec
Slow heartbeat ping on back interface from osd.0 to osd.2 1412 msec
Slow heartbeat ping on back interface from osd.2 to osd.1 1364 msec
Slow heartbeat ping on back interface from osd.2 to osd.0 1346 msec
Slow heartbeat ping on back interface from osd.0 to osd.1 1310 msec
Truncated long network list. Use ceph daemon osd.# dump_network for more information

SLOW_PING_TIME_FRONT Long heartbeat ping times on front interface seen, longest is 1805 msec
Slow heartbeat ping on front interface from osd.2 to osd.1 1805 msec
Slow heartbeat ping on front interface from osd.0 to osd.2 1648 msec
Slow heartbeat ping on front interface from osd.1 to osd.2 1495 msec
Slow heartbeat ping on front interface from osd.2 to osd.0 1461 msec
Slow heartbeat ping on front interface from osd.1 to osd.0 1448 msec
Truncated long network list. Use ceph daemon osd.# dump_network for more information
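The detail output above lists slow pings per interface in descending order and stops after five entries (five of the six directed pairs among three OSDs), then prints a truncation notice. A minimal sketch of that formatting, assuming a cap of five inferred from this example; `format_detail`, `_fmt_msec`, and `max_lines` are hypothetical names, not Ceph's implementation.

```python
def _fmt_msec(msec):
    # Up to three decimals without trailing zeros: 1488 -> "1488", 2237.999 -> "2237.999".
    return f"{msec:.3f}".rstrip("0").rstrip(".")


def format_detail(interface, pings, max_lines=5):
    """Render a SLOW_PING_TIME_<INTERFACE> detail block.

    pings: (from_osd, to_osd, msec) tuples that already exceed the threshold.
    max_lines: hypothetical cap, inferred from the five entries in the example.
    """
    if not pings:
        return ""
    ordered = sorted(pings, key=lambda p: p[2], reverse=True)
    lines = [
        f"SLOW_PING_TIME_{interface.upper()} Long heartbeat ping times on "
        f"{interface} interface seen, longest is {_fmt_msec(ordered[0][2])} msec"
    ]
    for from_osd, to_osd, msec in ordered[:max_lines]:
        lines.append(
            f"Slow heartbeat ping on {interface} interface "
            f"from osd.{from_osd} to osd.{to_osd} {_fmt_msec(msec)} msec"
        )
    if len(ordered) > max_lines:
        # Remaining pairs are omitted; the text matches the message in the example.
        lines.append("Truncated long network list. "
                     "Use ceph daemon osd.# dump_network for more information")
    return "\n".join(lines)


# Back-interface values from the example, plus a made-up sixth pair (osd.1 to osd.0).
back = [(1, 2, 1488), (0, 2, 1412), (2, 1, 1364), (2, 0, 1346), (0, 1, 1310), (1, 0, 1200)]
print(format_detail("back", back))
```

With these inputs the printed block reproduces the SLOW_PING_TIME_BACK detail lines shown above, including the truncation notice, because the sixth pair falls past the cap.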

#3

Updated by Neha Ojha over 4 years ago

  • Status changed from New to Fix Under Review
#4

Updated by David Zafman over 4 years ago

  • Copied to Feature #41563: Add connection reset tracking to Network ping monitoring added
#5

Updated by David Zafman over 4 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to luminous, mimic, nautilus
#6

Updated by David Zafman over 4 years ago

  • Related to Bug #41689: Network ping test fails in TEST_network_ping_test2 added
#7

Updated by Nathan Cutler over 4 years ago

#8

Updated by Nathan Cutler over 4 years ago

#9

Updated by Nathan Cutler over 4 years ago

#10

Updated by David Zafman over 4 years ago

  • Related to Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW_PING_TIME_FRONT) added
#11

Updated by David Zafman over 4 years ago

  • Related to Bug #42570: mgr: qa: upgrade mimic-master "src/osd/osd_types.h: 2313: FAILED ceph_assert(pos <= end)" added
#12

Updated by David Zafman over 4 years ago

  • Status changed from Pending Backport to Resolved