Feature #40640

Network ping monitoring

Added by David Zafman 8 months ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous, mimic, nautilus
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:

Description

The simplest version of this would report warnings when heartbeat ping response times exceed configured thresholds.
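The threshold check described here could be sketched roughly as follows. This is only an illustration and not the Ceph implementation: the `PingSample` type, the `slow_pings` and `format_line` helpers, and the 1000 msec threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PingSample:
    """One heartbeat ping measurement (hypothetical type for illustration)."""
    from_osd: int
    to_osd: int
    interface: str  # "back" or "front"
    msec: float


def slow_pings(samples: Iterable[PingSample], threshold_msec: float) -> List[PingSample]:
    """Return the samples that exceed the threshold, slowest first."""
    slow = [s for s in samples if s.msec > threshold_msec]
    return sorted(slow, key=lambda s: s.msec, reverse=True)


def format_line(s: PingSample) -> str:
    """Render one sample as a warning line."""
    return (f"Slow heartbeat ping on {s.interface} interface "
            f"from osd.{s.from_osd} to osd.{s.to_osd} {s.msec} msec")


# Made-up measurements; only the first and last exceed the assumed threshold.
samples = [
    PingSample(1, 2, "back", 1488),
    PingSample(0, 2, "back", 3),
    PingSample(2, 1, "front", 1805),
]

for sample in slow_pings(samples, threshold_msec=1000):
    print(format_line(sample))
```

Sorting slowest first keeps the worst offender at the top, so a short report still surfaces the most useful entry.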


Subtasks

Feature #41563: Add connection reset tracking to Network ping monitoring - New - David Zafman


Related issues

Related to RADOS - Bug #41689: Network ping test fails in TEST_network_ping_test2 Resolved 09/06/2019
Related to RADOS - Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW_PING_TIME_FRONT) Resolved 09/10/2019
Related to RADOS - Bug #42570: mgr: qa: upgrade mimic-master "src/osd/osd_types.h: 2313: FAILED ceph_assert(pos <= end)" Resolved
Copied to RADOS - Backport #41695: nautilus: Network ping monitoring Resolved
Copied to RADOS - Backport #41696: mimic: Network ping monitoring Resolved
Copied to RADOS - Backport #41697: luminous: Network ping monitoring Resolved

History

#1 Updated by David Zafman 8 months ago

  • Description updated (diff)

#2 Updated by David Zafman 8 months ago

See also https://pad.ceph.com/p/Network_ping_monitoring

Examples, with warning threshold set to 1 microsecond.

Summary status example
SLOW_PING_TIME_BACK Long heartbeat ping times on back interface seen, longest is 1488 msec, SLOW_PING_TIME_FRONT Long heartbeat ping times on front interface seen, longest is 1805 msec

Detail status example
SLOW_PING_TIME_BACK Long heartbeat ping times on back interface seen, longest is 1488 msec
Slow heartbeat ping on back interface from osd.1 to osd.2 1488 msec
Slow heartbeat ping on back interface from osd.0 to osd.2 1412 msec
Slow heartbeat ping on back interface from osd.2 to osd.1 1364 msec
Slow heartbeat ping on back interface from osd.2 to osd.0 1346 msec
Slow heartbeat ping on back interface from osd.0 to osd.1 1310 msec
Truncated long network list. Use ceph daemon osd.# dump_network for more information

SLOW_PING_TIME_FRONT Long heartbeat ping times on front interface seen, longest is 1805 msec
Slow heartbeat ping on front interface from osd.2 to osd.1 1805 msec
Slow heartbeat ping on front interface from osd.0 to osd.2 1648 msec
Slow heartbeat ping on front interface from osd.1 to osd.2 1495 msec
Slow heartbeat ping on front interface from osd.2 to osd.0 1461 msec
Slow heartbeat ping on front interface from osd.1 to osd.0 1448 msec
Truncated long network list. Use ceph daemon osd.# dump_network for more information
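
For anyone who wants to post-process detail output like the examples above, a small parser might look like the sketch below. It assumes the line format is exactly as shown in this comment, which may differ in released versions; the `parse_slow_pings` helper and the regular expression are hypothetical and not part of Ceph.

```python
import re
from typing import List, Tuple

# Matches per-connection lines such as:
#   Slow heartbeat ping on back interface from osd.1 to osd.2 1488 msec
# The pattern is inferred from the example output only (an assumption).
LINE_RE = re.compile(
    r"Slow heartbeat ping on (?P<iface>back|front) interface "
    r"from osd\.(?P<src>\d+) to osd\.(?P<dst>\d+) "
    r"(?P<msec>\d+(?:\.\d+)?) msec"
)


def parse_slow_pings(text: str) -> List[Tuple[str, int, int, float]]:
    """Extract (interface, from_osd, to_osd, msec) from detail status text."""
    results = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:  # summary and "Truncated ..." lines do not match and are skipped
            results.append(
                (m["iface"], int(m["src"]), int(m["dst"]), float(m["msec"]))
            )
    return results


detail = """\
SLOW_PING_TIME_BACK Long heartbeat ping times on back interface seen, longest is 1488 msec
Slow heartbeat ping on back interface from osd.1 to osd.2 1488 msec
Slow heartbeat ping on back interface from osd.0 to osd.2 1412 msec
Truncated long network list. Use ceph daemon osd.# dump_network for more information
"""

parsed = parse_slow_pings(detail)
for iface, src, dst, msec in parsed:
    print(iface, src, dst, msec)
```

The pattern accepts fractional values as well, since related reports quote times such as 2237.999 msec.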

#3 Updated by Neha Ojha 7 months ago

  • Status changed from New to Fix Under Review

#4 Updated by David Zafman 6 months ago

  • Copied to Feature #41563: Add connection reset tracking to Network ping monitoring added

#5 Updated by David Zafman 6 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to luminous, mimic, nautilus

#6 Updated by David Zafman 6 months ago

  • Related to Bug #41689: Network ping test fails in TEST_network_ping_test2 added

#7 Updated by Nathan Cutler 6 months ago

#8 Updated by Nathan Cutler 6 months ago

#9 Updated by Nathan Cutler 6 months ago

#10 Updated by David Zafman 5 months ago

  • Related to Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW_PING_TIME_FRONT) added

#11 Updated by David Zafman 3 months ago

  • Related to Bug #42570: mgr: qa: upgrade mimic-master "src/osd/osd_types.h: 2313: FAILED ceph_assert(pos <= end)" added

#12 Updated by David Zafman 3 months ago

  • Status changed from Pending Backport to Resolved
