Bug #50100

open

stale slow osd heartbeats health alert

Added by Sage Weil about 3 years ago. Updated over 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

[WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 329881.007ms)
    Slow OSD heartbeats on back from osd.57 [] to osd.34 [] 329881.007 msec
    Slow OSD heartbeats on back from osd.46 [] to osd.22 [] 322439.429 msec
    Slow OSD heartbeats on back from osd.6 [] to osd.39 [] 322141.381 msec
    Slow OSD heartbeats on back from osd.6 [] to osd.1 [] 317347.729 msec
    Slow OSD heartbeats on back from osd.19 [] to osd.36 [] 312785.584 msec
    Slow OSD heartbeats on back from osd.43 [] to osd.61 [] 92993.926 msec
    Slow OSD heartbeats on back from osd.43 [] to osd.63 [] 92839.392 msec
    Slow OSD heartbeats on back from osd.43 [] to osd.53 [] 92786.246 msec possibly improving
    Slow OSD heartbeats on back from osd.43 [] to osd.52 [] 92786.206 msec
    Slow OSD heartbeats on back from osd.57 [] to osd.51 [] 92587.894 msec
    Truncated long network list.  Use ceph daemon mgr.# dump_osd_network for more information
[WRN] OSD_SLOW_PING_TIME_FRONT: Slow OSD heartbeats on front (longest 330695.632ms)
    Slow OSD heartbeats on front from osd.57 [] to osd.34 [] 330695.632 msec
    Slow OSD heartbeats on front from osd.46 [] to osd.22 [] 322945.797 msec
    Slow OSD heartbeats on front from osd.6 [] to osd.39 [] 320848.377 msec
    Slow OSD heartbeats on front from osd.6 [] to osd.1 [] 317744.886 msec
    Slow OSD heartbeats on front from osd.19 [] to osd.36 [] 313810.277 msec
    Slow OSD heartbeats on front from osd.43 [] to osd.52 [] 92994.370 msec
    Slow OSD heartbeats on front from osd.57 [] to osd.51 [] 92884.778 msec
    Slow OSD heartbeats on front from osd.43 [] to osd.53 [] 92839.072 msec
    Slow OSD heartbeats on front from osd.43 [] to osd.63 [] 92786.355 msec
    Slow OSD heartbeats on front from osd.43 [] to osd.61 [] 92786.328 msec
    Truncated long network list.  Use ceph daemon mgr.# dump_osd_network for more information

The dump_osd_network output has:
{
    "threshold": 1000,
    "entries": [
        {
            "last update": "Thu Apr  1 15:36:08 2021",
            "stale": true,
            "from osd": 57,
            "to osd": 34,
            "interface": "front",
            "average": {
                "1min": 330695.632,
                "5min": 330695.632,
                "15min": 44364.478
            },
            "min": {
                "1min": 330695.632,
                "5min": 330695.632,
                "15min": 330695.632
            },
            "max": {
                "1min": 330695.632,
                "5min": 330695.632,
                "15min": 330695.632
            },
            "last": 1.964
        },
...
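
A minimal sketch for pulling the stale entries out of a dump like the one above. It assumes the full JSON is available (the excerpt here is truncated) and that the field names match those shown ("stale", "from osd", "to osd", "interface", "last update"). The helper name stale_entries is hypothetical, not part of Ceph.

```python
import json

def stale_entries(dump):
    """Return (from_osd, to_osd, interface, last_update) for entries flagged stale."""
    return [
        (e["from osd"], e["to osd"], e["interface"], e["last update"])
        for e in dump.get("entries", [])
        if e.get("stale")
    ]

# Sample holding only the fields needed here, copied from the one entry
# shown in this report; a real dump has many more entries and fields.
sample = json.loads("""
{
    "threshold": 1000,
    "entries": [
        {
            "last update": "Thu Apr  1 15:36:08 2021",
            "stale": true,
            "from osd": 57,
            "to osd": 34,
            "interface": "front",
            "last": 1.964
        }
    ]
}
""")

for src, dst, iface, updated in stale_entries(sample):
    print(f"osd.{src} -> osd.{dst} ({iface}) stale since {updated}")
```

Note that in the entry above "last" is 1.964 while the 1min and 5min averages are still 330695.632, which is what makes the alert look stale rather than current.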

But when I look at 'ceph pg dump osds', I don't see osd.57 pinging osd.34 (34 is not in its HB_PEERS):
OSD_STAT  USED      AVAIL    USED_RAW  TOTAL    HB_PEERS                                                                                                                                     PG_SUM  PRIMARY_PG_SUM
...
57         547 GiB  1.3 TiB   547 GiB  1.8 TiB                                              [0,2,4,8,9,10,12,13,14,18,21,22,24,26,28,29,30,33,35,36,37,39,42,43,44,45,46,51,52,53,55,59,61]      38               7

The same goes for the others.
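
The comparison above can be sketched as a small check: given reported (from, to) pairs and each OSD's current HB_PEERS, list the pairs whose target is no longer a heartbeat peer. This is only an illustration; the function name entries_without_peer is hypothetical, and the data is typed in by hand from the output in this report rather than parsed from Ceph.

```python
def entries_without_peer(pairs, hb_peers):
    """Return (from_osd, to_osd) pairs whose target is not in the source's HB_PEERS.

    An OSD missing from hb_peers is treated as having no peers at all.
    """
    return [
        (src, dst)
        for src, dst in pairs
        if dst not in hb_peers.get(src, set())
    ]

# HB_PEERS for osd.57, copied from the 'ceph pg dump osds' row above.
hb_peers = {
    57: {0, 2, 4, 8, 9, 10, 12, 13, 14, 18, 21, 22, 24, 26, 28, 29, 30,
         33, 35, 36, 37, 39, 42, 43, 44, 45, 46, 51, 52, 53, 55, 59, 61},
}

# Two osd.57 pairs taken from the health detail output.
pairs = [(57, 34), (57, 51)]

print(entries_without_peer(pairs, hb_peers))  # [(57, 34)]
```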

This is 16.1.0-1341-gc0a8a600 / 16.2.0.

Actions #1

Updated by Neha Ojha over 2 years ago

  • Priority changed from High to Normal