Bug #51688 (open): "stuck peering for" warning is misleading

Added by Dan van der Ster almost 3 years ago. Updated 7 months ago.

Status: Pending Backport
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: reef,quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When OSDs restart or crush maps change, it is common to see a HEALTH_WARN claiming that PGs have been stuck peering for a long time, even though they were active just seconds ago.
It would be preferable to report PG_AVAILABILITY issues only if the PGs really have been stuck peering for longer than 60s.

E.g.

HEALTH_WARN Reduced data availability: 50 pgs peering
PG_AVAILABILITY Reduced data availability: 50 pgs peering
    pg 3.7df is stuck peering for 792.178587, current state remapped+peering, last acting [100,113,352]
    pg 3.8ae is stuck peering for 280.567053, current state remapped+peering, last acting [226,345,350]
    pg 3.c0b is stuck peering for 1018.081127, current state remapped+peering, last acting [62,246,249]
    pg 3.fc9 is stuck peering for 65.799756, current state remapped+peering, last acting [123,447,351]
    pg 4.c is stuck peering for 208.471034, current state remapped+peering, last acting [123,501,247]
...
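
For reference, the threshold behind this warning is the monitor option mon_pg_stuck_threshold, and peering PGs fall under the "inactive" bucket of `ceph pg dump_stuck`. A minimal sketch for checking both on a running cluster (standard `ceph` CLI assumed):

# Show the monitor-side stuck threshold (in seconds)
ceph config get mon mon_pg_stuck_threshold

# List PGs the monitor considers stuck in an inactive state (including
# peering) for longer than 60 seconds
ceph pg dump_stuck inactive 60

# Cross-check against the health output quoted above
ceph health detail | grep peering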

(Related: I proposed to change PG_AVAILABILITY issues to HEALTH_ERR at https://tracker.ceph.com/issues/23565 and https://github.com/ceph/ceph/pull/42192, so this needs to be fixed before merging that.)

I tracked this to `PGMap::get_health_checks`, which marks a PG as stuck peering if now - last_peered > mon_pg_stuck_threshold.
But the problem is that last_peered is only updated when there is IO on a PG -- an OSD doesn't send pgstats if it is idle.
To fix, we could update last_active/last_peered etc. and send a pg stats update more frequently even when idle?
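
The stale timestamps the check operates on can be seen directly in the pg stats; a sketch assuming a JSON-capable `ceph pg dump` plus `jq` (the exact JSON path may differ between releases):

# Show pgid, state and last_peered side by side; on recent releases the
# per-PG array sits under .pg_map.pg_stats -- adjust the path if needed.
ceph pg dump --format json 2>/dev/null |
  jq -r '.pg_map.pg_stats[] | "\(.pgid) \(.state) last_peered=\(.last_peered)"' |
  head -20

On an idle cluster these timestamps barely move, which matches the observation above that an OSD doesn't refresh pgstats without IO.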

Clearly osd_pg_stat_report_interval_max is related here, but its default is 500 and we see some PGs reported stuck peering for longer than 500s, so something is still missing.
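
To rule the reporting interval in or out while reproducing, the option can be inspected and tightened on a throwaway cluster; a sketch (option name taken from the paragraph above, and the value 30 is arbitrary):

# Show the documented meaning and the value currently in effect
ceph config help osd_pg_stat_report_interval_max
ceph config get osd osd_pg_stat_report_interval_max

# On a test/vstart cluster, force more frequent idle reports and see whether
# the reported "stuck peering for" ages shrink accordingly
ceph config set osd osd_pg_stat_report_interval_max 30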

We observe this in nautilus, but the code hasn't changed much in master AFAICT.


Related issues (2 open, 0 closed)

Copied to RADOS - Backport #62926: quincy: "stuck peering for" warning is misleading (New, assignee: Shreyansh Sancheti)
Copied to RADOS - Backport #62927: reef: "stuck peering for" warning is misleading (New, assignee: Shreyansh Sancheti)
#1

Updated by Dan van der Ster almost 3 years ago

  • Subject changed from stuck peering since warning is misleading to "stuck peering for" warning is misleading
#2

Updated by Laura Flores over 1 year ago

  • Tags set to low-hanging-fruit
#3

Updated by Laura Flores over 1 year ago

  • Tag list set to low-hanging-fruit
  • Tags deleted (low-hanging-fruit)
#4

Updated by Laura Flores over 1 year ago

The relevant code would be in `src/mon/PGMap.cc` and `src/mon/PGMap.h`.
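
A quick way to locate the check from a source checkout; a sketch (exact symbol names may have moved between releases):

# Find where the monitor applies the stuck threshold and the per-state
# "last_*" timestamps it compares against
grep -n "mon_pg_stuck_threshold" src/mon/PGMap.cc
grep -n "last_peered" src/mon/PGMap.cc src/mon/PGMap.h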

#5

Updated by Laura Flores over 1 year ago

Peering PGs can be simulated in a vstart cluster by marking an OSD down with `./bin/ceph osd down <id>`.
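
A sketch of that flow from a vstart build directory (osd id 0 is just an example):

# Mark one OSD down to push its PGs through peering
./bin/ceph osd down 0

# Watch whether the reported "stuck peering for" ages look plausible
./bin/ceph health detail | grep -A5 PG_AVAILABILITY
./bin/ceph pg dump_stuck inactive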

#6

Updated by Laura Flores over 1 year ago

Shreyansh Sancheti is working on this bug.

#7

Updated by Laura Flores over 1 year ago

  • Status changed from New to In Progress
#8

Updated by Vikhyat Umrao over 1 year ago

  • Assignee set to Shreyansh Sancheti
#9

Updated by Laura Flores about 1 year ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 49332
#10

Updated by Radoslaw Zarzynski about 1 year ago

Bump up! Need to decide on the needs-test.

#11

Updated by Prashant D 11 months ago

@Shreyansh This script does reproduce the issue, but the peering is intermittent. We can work on it to reproduce it consistently:

#!/bin/bash

# Tear down any previous vstart cluster and bring up a fresh
# 1 MON / 1 MGR / 3 OSD cluster.
../src/stop.sh
MON=1 MGR=1 OSD=3 ../src/vstart.sh -n -d

# Create a small pool and write some objects so the PGs have data.
./bin/ceph osd pool create rbd 8 8
./bin/ceph osd pool set rbd min_size 1
./bin/ceph df
./bin/rados bench 20 write -p rbd --no-cleanup
./bin/ceph osd tree

# Stop two of the three OSDs and keep writing (min_size 1 lets IO continue).
sudo ./bin/init-ceph stop osd.1
sudo ./bin/init-ceph stop osd.2
./bin/rados bench 20 write -p rbd --no-cleanup
./bin/rados bench 20 write -p rbd --no-cleanup

# Stop the surviving OSD, bring the other two back, and declare osd.0 lost
# so the PGs have to peer without it.
sudo ./bin/init-ceph stop osd.0
sudo ./bin/init-ceph start osd.1
sudo ./bin/init-ceph start osd.2
./bin/ceph osd down 0
./bin/ceph osd lost osd.0 --yes-i-really-mean-it
./bin/rados bench 20 write -p rbd --no-cleanup
./bin/rados bench 20 write -p rbd --no-cleanup

# Finally restart osd.0; watch "ceph health detail" for the misleading
# "stuck peering for" ages while this runs.
sudo ./bin/init-ceph start osd.0
#12

Updated by Radoslaw Zarzynski 11 months ago

Hi Prashant! How about turning this reproducer into a workunit and appending it to the PR?
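
For reference, workunits are plain executable shell scripts under qa/workunits/ that run against an already-deployed cluster, so the vstart-specific ./bin/ and init-ceph calls would become plain ceph/rados commands. A rough, illustrative sketch of the shape such a script could take (pool name and structure are hypothetical, not the actual PR content):

#!/bin/bash -ex
# Illustrative sketch only, not the actual PR content.

# Create a small pool and write some data so the PGs are not empty.
ceph osd pool create stuck-peering-test 8 8
rados bench 10 write -p stuck-peering-test --no-cleanup

# Bounce one OSD to force its PGs through peering, then give the monitor
# a moment to rebuild its health checks.
osd=$(ceph osd ls | head -n1)
ceph osd down "$osd"
sleep 30

# Inspect what the monitor claims; a real test would assert that the
# reported "stuck peering for" age is not wildly larger than the time
# since the OSD was marked down.
ceph health detail | grep 'stuck peering for' || echo "no stuck-peering warning"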

#13

Updated by Radoslaw Zarzynski 7 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to reef,quincy
#14

Updated by Backport Bot 7 months ago

  • Copied to Backport #62926: quincy: "stuck peering for" warning is misleading added
#15

Updated by Backport Bot 7 months ago

  • Copied to Backport #62927: reef: "stuck peering for" warning is misleading added
#16

Updated by Backport Bot 7 months ago

  • Tags set to backport_processed