Bug #53448

closed

cephadm: agent failures double reported by two health checks

Added by Adam King over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When agents are down, they are reported in both the agent-down and the failed-daemon health checks.
Reporting them in one check is enough, and the duplication can be confusing: the criterion for agent down (not reporting in time) differs from the criterion for failed daemon (systemd status), yet an agent placed in the former is automatically placed in the latter.
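To illustrate the kind of de-duplication being asked for, here is a minimal sketch. It is not the cephadm implementation or the change made in the linked pull request; the `DaemonStatus` record, its fields, and the function name are all hypothetical, chosen only to show agents that are already in the agent-down check being left out of the failed-daemon list.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class DaemonStatus:
    # Hypothetical record for illustration; not the cephadm data model.
    name: str            # e.g. "agent.host1" or "osd.3"
    daemon_type: str     # e.g. "agent", "osd", "mon"
    marked_failed: bool  # True when the daemon would land in the failed-daemon check


def failed_daemons_excluding_down_agents(
    daemons: Iterable[DaemonStatus],
    agents_not_reporting: Set[str],
) -> List[str]:
    """Return daemon names for the failed-daemon check, skipping agents
    that the agent-down check already reports."""
    result = []
    for daemon in daemons:
        if not daemon.marked_failed:
            continue
        # An agent already listed as not reporting is covered by the
        # agent-down check, so it is not reported a second time here.
        if daemon.daemon_type == "agent" and daemon.name in agents_not_reporting:
            continue
        result.append(daemon.name)
    return result
```

With this shape, an agent that has failed but is not in the not-reporting set still shows up in the failed-daemon list, so nothing is hidden; only the overlap is dropped.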

Example: almost all of the "failed cephadm daemon(s)" reported here are just repeat reports of the agents marked as not reporting.

cluster:
    id:     f148c330-47c9-11ec-9f19-1dfe2cdc6a6d
    health: HEALTH_ERR
            126 Cephadm Agent(s) are not reporting. Hosts may be offline
            Kernel Security Module (SELinux/AppArmor) is inconsistent for 19 hosts
            131 failed cephadm daemon(s)
            failed to probe daemons or devices
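The size of the overlap can be read off the two counts in the status output above. The sketch below does that arithmetic under one stated assumption: every agent counted as not reporting is also counted among the failed daemons. The helper name `leading_count` is hypothetical, and the two strings are copied from the output above.

```python
import re

# The two health summary lines from the status output above.
HEALTH_LINES = [
    "126 Cephadm Agent(s) are not reporting. Hosts may be offline",
    "131 failed cephadm daemon(s)",
]


def leading_count(line: str) -> int:
    """Extract the integer that starts a health summary line."""
    match = re.match(r"\s*(\d+)\s", line)
    if match is None:
        raise ValueError(f"no leading count in: {line!r}")
    return int(match.group(1))


agents_down = leading_count(HEALTH_LINES[0])
failed_total = leading_count(HEALTH_LINES[1])

# Assumption: all non-reporting agents are double counted as failed daemons.
other_failed = failed_total - agents_down
print(other_failed)  # 5
```

Under that assumption, only 5 of the 131 failed daemons are something other than a repeated agent report.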


Related issues 1 (0 open, 1 closed)

Related to Orchestrator - Bug #53723: Cephadm agent fails to report and causes a health timeout (Resolved, Adam King)

Actions #1

Updated by Adam King over 2 years ago

  • Pull request ID set to 44158
Actions #2

Updated by Laura Flores over 2 years ago

@Adam King, would you say that https://tracker.ceph.com/issues/53723 is related to this Tracker?

Actions #3

Updated by Sebastian Wagner over 2 years ago

  • Related to Bug #53723: Cephadm agent fails to report and causes a health timeout added
Actions #4

Updated by Sebastian Wagner over 2 years ago

  • Status changed from In Progress to Resolved
Actions #5

Updated by Laura Flores over 2 years ago

  • Related to deleted (Bug #53723: Cephadm agent fails to report and causes a health timeout)
Actions #6

Updated by Laura Flores over 2 years ago

  • Related to Bug #53723: Cephadm agent fails to report and causes a health timeout added
Actions #7

Updated by Laura Flores over 2 years ago

Accidentally deleted the related issue; ignore.
