Bug #65728

open

Daemon managed by cephadm in an unknown state (CEPHADM_FAILED_DAEMON)

Added by Laura Flores 21 days ago. Updated 6 days ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2024-04-20_15:32:38-rados-wip-yuriw-testing-20240419.185239-main-distro-default-smithi/7664960/remote/smithi045/log/8d9a18e8-ff41-11ee-bc93-c7b262605968/ceph-mon.a.log.gz

2024-04-20T18:19:18.046+0000 7f3e74eae700 20 mon.a@0(leader).mgrstat health checks:
{
    "CEPHADM_FAILED_DAEMON": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "1 failed cephadm daemon(s)",
            "count": 1
        },
        "detail": [
            {
                "message": "daemon alertmanager.smithi104 on smithi104 is in unknown state" 
            }
        ]
    }
}

The cluster warning cleared about 18 seconds after it was raised:

2024-04-20T18:19:00.723654+0000 mon.a (mon.0) 774 : cluster [WRN] Health check failed: 1 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)
2024-04-20T18:19:01.777510+0000 mgr.a (mgr.14427) 39 : cluster [DBG] pgmap v16: 1 pgs: 1 active+clean; 577 KiB data, 85 MiB used, 268 GiB / 268 GiB avail
2024-04-20T18:19:02.024168+0000 mgr.a (mgr.14427) 40 : cluster [DBG] pgmap v17: 1 pgs: 1 active+clean; 577 KiB data, 85 MiB used, 268 GiB / 268 GiB avail
2024-04-20T18:19:02.024389+0000 mgr.a (mgr.14427) 41 : cluster [DBG] pgmap v18: 1 pgs: 1 active+clean; 577 KiB data, 85 MiB used, 268 GiB / 268 GiB avail
2024-04-20T18:19:04.024929+0000 mgr.a (mgr.14427) 42 : cluster [DBG] pgmap v19: 1 pgs: 1 active+clean; 577 KiB data, 85 MiB used, 268 GiB / 268 GiB avail
2024-04-20T18:19:06.025320+0000 mgr.a (mgr.14427) 43 : cluster [DBG] pgmap v20: 1 pgs: 1 active+clean; 577 KiB data, 85 MiB used, 268 GiB / 268 GiB avail
2024-04-20T18:19:08.025932+0000 mgr.a (mgr.14427) 44 : cluster [DBG] pgmap v21: 1 pgs: 1 active+clean; 577 KiB data, 85 MiB used, 268 GiB / 268 GiB avail
2024-04-20T18:19:10.026479+0000 mgr.a (mgr.14427) 45 : cluster [DBG] pgmap v22: 1 pgs: 1 active+clean; 577 KiB data, 85 MiB used, 268 GiB / 268 GiB avail
2024-04-20T18:19:12.026820+0000 mgr.a (mgr.14427) 46 : cluster [DBG] pgmap v23: 1 pgs: 1 active+clean; 577 KiB data, 85 MiB used, 268 GiB / 268 GiB avail
2024-04-20T18:19:12.046177+0000 mgr.a (mgr.14427) 47 : cluster [DBG] pgmap v24: 1 pgs: 1 active+clean; 577 KiB data, 85 MiB used, 268 GiB / 268 GiB avail
2024-04-20T18:19:12.046297+0000 mgr.a (mgr.14427) 48 : cluster [DBG] pgmap v25: 1 pgs: 1 active+clean; 577 KiB data, 85 MiB used, 268 GiB / 268 GiB avail
2024-04-20T18:19:14.046559+0000 mgr.a (mgr.14427) 49 : cluster [DBG] pgmap v26: 1 pgs: 1 active+clean; 577 KiB data, 85 MiB used, 268 GiB / 268 GiB avail
2024-04-20T18:19:16.047048+0000 mgr.a (mgr.14427) 50 : cluster [DBG] pgmap v27: 1 pgs: 1 active+clean; 577 KiB data, 85 MiB used, 268 GiB / 268 GiB avail
2024-04-20T18:19:18.047381+0000 mgr.a (mgr.14427) 51 : cluster [DBG] pgmap v28: 1 pgs: 1 active+clean; 577 KiB data, 85 MiB used, 268 GiB / 268 GiB avail
2024-04-20T18:19:18.047535+0000 mgr.a (mgr.14427) 52 : cluster [DBG] pgmap v29: 1 pgs: 1 active+clean; 577 KiB data, 85 MiB used, 268 GiB / 268 GiB avail
2024-04-20T18:19:19.045348+0000 mon.a (mon.0) 792 : cluster [INF] Health check cleared: CEPHADM_FAILED_DAEMON (was: 1 failed cephadm daemon(s))
2024-04-20T18:19:19.045384+0000 mon.a (mon.0) 793 : cluster [INF] Cluster is now healthy
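To check how long the warning stayed active in other runs, the raise/clear pair can be pulled out of the cluster log with a short script. This is a minimal sketch, not part of any Ceph tooling: the function name `health_check_windows` and the regular expression are assumptions based only on the line format quoted above.

```python
import re
from datetime import datetime

# Matches cluster-log lines in the format quoted in this ticket, e.g.
# "<timestamp> mon.a (mon.0) 774 : cluster [WRN] Health check failed: ..."
# (the pattern is an assumption derived from those lines only)
LINE_RE = re.compile(
    r"^(?P<ts>\S+) \S+ \(\S+\) \d+ : cluster \[(?P<level>\w+)\] (?P<msg>.*)$"
)


def parse_ts(ts):
    """Parse a timestamp such as 2024-04-20T18:19:00.723654+0000."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")


def health_check_windows(lines, code):
    """Yield (raised, cleared) datetime pairs for one health check code."""
    raised = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        msg = m.group("msg")
        if msg.startswith("Health check failed:") and f"({code})" in msg:
            raised = parse_ts(m.group("ts"))
        elif msg.startswith(f"Health check cleared: {code} ") and raised is not None:
            yield raised, parse_ts(m.group("ts"))
            raised = None


# Three lines copied from the log excerpt above.
sample = [
    "2024-04-20T18:19:00.723654+0000 mon.a (mon.0) 774 : cluster [WRN] "
    "Health check failed: 1 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)",
    "2024-04-20T18:19:01.777510+0000 mgr.a (mgr.14427) 39 : cluster [DBG] "
    "pgmap v16: 1 pgs: 1 active+clean; 577 KiB data, 85 MiB used, 268 GiB / 268 GiB avail",
    "2024-04-20T18:19:19.045348+0000 mon.a (mon.0) 792 : cluster [INF] "
    "Health check cleared: CEPHADM_FAILED_DAEMON (was: 1 failed cephadm daemon(s))",
]

for raised, cleared in health_check_windows(sample, "CEPHADM_FAILED_DAEMON"):
    print(f"active for {(cleared - raised).total_seconds():.1f}s")  # active for 18.3s
```

A short window like this one suggests a transient state rather than a daemon that stayed down, which may be relevant when deciding whether the warning belongs on an ignorelist.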


Related issues 1 (1 open, 0 closed)

Related to RADOS - Cleanup #65521: Add expected warnings in cluster log to ignorelists (New)

Actions #1

Updated by Laura Flores 21 days ago

  • Related to Cleanup #65521: Add expected warnings in cluster log to ignorelists added
Actions #2

Updated by Laura Flores 6 days ago

  • Subject changed from Alertmanager in an unknown state to Daemon managed by cephadm in an unknown state (CEPHADM_FAILED_DAEMON)

/a/yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7652483

2024-04-12T01:05:08.125+0000 7f5bbef19700 20 mon.a@0(leader).mgrstat health checks:
{
    "CEPHADM_FAILED_DAEMON": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "1 failed cephadm daemon(s)",
            "count": 1
        },
        "detail": [
            {
                "message": "daemon osd.0 on smithi073 is in unknown state" 
            }
        ]
    }
}
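When collecting further occurrences, it can help to extract which daemon and host each health dump names. The sketch below assumes the detail message always has the form "daemon X on Y is in Z state", as in the two dumps in this ticket; other cephadm messages may be worded differently. The helper name `failed_daemons` is hypothetical.

```python
import json
import re

# Message format assumed from the two health dumps quoted in this ticket.
DETAIL_RE = re.compile(
    r"^daemon (?P<daemon>\S+) on (?P<host>\S+) is in (?P<state>\w+) state$"
)


def failed_daemons(health_checks):
    """Return (daemon, host, state) tuples from a CEPHADM_FAILED_DAEMON check."""
    check = health_checks.get("CEPHADM_FAILED_DAEMON", {})
    found = []
    for item in check.get("detail", []):
        m = DETAIL_RE.match(item.get("message", "").strip())
        if m:
            found.append((m.group("daemon"), m.group("host"), m.group("state")))
    return found


# The JSON body from the dump above, as logged by mon.a.
dump = json.loads("""
{
    "CEPHADM_FAILED_DAEMON": {
        "severity": "HEALTH_WARN",
        "summary": {"message": "1 failed cephadm daemon(s)", "count": 1},
        "detail": [
            {"message": "daemon osd.0 on smithi073 is in unknown state"}
        ]
    }
}
""")

print(failed_daemons(dump))  # [('osd.0', 'smithi073', 'unknown')]
```

Run against both dumps in this ticket, it would report alertmanager.smithi104 and osd.0, which matches the subject change from an alertmanager-specific title to a generic one.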

Actions #3

Updated by Laura Flores 6 days ago

yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7652484
