Bug #45627

cephadm: frequently getting `1 hosts fail cephadm check`

Added by Sebastian Wagner almost 4 years ago. Updated almost 4 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Matthew Oliver
Category:
cephadm
Target version:
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ADK3Y2XHTIJ2YV6MFSQX4XPTQ4WP5ETM/

I can access all RBD devices and CephFS. They work. All OSDs on server-1
are up.

    health: HEALTH_WARN
            1 hosts fail cephadm check
            failed to probe daemons or devices

I even restarted server-1. No luck.

I'm on server-1. cephadm complains it cannot access server-1. In basic
terms, server-1 cannot reach server-1 (192.168.0.1).

server-1: 192.168.0.1
server-2: 192.168.0.3

$ ssh -F =(ceph cephadm get-ssh-config) \
      -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@server-1
> Success.
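The one-liner above relies on zsh's `=( )` temp-file process substitution. For plain bash, the same check can be sketched with explicit temp files (host name `server-1` taken from the report; the guard around the ceph CLI is an addition so the script degrades gracefully on machines without it):

```shell
#!/usr/bin/env bash
# Plain-bash equivalent of the zsh one-liner above: dump the mgr's ssh
# config and identity key to temp files, then attempt the same connection
# cephadm itself would make.
cfg=$(mktemp) && key=$(mktemp)
if command -v ceph >/dev/null 2>&1; then
    ceph cephadm get-ssh-config > "$cfg"
    ceph config-key get mgr/cephadm/ssh_identity_key > "$key"
    chmod 600 "$key"                # ssh refuses group/world-readable keys
    ssh -F "$cfg" -i "$key" root@server-1 true && echo "Success."
fi
rm -f "$cfg" "$key"                # clean up the temp files
```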

I think we have to rethink the SSH connection handling. It looks like execnet can't handle being loaded within a long-running daemon.

This happens (unfortunately) frequently to me. Find the active mgr
(`ceph -s`), then restart the mgr service on that host (`systemctl
list-units | grep mgr`, then `systemctl restart NAMEOFSERVICE`). This
normally resolves the error for me.
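A minimal sketch of that workaround, assuming a host with the ceph CLI and systemd; the commented-out unit name is an example only (real unit names follow the `ceph-<fsid>@mgr.<host>.service` pattern and vary per cluster):

```shell
#!/usr/bin/env bash
# Workaround sketch: identify the active mgr, then restart its systemd
# unit on that host. Guarded so it only queries where the ceph CLI exists.
if command -v ceph >/dev/null 2>&1; then
    # "ceph -s" reports the active mgr in its services section,
    # e.g. "mgr: server-2.abcdef(active, since 2h)"
    active_mgr=$(ceph -s 2>/dev/null | grep -i 'mgr:')
    # On the host named there, locate the mgr unit:
    systemctl list-units --type=service 2>/dev/null | grep -i mgr || true
    # ...and restart it (example unit name, not from this cluster):
    # systemctl restart "ceph-<fsid>@mgr.<hostname>.service"
fi
active_mgr=${active_mgr:-"unknown (no ceph CLI or no cluster reachable)"}
echo "active mgr: $active_mgr"
```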

Related issues (3 total: 0 open, 3 closed)

Related to Orchestrator - Bug #45032: cephadm: Not recovering from `OSError: cannot send (already closed?)` (Resolved, Matthew Oliver)
Related to Orchestrator - Bug #45621: check-host returns terrible unhelpful error message (Duplicate)
Related to Orchestrator - Bug #45737: Module 'cephadm' has failed: cannot send (already closed?) (Duplicate)