Bug #45737 (closed)

Module 'cephadm' has failed: cannot send (already closed?)

Added by Alain Deleglise almost 4 years ago. Updated almost 4 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

I have a development cluster running on 4 VMs. They're all running CentOS 8 Stream and were bootstrapped using cephadm (Octopus).

We were running tests, and I stopped VMs randomly to see how the cluster would react.

When I switched the VMs back on, the cluster was in a warning state, saying that 2 nodes didn't pass the checks. So I ran `ceph cephadm check-host HOST`: host1 was fine, but host2 gave a Python stack trace; unfortunately I did not copy it.

Now `ceph cephadm` commands return: Error EIO: Module 'cephadm' has experienced an error and cannot handle commands: cannot send (already closed?)

ceph -v
ceph version 15.2.2 (0c857e985a29d90501a285f242ea9c008df49eb8) octopus (stable)

cat /etc/centos-release
CentOS Linux release 8.1.1911 (Core)
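
For reference, a minimal sketch of the sequence described above (host1 and host2 are the two hosts mentioned; the exact warning text on this cluster was not captured):

ceph health detail              # showed the warning about hosts failing the cephadm check
ceph cephadm check-host host1   # passed
ceph cephadm check-host host2   # raised a Python traceback (not captured)
ceph orch host ls               # any further cephadm/orchestrator command now fails with:
                                # Error EIO: Module 'cephadm' has experienced an error and cannot
                                # handle commands: cannot send (already closed?)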


Related issues 1 (0 open, 1 closed)

Related to Orchestrator - Bug #45627: cephadm: frequently getting `1 hosts fail cephadm check` (Resolved, assigned to Matthew Oliver)

Actions #1

Updated by Sebastian Wagner almost 4 years ago

  • Project changed from Ceph to Orchestrator
  • Category deleted (ceph cli)
Actions #2

Updated by Sebastian Wagner almost 4 years ago

  • Related to Bug #45627: cephadm: frequently getting `1 hosts fail cephadm check` added
Actions #3

Updated by Sebastian Wagner almost 4 years ago

  • Status changed from New to Duplicate
Actions #4

Updated by Sebastian Wagner almost 4 years ago

  • Target version deleted (v15.2.2)
Actions #5

Updated by Alain Deleglise almost 4 years ago

Hi,

So besides the fact that this is a duplicate of an issue whose fix is waiting for review, what should I do in the meantime?

I mean, is it fixable in some way? Should I wait for the fix to be released and update the components? Right now my cluster is a development cluster, so it's not crucial, but one can easily imagine this happening in production; what is the recommended approach in such a situation?

Thanks
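
As a general note, not specific to this tracker: a commonly used way to recover from a crashed mgr module, assuming a standby mgr is available, is to fail over or reload the mgr so that the cephadm module is restarted. A minimal sketch, with MGR_NAME standing in for the active mgr reported by `ceph -s`:

ceph -s                          # note the currently active mgr
ceph mgr fail MGR_NAME           # fail over to a standby mgr; the cephadm module is reloaded there
ceph mgr module disable cephadm  # alternative: disable ...
ceph mgr module enable cephadm   # ... and re-enable the module on the current mgr

This only restarts the module; the underlying host-check failure itself is tracked in #45627.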

