Bug #46098 (closed): Exception adding host using cephadm

Added by Mark Kirkwood almost 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Category: cephadm (binary)
Target version: v15.2.5
% Done: 0%
Tags: low-hanging-fruit
Regression: No
Severity: 3 - minor
Pull request ID: 36216

Description

After bootstrapping the first host using cephadm, attempting to add another host fails with an exception (a variable referenced before assignment).

Environment:
Ceph version: 15.2.2 (node 1 installed with cephadm)
OS Version: Ubuntu 18.04
Docker Version: 19.03.6-0ubuntu1~18.04.1

Step(s) to reproduce:

ceph0 $ ceph orch host add ceph1  # observe with `ceph -W cephadm`

2020-06-19T13:44:42.435058+1200 mgr.ceph0.ufqhzn [ERR] _Promise failed
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 457, in do_work
    res = self._on_complete_(*args, **kwargs)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 525, in <lambda>
    return cls(on_complete=lambda x: f(*x), value=args, name=name, **c_kwargs)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1685, in add_host
    if code:
orchestrator._interface.OrchestratorError: New host ceph1 (ceph1) failed check: ['Traceback (most recent call last):', '  File "<stdin>", line 4580, in <module>', '  File "<stdin>", line 3592, in command_check_host', "UnboundLocalError: local variable 'container_path' referenced before assignment"]

It looks like the variable 'container_path' needs to be declared 'global' in command_check_host (see the attached patch). Making this change in the mgr container (and restarting) allows the host add to succeed.
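To make the failure mode concrete, here is a minimal sketch of the pattern implied by the traceback. This is not the actual cephadm source; the names find_program and CONTAINER_PREFERENCE are assumptions based on the error output. In Python, assigning to a name anywhere inside a function makes that name local to the whole function body, so when no container engine is found and the loop never assigns, the final read raises UnboundLocalError; the 'global' declaration from the patch restores the module-level binding.

import shutil

container_path = ''  # module-level default, as in the cephadm script

CONTAINER_PREFERENCE = ['podman', 'docker']  # assumed engine search order

def find_program(name):
    # Stand-in for cephadm's PATH lookup; raises if the binary is absent.
    path = shutil.which(name)
    if not path:
        raise ValueError('%s not found' % name)
    return path

def command_check_host():
    # Without this declaration, the assignment in the loop below makes
    # container_path a *local* name for the entire function body, so the
    # final read raises UnboundLocalError whenever neither podman nor
    # docker is installed on the host being checked.
    global container_path  # the one-line fix from the attached patch
    for exe in CONTAINER_PREFERENCE:
        try:
            container_path = find_program(exe)
            break
        except ValueError:
            pass
    if not container_path:
        print('ERROR: unable to locate any of %s' % CONTAINER_PREFERENCE)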


Files

cephadm.patch (296 Bytes): Add global for container_path. Mark Kirkwood, 06/19/2020 03:17 AM

Related issues 3 (0 open, 3 closed)

Related to Orchestrator - Support #46547: cephadm: Exception adding host via FQDN if host was already added (Resolved)
Related to Orchestrator - Tasks #46551: cephadm: Add a better hint how to add a host (Resolved)
Has duplicate Orchestrator - Bug #46132: cephadm: Failed to add host in cephadm through command 'ceph orch host add node1' (Duplicate)
#1: Updated by Mark Kirkwood almost 4 years ago

Typo in the 'Environment' section: the Ceph version is 15.2.3, not 15.2.2.

#2: Updated by Sebastian Wagner almost 4 years ago

  • Category set to cephadm (binary)
  • Status changed from New to Triaged
  • Priority changed from Normal to High
  • Tags set to low-hanging-fruit

#3: Updated by Sebastian Wagner almost 4 years ago

  • Assignee set to Stephan Müller

#4: Updated by Dan Mick almost 4 years ago

lol, just discovered this myself. I confirm that the suggested fix is appropriate.

#5: Updated by Stephan Müller almost 4 years ago

  • Status changed from Triaged to In Progress

#6: Updated by Sebastian Wagner almost 4 years ago

  • Has duplicate Bug #46132: cephadm: Failed to add host in cephadm through command 'ceph orch host add node1' added

#7: Updated by Stephan Müller almost 4 years ago

  • Related to Support #46547: cephadm: Exception adding host via FQDN if host was already added added

#8: Updated by Stephan Müller almost 4 years ago

  • Related to Tasks #46551: cephadm: Add a better hint how to add a host added

#9: Updated by Stephan Müller almost 4 years ago

  • Status changed from In Progress to Need More Info

I have not yet been able to reproduce it. (I tried a lot of things.)

I added new hosts to bootstrapped clusters, removed and re-added the same host under different names (hostname / IP / FQDN), and tried other variations, such as changing the hostname.

@Mark Kirkwood and @Dan Mick, could you provide more information on how to reproduce it and what your setup looks like?

#10: Updated by Stephan Müller almost 4 years ago

  • Priority changed from High to Normal

#11: Updated by Mark Kirkwood almost 4 years ago

@Stephan Müller, I'd suggest starting with some freshly built VMs (mine were Ubuntu 18.04). Optionally set up the Ceph repos on all of them to get Octopus (I did this). Also, I didn't have these VMs in DNS, so I just set up /etc/hosts on each of them.

Then:
  • download cephadm on the host that is to be the mon
  • bootstrap it as a mon
  • install ceph-common
  • copy the ssh id to the next host
  • add it (hopefully triggering the bug)

Offhand I can't recall whether I installed docker on the bootstrap host before bootstrapping.

#12: Updated by Paul Cuzner almost 4 years ago

I've hit this with base RHEL 8.2 physical hosts. In my case the new hosts I tried to add didn't have python3, lvm or podman installed - they were fresh "server" role installs.

#13: Updated by Sebastian Wagner almost 4 years ago

  • Status changed from Need More Info to Fix Under Review
  • Assignee changed from Stephan Müller to Sebastian Wagner
  • Pull request ID set to 36216

#14: Updated by Victor Moreno almost 4 years ago

I have hit this as well, installing cephadm on Debian 10 (buster) after running an apt upgrade.

I have the playbook that configures the Ceph cluster here, in case it helps:

https://github.com/VictorMorenoJimenez/tfg2020/blob/master/ansible/playbooks/configure_ceph_cluster.yml

#15: Updated by Tobias Fischer over 3 years ago

Same here, trying to add a fresh Debian buster VM with all updates installed (no additional packages like docker present):

.:~# ceph orch host add rgw1
Error ENOENT: New host rgw1 (rgw1) failed check: ['Traceback (most recent call last):', ' File "<stdin>", line 4762, in <module>', ' File "<stdin>", line 3738, in command_check_host', "UnboundLocalError: local variable 'container_path' referenced before assignment"]

But the fix proposed by Mark Kirkwood was already in place. After installing Docker on rgw1 the error was gone.
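This observation matches the failure pattern sketched in the description: the unassigned read is only reached when no container engine is found on the host, so installing Docker (or podman) lets the loop bind the variable and the crash disappears even without the patch. A sketch of the unpatched control flow, reusing the assumed find_program and CONTAINER_PREFERENCE definitions from the description:

# Unpatched variant of the earlier sketch; shows why the crash appears
# only on hosts without podman or docker installed.
def command_check_host_unpatched():
    # No 'global' declaration: the assignment below binds a local name.
    for exe in CONTAINER_PREFERENCE:
        try:
            container_path = find_program(exe)  # binds only when found
            break
        except ValueError:
            pass
    # Engine installed: the loop bound the local and this check passes.
    # Engine missing: the name was never bound, so reading it here raises
    # UnboundLocalError before any useful error can be reported.
    if not container_path:
        print('ERROR: unable to locate any of %s' % CONTAINER_PREFERENCE)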

#16: Updated by Sebastian Wagner over 3 years ago

  • Status changed from Fix Under Review to Resolved
  • Target version set to v15.2.5