Bug #46098: Exception adding host using cephadm

Added by Mark Kirkwood about 2 months ago. Updated 8 days ago.

Status: Resolved
Priority: Normal
Category: cephadm (binary)
Target version: v15.2.5
% Done: 0%
Tags: low-hanging-fruit
Regression: No
Severity: 3 - minor
Pull request ID: 36216

Description

After bootstrapping the 1st host using cephadm, attempting to add another host fails with an exception (an UnboundLocalError: a variable is referenced before assignment).

Environment:
Ceph version: 15.2.3 (node 1 installed with cephadm)
OS Version: Ubuntu 18.04
Docker Version: 19.03.6-0ubuntu1~18.04.1

Step(s) to reproduce:

ceph0 $ ceph orch host add ceph1  # observe with `ceph -W cephadm`

2020-06-19T13:44:42.435058+1200 mgr.ceph0.ufqhzn [ERR] _Promise failed
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 457, in do_work
    res = self._on_complete_(*args, **kwargs)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 525, in <lambda>
    return cls(on_complete=lambda x: f(*x), value=args, name=name, **c_kwargs)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1685, in add_host
    if code:
orchestrator._interface.OrchestratorError: New host ceph1 (ceph1) failed check: ['Traceback (most recent call last):', '  File "<stdin>", line 4580, in <module>', '  File "<stdin>", line 3592, in command_check_host', "UnboundLocalError: local variable 'container_path' referenced before assignment"]

It looks like the variable 'container_path' needs to be declared as 'global' in command_check_host (see attached patch). Making this change inside the mgr container (and restarting it) allows the host to be added successfully.
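
For illustration, here is a minimal, self-contained sketch of the failure mode. This is not the actual cephadm source; find_program() below is just a stand-in for cephadm's binary lookup, and the candidate list is assumed:

import shutil

container_path = ''  # module-level default, as in the cephadm script

def find_program(exe):
    # Stand-in for cephadm's binary lookup: raise if the binary is absent.
    path = shutil.which(exe)
    if path is None:
        raise ValueError('%s not found' % exe)
    return path

def command_check_host():
    # The assignment below makes container_path a local name for the whole
    # function. On a host with neither podman nor docker installed, the
    # loop never assigns it, so the final read raises UnboundLocalError,
    # matching the traceback above. Adding `global container_path` as the
    # first statement (Mark's patch) makes the name refer to the
    # module-level variable instead, so the read succeeds.
    for exe in ('podman', 'docker'):
        try:
            container_path = find_program(exe)
            break
        except ValueError:
            pass
    print('container binary (%s) is present' % container_path)

command_check_host()

Note that the global declaration only stops the crash: on a host with no container runtime, the check would then read the empty module-level default, so reporting the missing runtime explicitly is arguably the better long-term fix.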

cephadm.patch - Add global for container_path (296 Bytes) - Mark Kirkwood, 06/19/2020 03:17 AM


Related issues

Related to Orchestrator - Bug #46547: cephadm: Exception adding host via FQDN if host was already added Need More Info
Related to Orchestrator - Tasks #46551: cephadm: Add a better hint how to add a host In Progress
Duplicated by Orchestrator - Bug #46132: cephadm: Failed to add host in cephadm through command 'ceph orch host add node1' Duplicate

History

#1 Updated by Mark Kirkwood about 2 months ago

Typo in the 'Environment' section: the version is 15.2.3, not 15.2.2.

#2 Updated by Sebastian Wagner about 2 months ago

  • Category set to cephadm (binary)
  • Status changed from New to Triaged
  • Priority changed from Normal to High
  • Tags set to low-hanging-fruit

#3 Updated by Sebastian Wagner about 2 months ago

  • Assignee set to Stephan Müller

#4 Updated by Dan Mick about 2 months ago

lol, just discovered this myself. I confirm that the suggested fix is appropriate.

#5 Updated by Stephan Müller about 2 months ago

  • Status changed from Triaged to In Progress

#6 Updated by Sebastian Wagner about 1 month ago

  • Duplicated by Bug #46132: cephadm: Failed to add host in cephadm through command 'ceph orch host add node1' added

#7 Updated by Stephan Müller 24 days ago

  • Related to Bug #46547: cephadm: Exception adding host via FQDN if host was already added added

#8 Updated by Stephan Müller 24 days ago

  • Related to Tasks #46551: cephadm: Add a better hint how to add a host added

#9 Updated by Stephan Müller 24 days ago

  • Status changed from In Progress to Need More Info

I was not yet able to reproduce it. (I tried a lot of things.)

I added new hosts to bootstrapped clusters, removed and re-added the same host under different names (hostname / IP / FQDN), and tried much more, like changing the hostname.

@Mark Kirkwood and @Dan Mick, could you provide more information on how to reproduce it and what your setup looks like?

#10 Updated by Stephan Müller 24 days ago

  • Priority changed from High to Normal

#11 Updated by Mark Kirkwood 24 days ago

@Stephan Müller, I'd suggest starting with some freshly built VMs (mine were Ubuntu 18.04). Optionally, set up the Ceph repos on all of them to get Octopus (I did this). Also, I didn't have these VMs in DNS, so I just set up /etc/hosts on each of them.

Then:
  • download cephadm on the host that is to become the mon
  • bootstrap it as a mon
  • install ceph-common
  • copy ssh-id to next host
  • add it (hopefully triggering the bug; see the command sketch below)

Offhand, I can't recall if I installed Docker on the bootstrap host before bootstrapping.
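
Concretely, the sequence might look like the following; the mon IP and hostnames are illustrative, and the cephadm bootstrap and ssh-copy-id invocations follow the standard Octopus workflow:

ceph0 $ chmod +x ./cephadm                               # after downloading the cephadm script
ceph0 $ sudo ./cephadm bootstrap --mon-ip 192.168.0.10
ceph0 $ sudo apt install ceph-common
ceph0 $ ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph1
ceph0 $ ceph orch host add ceph1                         # fails with the UnboundLocalError above if ceph1 lacks podman/docker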

#12 Updated by Paul Cuzner 20 days ago

I've hit this with base RHEL 8.2 physical hosts. In my case the new hosts I tried to add didn't have python3, lvm or podman installed - they were fresh "server" role installs.

#13 Updated by Sebastian Wagner 19 days ago

  • Status changed from Need More Info to Fix Under Review
  • Assignee changed from Stephan Müller to Sebastian Wagner
  • Pull request ID set to 36216
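
The contents of PR 36216 aren't quoted in this ticket; as an assumption only, one plausible shape for such a fix, sketched against the reproduction in the description (it reuses the find_program() stand-in from there), is to report the missing container runtime explicitly rather than read a possibly-unassigned name:

def command_check_host_fixed():
    # Hypothetical sketch, not the actual PR 36216 change: track the
    # lookup result in a clearly-initialized local and turn "no container
    # runtime found" into a readable error instead of an UnboundLocalError.
    errors = []
    container_binary = None
    for exe in ('podman', 'docker'):
        try:
            container_binary = find_program(exe)
            break
        except ValueError:
            pass
    if container_binary is None:
        errors.append('podman or docker binary not found on host')
    else:
        print('container binary (%s) is present' % container_binary)
    return errors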

#14 Updated by Victor Moreno 16 days ago

I have hit this as well, installing cephadm on Debian 10 (buster) after running an apt upgrade.

I have the playbook that configures the Ceph cluster here, in case it helps:

https://github.com/VictorMorenoJimenez/tfg2020/blob/master/ansible/playbooks/configure_ceph_cluster.yml

#15 Updated by Tobias Fischer 10 days ago

Same here: trying to add a fresh Debian buster VM with all updates installed (no additional packages like Docker present):

~# ceph orch host add rgw1
Error ENOENT: New host rgw1 (rgw1) failed check: ['Traceback (most recent call last):', ' File "<stdin>", line 4762, in <module>', ' File "<stdin>", line 3738, in command_check_host', "UnboundLocalError: local variable 'container_path' referenced before assignment"]

But the fix proposed by Mark Kirkwood was already in place. After installing Docker on rgw1, the error was gone.

#16 Updated by Sebastian Wagner 8 days ago

  • Status changed from Fix Under Review to Resolved
  • Target version set to v15.2.5
