Bug #57016 (closed): cephadm bootstrap begs robustness

Added by greg mott over 1 year ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Category: cephadm (binary)
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 51718
Crash signature (v1):
Crash signature (v2):

Description

How do i just start over?

The first time i tried cephadm bootstrap and it failed, i guessed it wasn't good with PermitRootLogin set to no in /etc/ssh/sshd_config, so i set that to yes; i also thought it best to upgrade all packages and reboot.
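For reference, the sshd change amounts to roughly this (the sed one-liner is just illustrative; editing the file by hand works the same):

# set PermitRootLogin to yes so cephadm can SSH in as root
sed -i 's/^#*PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
systemctl restart sshd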

Next try it complained that /etc/ceph/ceph.conf already exists, and then /etc/ceph/ceph.client.admin.keyring, and then /etc/ceph/ceph.pub, so i moved them to a souvenir bin.

Next it complained "Cannot bind to IP 192.168.176.13 port 3300: [Errno 98] Address already in use", so i killed off the ceph processes and tried again.

Next it complained "Waiting for mon to start... Waiting for mon...
Non-zero exit code 13 from /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e CONTAINER_IMAGE=quay.io/ceph/ceph:v17 -e NODE_NAME=tangelo -e CEPH_USE_RANDOM_NONCE=1 -v /var/lib/ceph/d0e93cf8-12a0-11ed-83c2-8cdcd4320acb/mon.tangelo:/var/lib/ceph/mon/ceph-tangelo:z -v /tmp/ceph-tmp0cemkjph:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpkh0pixyn:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v17 status
/usr/bin/ceph: stderr 2022-08-02T20:23:07.594+0000 7f3ece419700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
/usr/bin/ceph: stderr [errno 13] RADOS permission denied (error connecting to the cluster)
ERROR: mon not available after 15 tries"

How do i just start over?


Related issues 1 (0 open, 1 closed)

Copied to Orchestrator - Backport #61963: reef: cephadm bootstrap begs robustness (Resolved, Adam King)
Actions #1

Updated by greg mott over 1 year ago

this is on alma (rhel) 8.6
the commands i've given are:

curl --silent --remote-name --location https://github.com/ceph/ceph/raw/quincy/src/cephadm/cephadm
chmod +x cephadm
cephadm bootstrap --mon-ip 192.168...

Actions #2

Updated by greg mott over 1 year ago

i also gave these commands before cephadm bootstrap:

./cephadm add-repo --release quincy
./cephadm install

Actions #3

Updated by Dhairya Parmar over 1 year ago

greg mott wrote:

How do i just start over?

The first time i tried cephadm bootstrap and it failed, i guessed it wasn't good with PermitRootLogin set to no in /etc/ssh/sshd_config, so i set that to yes; i also thought it best to upgrade all packages and reboot.

Next try it complained that /etc/ceph/ceph.conf already exists, and then /etc/ceph/ceph.client.admin.keyring, and then /etc/ceph/ceph.pub, so i moved them to a souvenir bin.

Next it complained "Cannot bind to IP 192.168.176.13 port 3300: [Errno 98] Address already in use", so i killed off the ceph processes and tried again.

Next it complained "Waiting for mon to start... Waiting for mon...
Non-zero exit code 13 from /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e CONTAINER_IMAGE=quay.io/ceph/ceph:v17 -e NODE_NAME=tangelo -e CEPH_USE_RANDOM_NONCE=1 -v /var/lib/ceph/d0e93cf8-12a0-11ed-83c2-8cdcd4320acb/mon.tangelo:/var/lib/ceph/mon/ceph-tangelo:z -v /tmp/ceph-tmp0cemkjph:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpkh0pixyn:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v17 status
/usr/bin/ceph: stderr 2022-08-02T20:23:07.594+0000 7f3ece419700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
/usr/bin/ceph: stderr [errno 13] RADOS permission denied (error connecting to the cluster)
ERROR: mon not available after 15 tries"

How do i just start over?

Did you try running "lsof -i :3300" or maybe "nmap -p 3300 192.168.176.13"? I have actually run into a similar issue, and the solution has always been to make sure those processes are not running anymore. Did you try restarting that machine?
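Something along these lines, as a rough sketch (the ceph.target unit only exists if ceph systemd units were installed):

ss -lntp | grep 3300          # show which process is bound to the mon port
lsof -i :3300                 # the same check via lsof
systemctl stop ceph.target    # stop any leftover ceph daemons, if that unit exists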

Actions #4

Updated by greg mott over 1 year ago

yes indeed, i restarted the machine, killed the ceph processes, and then it complained "...mon not available..."

How do i just start over?

Actions #5

Updated by greg mott over 1 year ago

Ok i've worked out how to "start over":
Just move the following files to a souvenir bin (or delete them): /etc/{ceph,logr*,sy*/sy*/{,/mu*}}/ceph*
Then restart, and reissue the command: cephadm bootstrap --mon-ip 192.168...
So with that i've now got "Bootstrap complete."
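Spelled out, that glob covers roughly these paths; the souvenir directory name below is just illustrative:

mkdir /root/ceph-souvenirs
mv /etc/ceph/ceph* /root/ceph-souvenirs/               # conf, admin keyring, ssh pub key
mv /etc/logrotate.d/ceph* /root/ceph-souvenirs/        # logrotate config
mv /etc/systemd/system/ceph* /root/ceph-souvenirs/     # unit files and targets
mv /etc/systemd/system/multi-user.target.wants/ceph* /root/ceph-souvenirs/   # enabled-unit symlinks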

Actions #6

Updated by Dhairya Parmar over 1 year ago

greg mott wrote:

Ok i've worked out how to "start over":
Just move the following files to a souvenir bin (or delete them): /etc/{ceph,logr*,sy*/sy*/{,/mu*}}/ceph*
Then restart, and reissue the command: cephadm bootstrap --mon-ip 192.168...
So with that i've now got "Bootstrap complete."

Good to know it finally worked out for you.

Actions #7

Updated by Dhairya Parmar over 1 year ago

  • Status changed from New to Resolved
Actions #8

Updated by Dhairya Parmar over 1 year ago

Changing the status to resolved. You can re-open it if you hit something again.

Actions #9

Updated by Adam King over 1 year ago

  • Project changed from Ceph to Orchestrator
  • Status changed from Resolved to New

Going to re-open this and move it to the Orchestrator component, to track adding some mechanism in cephadm for helping users clean up failed bootstrap attempts, since currently they're sort of on their own for figuring out what to clean up.

Actions #10

Updated by Redouane Kachach Elhichou over 1 year ago

Maybe what happened is that the ceph cluster was only partially installed. In that case, what you have to do is remove the faulty cluster by using rm-cluster:

cephadm rm-cluster --force --zap-osds --fsid <fsid>

You can get the fsid from the conf file (/etc/ceph/ceph.conf) or from the bootstrap logs; look for a line similar to:

Cluster fsid: 72ed6c66-1d4d-11ed-96f4-5254001aecae

This should clean up all the files that were installed as part of the bootstrap.
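Putting it together, a minimal sketch (assuming /etc/ceph/ceph.conf still has its usual "fsid = <uuid>" line):

fsid=$(awk '/^[[:space:]]*fsid/ {print $3}' /etc/ceph/ceph.conf)   # grab the uuid
cephadm rm-cluster --force --zap-osds --fsid "$fsid"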

More info at:
https://docs.ceph.com/en/latest/cephadm/operations/#purging-a-cluster

Actions #11

Updated by Redouane Kachach Elhichou over 1 year ago

  • Category set to cephadm (binary)
Actions #12

Updated by Redouane Kachach Elhichou 11 months ago

  • Status changed from New to In Progress
  • Assignee set to Redouane Kachach Elhichou
Actions #13

Updated by Redouane Kachach Elhichou 11 months ago

  • Pull request ID set to 51718
Actions #14

Updated by Redouane Kachach Elhichou 10 months ago

  • Status changed from In Progress to Fix Under Review
Actions #15

Updated by Adam King 10 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to reef
Actions #16

Updated by Backport Bot 10 months ago

  • Copied to Backport #61963: reef: cephadm bootstrap begs robustness
Actions #17

Updated by Backport Bot 10 months ago

  • Tags set to backport_processed
Actions #18

Updated by Redouane Kachach Elhichou 3 months ago

  • Status changed from Pending Backport to Resolved