Bug #55395 (open): Problem with ceph cephadm osd activate

Added by Manuel Holtgrewe about 2 years ago. Updated 6 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I reinstalled the operating system of my host. I then wanted to use `ceph cephadm osd activate` to bring the OSDs on the host back into my cluster.

However, `ceph cephadm osd activate` only brought back the crash and node-exporter Docker containers.

- Mailing List Thread: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ALGZCKU7B4NUUBCE5NDDBANTONFWWDDR/

Following hints on the mailing list, I called `cephadm -v -v -v ceph-volume lvm activate --all --no-systemd`. This did not create any folders in `/var/lib/ceph/FSID`, so `systemctl start` for the OSD units fails with `/bin/bash: /var/lib/ceph/d221bc3c-8ff4-11ec-b4ba-b02628267680/osd.12/unit.run: No such file or directory`.
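
A minimal sketch to check the symptom, assuming the JSON layout that `cephadm ceph-volume lvm list --format=json` reports and the unit.run path from the error message above:

#!/usr/bin/env python3
# Sketch: report which OSDs known to ceph-volume are missing their host-side
# cephadm directory (and hence the unit.run file that the systemd unit needs).
import json
import os
from subprocess import check_output

osds = json.loads(check_output(
    ['cephadm', 'ceph-volume', 'lvm', 'list', '--format=json']))
for osd_id, lvs in osds.items():
    fsid = lvs[0]['tags']['ceph.cluster_fsid']
    unit_run = f'/var/lib/ceph/{fsid}/osd.{osd_id}/unit.run'
    status = 'ok' if os.path.exists(unit_run) else 'MISSING'
    print(f'osd.{osd_id}: {unit_run} [{status}]')

On an affected host this should print MISSING for every OSD, even though the `ceph-volume lvm activate` run below reports success for each of them.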

The output of `cephadm ceph-volume lvm activate --all --no-systemd` is below.

# cephadm -v -v -v ceph-volume lvm activate --all --no-systemd 
--------------------------------------------------------------------------------
cephadm ['-v', '-v', '-v', 'ceph-volume', 'lvm', 'activate', '--all', '--no-systemd']
Using default config: /etc/ceph/ceph.conf
/bin/docker: 920d72f71171,79MiB / 251.5GiB
/bin/docker: 72570e59f28e,77.43MiB / 251.5GiB
/bin/docker: b1a0de17c02d,854.8MiB / 251.5GiB
/bin/docker: cce740906d36,1.158GiB / 251.5GiB
/bin/docker: 4fba9fb86122,209.3MiB / 251.5GiB
/bin/docker: 9b06ce4f696b,96.83MiB / 251.5GiB
/bin/docker: 35269d267dfc,91.14MiB / 251.5GiB
/bin/docker: 54a2a92d68fa,278.1MiB / 251.5GiB
/bin/docker: cee7af4d922d,871.7MiB / 251.5GiB
/bin/docker: f821b8d5e473,627.6MiB / 251.5GiB
/bin/docker: 805061ed9471,473.3MiB / 251.5GiB
/bin/docker: 52e339e96d47,451.7MiB / 251.5GiB
/bin/docker: 542bda96ee76,409.9MiB / 251.5GiB
/bin/docker: 1d76652acb43,192MiB / 251.5GiB
/bin/docker: 0934566689b5,120.9MiB / 251.5GiB
/bin/docker: 17c036fbb15d,285MiB / 251.5GiB
/bin/docker: 63d0925cfc81,1.647GiB / 251.5GiB
/bin/docker: 06e96a3e5cbd,572.6MiB / 251.5GiB
/bin/docker: 2d2fc47ec89a,19.6MiB / 251.5GiB
/bin/docker: 26adae09c237,233.6MiB / 251.5GiB
/bin/docker: 86cf7b732b13,90.75MiB / 251.5GiB
/bin/docker: 92d366bd19aa,2.008MiB / 251.5GiB
/bin/docker: 01e9dc38faab,208.1MiB / 251.5GiB
/bin/docker: da22e3f7811c,655MiB / 251.5GiB
/bin/docker: d88137996826,783.1MiB / 251.5GiB
/bin/docker: a9f1e48f4a04,585.5MiB / 251.5GiB
/bin/docker: 49742307a5b4,478.1MiB / 251.5GiB
/bin/docker: 2bb7d623ed93,205.7MiB / 251.5GiB
/bin/docker: 0b323f254217,243.5MiB / 251.5GiB
/bin/docker: 8835104b0515,123.5MiB / 251.5GiB
/bin/docker: 67472592f1b4,658.9MiB / 251.5GiB
/bin/docker: 5aef3941d33b,474.4MiB / 251.5GiB
/bin/docker: 0d571f24c0e9,100.1MiB / 251.5GiB
/bin/docker: 7e706a832e4a,2.11GiB / 251.5GiB
/bin/docker: 23a669b4fd7c,1.164GiB / 251.5GiB
/bin/docker: 74581172b6fb,1.902MiB / 251.5GiB
/bin/docker: 37e980f9f83d,2.273MiB / 251.5GiB
/bin/docker: 858a9b82375d,179.5MiB / 251.5GiB
/bin/docker: dc97abdfc393,16.78MiB / 251.5GiB
/bin/docker: d561d2aa6053,5.488MiB / 251.5GiB
/bin/docker: 804b450b057f,655.5MiB / 251.5GiB
/bin/docker: 011180b66a7a,2.621MiB / 251.5GiB
/bin/docker: 21b984cdc711,56.94MiB / 251.5GiB
/bin/docker: c197971ee2ba,1.754MiB / 251.5GiB
/bin/docker: a10112ce1fa7,1.887MiB / 251.5GiB
/bin/docker: 546536f82ada,160.5MiB / 251.5GiB
/bin/docker: 5822da42e73b,38.15MiB / 251.5GiB
/bin/docker: 38e044b7d42f,6.645MiB / 251.5GiB
Inferring fsid d221bc3c-8ff4-11ec-b4ba-b02628267680
/bin/docker: quay.io/ceph/ceph@sha256:0d927ccbd8892180ee09894c2b2c26d07c938bf96a56eaee9b80fc9f26083ddb
/bin/docker: quay.io/ceph/ceph@
Using recent ceph image quay.io/ceph/ceph@sha256:0d927ccbd8892180ee09894c2b2c26d07c938bf96a56eaee9b80fc9f26083ddb
stat: 167 167
Acquiring lock 140181136017056 on /run/cephadm/d221bc3c-8ff4-11ec-b4ba-b02628267680.lock
Lock 140181136017056 acquired on /run/cephadm/d221bc3c-8ff4-11ec-b4ba-b02628267680.lock
sestatus: SELinux status:                 disabled
sestatus: SELinux status:                 disabled
/bin/docker: --> Activating OSD ID 12 FSID e2ebb627-28aa-45a3-9261-d7c27bc08448
/bin/docker: Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-12
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-12
/bin/docker: Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-2cbf3973-13a3-444a-b335-a0262cff6074/osd-block-e2ebb627-28aa-45a3-9261-d7c27bc08448 --path /var/lib/ceph/osd/ceph-12 --no-mon-config
/bin/docker: Running command: /usr/bin/ln -snf /dev/ceph-2cbf3973-13a3-444a-b335-a0262cff6074/osd-block-e2ebb627-28aa-45a3-9261-d7c27bc08448 /var/lib/ceph/osd/ceph-12/block
/bin/docker: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-12/block
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-3
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-12
/bin/docker: --> ceph-volume lvm activate successful for osd ID: 12
/bin/docker: --> Activating OSD ID 25 FSID 3f3d61f8-6964-4922-98cb-6620aff5cb6f
/bin/docker: Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-25
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-25
/bin/docker: Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-6f69c24e-2930-48ff-a18f-278470e558e1/osd-block-3f3d61f8-6964-4922-98cb-6620aff5cb6f --path /var/lib/ceph/osd/ceph-25 --no-mon-config
/bin/docker: Running command: /usr/bin/ln -snf /dev/ceph-6f69c24e-2930-48ff-a18f-278470e558e1/osd-block-3f3d61f8-6964-4922-98cb-6620aff5cb6f /var/lib/ceph/osd/ceph-25/block
/bin/docker: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-25/block
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-6
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-25
/bin/docker: --> ceph-volume lvm activate successful for osd ID: 25
/bin/docker: --> Activating OSD ID 0 FSID 8d0b1bad-069a-4acf-b13b-982fab58f285
/bin/docker: Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
/bin/docker: Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-8d9bd6eb-ce97-4940-8fed-f57f7bed3f5a/osd-block-8d0b1bad-069a-4acf-b13b-982fab58f285 --path /var/lib/ceph/osd/ceph-0 --no-mon-config
/bin/docker: Running command: /usr/bin/ln -snf /dev/ceph-8d9bd6eb-ce97-4940-8fed-f57f7bed3f5a/osd-block-8d0b1bad-069a-4acf-b13b-982fab58f285 /var/lib/ceph/osd/ceph-0/block
/bin/docker: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-0
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
/bin/docker: --> ceph-volume lvm activate successful for osd ID: 0
/bin/docker: --> Activating OSD ID 8 FSID ff82d0d0-6d55-49c2-85cc-bb8a0a74ae89
/bin/docker: Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-8
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-8
/bin/docker: Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-a8c3f39b-ec8d-4dd5-a85a-13a5773b99fa/osd-block-ff82d0d0-6d55-49c2-85cc-bb8a0a74ae89 --path /var/lib/ceph/osd/ceph-8 --no-mon-config
/bin/docker: Running command: /usr/bin/ln -snf /dev/ceph-a8c3f39b-ec8d-4dd5-a85a-13a5773b99fa/osd-block-ff82d0d0-6d55-49c2-85cc-bb8a0a74ae89 /var/lib/ceph/osd/ceph-8/block
/bin/docker: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-8/block
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-2
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-8
/bin/docker: --> ceph-volume lvm activate successful for osd ID: 8
/bin/docker: --> Activating OSD ID 4 FSID 313160a8-594d-4384-9640-68c4d8c1b6da
/bin/docker: Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-4
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
/bin/docker: Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-bf94d5a5-eb04-42e3-867a-dee2886daa62/osd-block-313160a8-594d-4384-9640-68c4d8c1b6da --path /var/lib/ceph/osd/ceph-4 --no-mon-config
/bin/docker: Running command: /usr/bin/ln -snf /dev/ceph-bf94d5a5-eb04-42e3-867a-dee2886daa62/osd-block-313160a8-594d-4384-9640-68c4d8c1b6da /var/lib/ceph/osd/ceph-4/block
/bin/docker: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-4/block
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-1
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
/bin/docker: --> ceph-volume lvm activate successful for osd ID: 4
/bin/docker: --> Activating OSD ID 20 FSID f7e67343-4fde-4e45-bc70-f44c92a178bd
/bin/docker: Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-20
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-20
/bin/docker: Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-ee2517dc-ecb6-4c14-ab22-59d9c54f0952/osd-block-f7e67343-4fde-4e45-bc70-f44c92a178bd --path /var/lib/ceph/osd/ceph-20 --no-mon-config
/bin/docker: Running command: /usr/bin/ln -snf /dev/ceph-ee2517dc-ecb6-4c14-ab22-59d9c54f0952/osd-block-f7e67343-4fde-4e45-bc70-f44c92a178bd /var/lib/ceph/osd/ceph-20/block
/bin/docker: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-20/block
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-5
/bin/docker: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-20
/bin/docker: --> ceph-volume lvm activate successful for osd ID: 20

Actions #1

Updated by Janek Bevendorff 7 months ago

Any progress on this? Is this fixed in newer Ceph versions? I'm migrating our Pacific cluster to cephadm + Salt so I can upgrade to a newer Ceph version, and I need to run `cephadm ceph-volume activate --all` after booting the nodes.

Actions #2

Updated by Janek Bevendorff 6 months ago

Until this is fixed, I use the following Python script in place of `ceph-volume lvm activate --all`:

#!/usr/bin/env python3

import json
import os
from subprocess import check_output, run
import sys

# Enumerate all OSDs that ceph-volume knows about on this host.
osd_list = json.loads(check_output(
    ['cephadm', 'ceph-volume', 'lvm', 'list', '--format=json']))
for osd_id, osd_info in osd_list.items():
    fsid = osd_info[0]['tags']['ceph.cluster_fsid']
    osd_fsid = osd_info[0]['tags']['ceph.osd_fsid']

    print(f'Activating {osd_id}...', file=sys.stderr)
    if not os.path.exists(f'/var/lib/ceph/{fsid}/osd.{osd_id}'):
        # No cephadm directory for this OSD yet: (re)deploy the daemon.
        run(['cephadm', 'deploy', '--name', f'osd.{osd_id}', '--fsid', fsid,
             '--osd-fsid', osd_fsid, '--config', '/etc/ceph/ceph.conf'])
    else:
        # The directory already exists: just start the daemon's unit.
        run(['cephadm', 'unit', '--name', f'osd.{osd_id}', '--fsid', fsid, 'start'])

Actions #3

Updated by Eugen Block 6 months ago

I'm not sure whether this is really fixed, but I tried to reproduce it with version 18.2.0 and could not. I have a small virtual test cluster. I deleted one of the VMs, which had two OSDs, built a new VM with the same IP to imitate a redeployed host, and attached the already existing OSDs to it. After a couple of minutes cephadm detected the OSDs automatically and was able to start the OSD pods without my intervention.

The only thing I did before the host was successfully reactivated was to create the /var/lib/ceph/{fsid} directory in order to be able to start the MON. I didn't have a quorum, so I tried to manually deploy a MON daemon, which failed for several reasons. I kept the directory but decided to change the monmap of the remaining host so that it contains only one mon, which let me get the cluster back up.

In this small test cluster (only 3 OSDs) I have the all-available-devices config enabled, but that shouldn't make a difference.
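
For reference, a rough sketch of the standard monmap-shrinking procedure (the mon names below are hypothetical placeholders, not the ones from this test; the surviving mon must be stopped first, and in a cephadm cluster the tools would be run from a `cephadm shell` or inside the mon container):

#!/usr/bin/env python3
# Rough sketch of shrinking the monmap to a single surviving mon; the mon
# names are hypothetical placeholders.
import subprocess

SURVIVING_MON = 'host1'   # hypothetical: the mon that still has its store
DEAD_MON = 'host2'        # hypothetical: the mon on the removed host
MONMAP = '/tmp/monmap'

# Extract the current monmap from the stopped surviving mon's store.
subprocess.run(['ceph-mon', '-i', SURVIVING_MON, '--extract-monmap', MONMAP], check=True)
# Drop the unreachable mon from the map.
subprocess.run(['monmaptool', MONMAP, '--rm', DEAD_MON], check=True)
# Inject the edited map back so the surviving mon can form a quorum of one.
subprocess.run(['ceph-mon', '-i', SURVIVING_MON, '--inject-monmap', MONMAP], check=True)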

Actions #4

Updated by Eugen Block 6 months ago

I repeated the test without having a MON on the removed host. This time I didn't create any directory manually. After the host was prepared (cephadm installed, configs copied, SSH access set up, etc.), cephadm automatically created the required directory /var/lib/ceph/{fsid} and deployed the first daemons (mgr and crash). I waited only a few minutes, so the OSDs hadn't been picked up yet, but running `ceph cephadm osd activate {host}` was successful (the cluster was deployed with Quincy, hence the hostnames):

quincy-1:~ # ceph cephadm osd activate quincy-2
Created osd(s) 0,3 on host 'quincy-2'

The cluster is healthy now. I would consider this fixed in Reef; I'll try to reproduce it with Quincy and Pacific as well.

Actions #5

Updated by Eugen Block 6 months ago

The same works for me with Pacific 16.2.14 and Quincy 17.2.6.
