Documentation #58468

cephadm installation guide -- refine and correct

Added by Zac Dover over 1 year ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Tags: -
Backport: -
Reviewed: -
Affected Versions: -
Pull request ID: -
Description

<trevorksmith> zdover, I am following these instructions. - https://docs.ceph.com/en/quincy/cephadm/install/ These are the commands i ended up running, as going through the instructions. https://pastebin.com/raw/BCXwr0qi
<trevorksmith> If i try to add a osd manually since i can see it from the inventory output, i get the following errors - https://pastebin.com/raw/ZavMY66y
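For reference, the documented OSD-creation workflow that the install guide points to boils down to roughly the following (a sketch of the quincy documentation; the hostname and device path are the ones from the reports above):

# let the orchestrator consume every device it reports as available
ceph orch apply osd --all-available-devices

# or add one specific device on one specific host
ceph orch daemon add osd RX570:/dev/sdl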
Actions #1

Updated by Zac Dover over 1 year ago

Ubuntu Jammy | Purged all docker/ceph packages and files from the system; starting from scratch.

Following: https://docs.ceph.com/en/quincy/cephadm/install/

root@RX570:~# apt install docker.io python3 chrony lvm2 systemd -y
root@RX570:~# apt install cephadm -y
root@RX570:~# cephadm bootstrap --mon-ip 192.168.1.227 --single-host-defaults
root@RX570:~# apt install ceph-common -y
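Before or after bootstrapping, it may be worth confirming that the host satisfies cephadm's requirements. A minimal check, assuming the packages above are installed:

# verify container runtime, time sync, lvm2 and other host prerequisites
cephadm check-host

# confirm the container runtime is actually running
systemctl status docker --no-pager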

root@RX570:~# ceph status
  cluster:
    id:     2506e1e8-9521-11ed-a3c1-29bb7e5ec517
    health: HEALTH_WARN
            failed to probe daemons or devices
            OSD count 0 < osd_pool_default_size 2

  services:
    mon: 1 daemons, quorum RX570 (age 2m)
    mgr: RX570.fzmkkf(active, since 53s)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

  progress:
    Updating grafana deployment (+1 -> 1) (0s)
      [............................]

At this point I am already observing issues, specifically the "failed to probe daemons or devices" warning. It is not clear whether this is normal or expected on a fresh install. The OSD count of 0 is expected, since I have not added any OSDs yet, as explained in the documentation.
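When CEPHADM_REFRESH_FAILED ("failed to probe daemons or devices") is raised, the cephadm mgr module usually records the underlying probe error. A couple of commands that may surface it (a suggestion only; exact output varies by release):

ceph health detail       # full text of each active warning
ceph log last cephadm    # recent cephadm log entries, per the cephadm troubleshooting docs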

root@RX570:~# ceph orch device ls

This command returns nothing.
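An empty "ceph orch device ls" can mean either that the orchestrator has not refreshed its device inventory yet or that the refresh itself keeps failing (which the health warning above suggests). A hedged way to tell the two apart:

# force a fresh probe instead of serving cached inventory
ceph orch device ls --refresh

# run the same inventory the orchestrator runs, via cephadm, directly on this host
cephadm ceph-volume inventory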

root@RX570:~# ceph-volume inventory

Device Path      Size        rotates    available    Model name
/dev/sdl         7.28 TB     True       True         USB3.0
/dev/sdm         7.28 TB     True       True         USB3.0
/dev/sdn         7.28 TB     True       True         USB3.0
/dev/sdo         7.28 TB     True       True         USB3.0
/dev/sdp         7.28 TB     True       True         USB3.0
/dev/nvme0n1     1.82 TB     False      False        Samsung SSD 980 PRO 2TB
/dev/sda         3.64 TB     False      False        Samsung SSD 860
/dev/sdb         16.37 TB    True       False        USB3.0
/dev/sdc         16.37 TB    True       False        USB3.0
/dev/sdd         16.37 TB    True       False        USB3.0
/dev/sde         16.37 TB    True       False        USB3.0
/dev/sdf         16.37 TB    True       False        USB3.0
/dev/sdg         16.37 TB    True       False        USB3.0
/dev/sdh         16.37 TB    True       False        USB3.0
/dev/sdi         16.37 TB    True       False        USB3.0
/dev/sdj         16.37 TB    True       False        USB3.0
/dev/sdk         16.37 TB    True       False        USB3.0

The inventory returns the disks I expect, and 5 of them show up as available, so I am confused as to why they are not in the orchestrator's device list.
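The orchestrator's device list is populated by its own periodic cephadm ceph-volume probe of the host, not by a locally installed ceph-volume, so if that probe fails (as the CEPHADM_REFRESH_FAILED warning indicates) the list stays empty even when devices look available locally. Some read-only checks that may help narrow things down (/dev/sdl is just one example device):

# look for existing partitions, filesystems or LVM signatures that make a device ineligible
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT /dev/sdl
wipefs /dev/sdl    # without -a this only lists signatures; it does not erase anything

# if a device is known to be safe to wipe, cephadm can clean it for reuse
ceph orch device zap RX570 /dev/sdl --force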

I am not sure where to go from here. I have spent many hours trying to fix this, but it is hard to troubleshoot something I never had working in the first place.

Other commands that may give insight:

root@RX570:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
86762bf56c57 quay.io/ceph/ceph-grafana:8.3.5 "/bin/sh -c 'grafana…" 2 minutes ago Up 2 minutes ceph-2506e1e8-9521-11ed-a3c1-29bb7e5ec517-grafana-RX570
16f57166ffa4 quay.io/prometheus/alertmanager:v0.23.0 "/bin/alertmanager -…" 2 minutes ago Up 2 minutes ceph-2506e1e8-9521-11ed-a3c1-29bb7e5ec517-alertmanager-RX570
332aadf3176f quay.io/prometheus/prometheus:v2.33.4 "/bin/prometheus --c…" 4 minutes ago Up 4 minutes ceph-2506e1e8-9521-11ed-a3c1-29bb7e5ec517-prometheus-RX570
ebf96bc5f9ac quay.io/prometheus/node-exporter:v1.3.1 "/bin/node_exporter …" 5 minutes ago Up 5 minutes ceph-2506e1e8-9521-11ed-a3c1-29bb7e5ec517-node-exporter-RX570
d3618a80ad3e quay.io/ceph/ceph "/usr/bin/ceph-mgr -…" 5 minutes ago Up 5 minutes ceph-2506e1e8-9521-11ed-a3c1-29bb7e5ec517-mgr-RX570-pgqpdc
430c5a4e30b3 quay.io/ceph/ceph "/usr/bin/ceph-crash…" 6 minutes ago Up 5 minutes ceph-2506e1e8-9521-11ed-a3c1-29bb7e5ec517-crash-RX570
45bdf0148317 quay.io/ceph/ceph:v17 "/usr/bin/ceph-mgr -…" 7 minutes ago Up 7 minutes ceph-2506e1e8-9521-11ed-a3c1-29bb7e5ec517-mgr-RX570-fzmkkf
0c3e3e7b2afe quay.io/ceph/ceph:v17 "/usr/bin/ceph-mon -…" 7 minutes ago Up 7 minutes ceph-2506e1e8-9521-11ed-a3c1-29bb7e5ec517-mon-RX570

root@RX570:~# ceph orch host ls
HOST   ADDR           LABELS  STATUS
RX570  192.168.1.227  _admin
1 hosts in cluster

Actions #2

Updated by Zac Dover over 1 year ago

root@RX570:~# ceph orch daemon add osd RX570:/dev/sdl
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1755, in _handle_command
return self.handle_command(inbuf, cmd)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in handle_command
return dispatch[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
return self.func(mgr, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs) # noqa: E731
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
return func(*args, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/module.py", line 840, in _daemon_add_osd
raise_if_exception(completion)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 228, in raise_if_exception
raise e
RuntimeError: cephadm exited with an error code: 1, stderr:Inferring config /var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/mon.RX570/config
Non-zero exit code 2 from /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 -e NODE_NAME=RX570 -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_OSDSPEC_AFFINITY=None -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517:/var/run/ceph:z -v /var/log/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517:/var/log/ceph:z -v /var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmpfzc_ux5y:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmp7cty9_0m:/var/lib/ceph/bootstrap-osd/ceph.keyring:z quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 lvm batch --no-auto /dev/sdl --yes --no-systemd
/usr/bin/docker: stderr usage: ceph-volume lvm batch [-h] [--db-devices [DB_DEVICES [DB_DEVICES ...]]]
/usr/bin/docker: stderr [--wal-devices [WAL_DEVICES [WAL_DEVICES ...]]]
/usr/bin/docker: stderr [--journal-devices [JOURNAL_DEVICES [JOURNAL_DEVICES ...]]]
/usr/bin/docker: stderr [--auto] [--no-auto] [--bluestore] [--filestore]
/usr/bin/docker: stderr [--report] [--yes]
/usr/bin/docker: stderr [--format {json,json-pretty,pretty}] [--dmcrypt]
/usr/bin/docker: stderr [--crush-device-class CRUSH_DEVICE_CLASS]
/usr/bin/docker: stderr [--no-systemd]
/usr/bin/docker: stderr [--osds-per-device OSDS_PER_DEVICE]
/usr/bin/docker: stderr [--data-slots DATA_SLOTS]
/usr/bin/docker: stderr [--data-allocate-fraction DATA_ALLOCATE_FRACTION]
/usr/bin/docker: stderr [--block-db-size BLOCK_DB_SIZE]
/usr/bin/docker: stderr [--block-db-slots BLOCK_DB_SLOTS]
/usr/bin/docker: stderr [--block-wal-size BLOCK_WAL_SIZE]
/usr/bin/docker: stderr [--block-wal-slots BLOCK_WAL_SLOTS]
/usr/bin/docker: stderr [--journal-size JOURNAL_SIZE]
/usr/bin/docker: stderr [--journal-slots JOURNAL_SLOTS] [--prepare]
/usr/bin/docker: stderr [--osd-ids [OSD_IDS [OSD_IDS ...]]]
/usr/bin/docker: stderr [DEVICES [DEVICES ...]]
/usr/bin/docker: stderr ceph-volume lvm batch: error: argument DEVICES: invalid <ceph_volume.util.arg_validators.ValidBatchDataDevice object at 0x7f0db141d588> value: '/dev/sdl'
Traceback (most recent call last):
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 9468, in <module>
main()
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 9456, in main
r = ctx.func(ctx)
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 2083, in _infer_config
return func(ctx)
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 1999, in _infer_fsid
return func(ctx)
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 2111, in _infer_image
return func(ctx)
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 1986, in _validate_fsid
return func(ctx)
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 6093, in command_ceph_volume
out, err, code = call_throws(ctx, c.run_cmd(), verbosity=CallVerbosity.QUIET_UNLESS_ERROR)
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 1788, in call_throws
raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 -e NODE_NAME=RX570 -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_OSDSPEC_AFFINITY=None -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517:/var/run/ceph:z -v /var/log/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517:/var/log/ceph:z -v /var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmpfzc_ux5y:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmp7cty9_0m:/var/lib/ceph/bootstrap-osd/ceph.keyring:z quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 lvm batch --no-auto /dev/sdl --yes --no-systemd
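The usage error above ("argument DEVICES: invalid ... value: '/dev/sdl'") means ceph-volume's argument validator rejected /dev/sdl as a batch data device, which is consistent with the device never making it into the orchestrator's inventory. One way that may give a clearer error is to run the same batch call through cephadm in report-only mode, which makes no changes (a sketch, not a fix):

# dry-run the batch preparation for this device inside the cephadm container
cephadm ceph-volume lvm batch --report /dev/sdl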

Actions #3

Updated by Zac Dover over 1 year ago

root@RX570:~# ceph health detail
HEALTH_WARN failed to probe daemons or devices; OSD count 0 < osd_pool_default_size 2
[WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or devices
host RX570 `cephadm ceph-volume` failed: cephadm exited with an error code: 1, stderr: Inferring config /var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/mon.RX570/config
Non-zero exit code 1 from /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 -e NODE_NAME=RX570 -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517:/var/run/ceph:z -v /var/log/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517:/var/log/ceph:z -v /var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmpzjllcnij:/etc/ceph/ceph.conf:z quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 inventory --format=json-pretty --filter-for-batch
/usr/bin/docker: stderr Traceback (most recent call last):
/usr/bin/docker: stderr File "/usr/sbin/ceph-volume", line 11, in <module>
/usr/bin/docker: stderr load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in __init__
/usr/bin/docker: stderr self.main(self.argv)
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
/usr/bin/docker: stderr return f(*a, **kw)
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
/usr/bin/docker: stderr terminal.dispatch(self.mapper, subcommand_args)
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
/usr/bin/docker: stderr instance.main()
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/inventory/main.py", line 53, in main
/usr/bin/docker: stderr with_lsm=self.args.with_lsm))
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/util/device.py", line 39, in __init__
/usr/bin/docker: stderr all_devices_vgs = lvm.get_all_devices_vgs()
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/api/lvm.py", line 797, in get_all_devices_vgs
/usr/bin/docker: stderr return [VolumeGroup(**vg) for vg in vgs]
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/api/lvm.py", line 797, in <listcomp>
/usr/bin/docker: stderr return [VolumeGroup(**vg) for vg in vgs]
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/api/lvm.py", line 517, in __init__
/usr/bin/docker: stderr raise ValueError('VolumeGroup must have a non-empty name')
/usr/bin/docker: stderr ValueError: VolumeGroup must have a non-empty name
Traceback (most recent call last):
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 9468, in <module>
main()
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 9456, in main
r = ctx.func(ctx)
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 2083, in _infer_config
return func(ctx)
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 1999, in _infer_fsid
return func(ctx)
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 2111, in _infer_image
return func(ctx)
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 1986, in _validate_fsid
return func(ctx)
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 6093, in command_ceph_volume
out, err, code = call_throws(ctx, c.run_cmd(), verbosity=CallVerbosity.QUIET_UNLESS_ERROR)
File "/var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 1788, in call_throws
raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 -e NODE_NAME=RX570 -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517:/var/run/ceph:z -v /var/log/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517:/var/log/ceph:z -v /var/lib/ceph/2506e1e8-9521-11ed-a3c1-29bb7e5ec517/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmpzjllcnij:/etc/ceph/ceph.conf:z quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 inventory --format=json-pretty --filter-for-batch
[WRN] TOO_FEW_OSDS: OSD count 0 < osd_pool_default_size 2
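The stderr above ends in "ValueError: VolumeGroup must have a non-empty name", raised while ceph-volume enumerates LVM volume groups, so the probe appears to be tripping over LVM metadata reported for one of the attached devices rather than failing inside cephadm itself (an inference from the traceback, not a confirmed diagnosis). Inspecting LVM state on the host may confirm it:

# list physical volumes and the volume group each one reports;
# an entry with an empty VG name would match the ValueError above
pvs -o pv_name,vg_name
vgs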

Actions #4

Updated by Zac Dover over 1 year ago

  • Tracker changed from Bug to Documentation