Bug #57976
ceph-volume lvm activate removes /var/lib/ceph/osd/ceph-XXX folder and then chokes on itself
Status: New
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-disk
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
When I create a new OSD and try to activate it, the activation step removes the /var/lib/ceph/osd/ceph-XXX mount folder (which was previously created by the prepare command) and then fails to activate the OSD.
This issue occurs when I try to add new (SSD) OSDs to my cluster.
Steps to reproduce:
ceph-volume lvm zap /dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:12:0
--> Zapping: /dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:12:0
--> --destroy was not specified, but zapping a whole device will remove the partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:12:0 bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0216828 s, 484 MB/s
--> Zapping successful for: <Raw Device: /dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:12:0>

ceph-volume lvm prepare --data /dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:12:0
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 4f054fea-ea30-4966-a886-801cb14b62f9
Running command: vgcreate --force --yes ceph-e7ca3aa6-4794-459c-9dcd-b8973f53b185 /dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:12:0
 stdout: Physical volume "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:12:0" successfully created.
 stdout: Volume group "ceph-e7ca3aa6-4794-459c-9dcd-b8973f53b185" successfully created
Running command: lvcreate --yes -l 228928 -n osd-block-4f054fea-ea30-4966-a886-801cb14b62f9 ceph-e7ca3aa6-4794-459c-9dcd-b8973f53b185
 stdout: Logical volume "osd-block-4f054fea-ea30-4966-a886-801cb14b62f9" created.
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1255
--> Executable selinuxenabled not in PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-e7ca3aa6-4794-459c-9dcd-b8973f53b185/osd-block-4f054fea-ea30-4966-a886-801cb14b62f9
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-16
Running command: /usr/bin/ln -s /dev/ceph-e7ca3aa6-4794-459c-9dcd-b8973f53b185/osd-block-4f054fea-ea30-4966-a886-801cb14b62f9 /var/lib/ceph/osd/ceph-1255/block
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-1255/activate.monmap
 stderr: got monmap epoch 29
Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-1255/keyring --create-keyring --name osd.1255 --add-key XXXX
 stdout: creating /var/lib/ceph/osd/ceph-1255/keyring
added entity osd.1255 auth(key=XXXX)
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1255/keyring
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1255/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 1255 --monmap /var/lib/ceph/osd/ceph-1255/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-1255/ --osd-uuid 4f054fea-ea30-4966-a886-801cb14b62f9 --setuser ceph --setgroup ceph
--> ceph-volume lvm prepare successful for: /dev/disk/by-path/pci-0000:02:00.0-scsi-0:0:12:0
At this point, the OSD tmpfs exists and contains a lot of stuff:
ls /var/lib/ceph/osd/ceph-1255
activate.monmap bfm_blocks bfm_blocks_per_key bfm_bytes_per_block bfm_size block bluefs ceph_fsid fsid keyring kv_backend magic mkfs_done osd_key ready type whoami
But running ceph-volume lvm activate removes the folder and then fails to start the new OSD:
ceph-volume lvm activate --all
--> OSD ID 763 FSID 93cd5072-8097-4c32-b843-762e6c9bd46f process is active. Skipping activation
--> OSD ID 3 FSID ccf0a00b-b0e7-4963-b6f6-0e34342abb0a process is active. Skipping activation
--> OSD ID 613 FSID 81dea1c7-0ae8-4091-8339-024f8723dd29 process is active. Skipping activation
--> OSD ID 78 FSID 521fa96f-d0af-4106-b6d3-5e3048b8199e process is active. Skipping activation
--> OSD ID 1002 FSID e42e678d-334d-4337-8b7e-0c9fd9eaca6e process is active. Skipping activation
--> OSD ID 1169 FSID cfae82e0-8fc7-4ee1-965b-d85d3360ef7a process is active. Skipping activation
--> OSD ID 687 FSID 49442d45-75e5-47fc-bf26-2ccdef5fbaef process is active. Skipping activation
--> OSD ID 157 FSID 749d55ad-6315-4fdf-952f-25984be7552a process is active. Skipping activation
--> OSD ID 461 FSID 412673e6-21a1-44d8-8f86-9a4e7d53c206 process is active. Skipping activation
--> OSD ID 384 FSID b9b7ac14-c9e9-4bc7-b00b-6984adce0753 process is active. Skipping activation
--> OSD ID 841 FSID 6657df87-86e5-4d2c-962b-fafd5a1f9d30 process is active. Skipping activation
--> OSD ID 921 FSID 9627acbd-c73d-405b-8485-c25dbaa91c99 process is active. Skipping activation
--> OSD ID 1086 FSID 0151a4b8-de7b-401c-af25-5fb167875363 process is active. Skipping activation
--> OSD ID 539 FSID 5b94f35d-d09d-4a02-909c-6857684d1e41 process is active. Skipping activation
--> OSD ID 240 FSID ba3c9f42-45d4-4338-bd2b-b56dd8ea2ddb process is active. Skipping activation
--> Activating OSD ID 1255 FSID 4f054fea-ea30-4966-a886-801cb14b62f9
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1255
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-e7ca3aa6-4794-459c-9dcd-b8973f53b185/osd-block-4f054fea-ea30-4966-a886-801cb14b62f9 --path /var/lib/ceph/osd/ceph-1255 --no-mon-config
Running command: /usr/bin/ln -snf /dev/ceph-e7ca3aa6-4794-459c-9dcd-b8973f53b185/osd-block-4f054fea-ea30-4966-a886-801cb14b62f9 /var/lib/ceph/osd/ceph-1255/block
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-1255/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-16
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1255
Running command: /usr/bin/systemctl enable ceph-volume@lvm-1255-4f054fea-ea30-4966-a886-801cb14b62f9
 stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-1255-4f054fea-ea30-4966-a886-801cb14b62f9.service → /lib/systemd/system/ceph-volume@.service.
Running command: /usr/bin/systemctl enable --runtime ceph-osd@1255
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@1255.service → /lib/systemd/system/ceph-osd@.service.
Running command: /usr/bin/systemctl start ceph-osd@1255
 stderr: Job for ceph-osd@1255.service failed because the control process exited with error code. See "systemctl status ceph-osd@1255.service" and "journalctl -xe" for details.
--> RuntimeError: command returned non-zero exit status: 1
Folder is gone:
ls /var/lib/ceph/osd/ceph-1255
ls: cannot access '/var/lib/ceph/osd/ceph-1255': No such file or directory
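The report distinguishes three states of the OSD folder: a mounted tmpfs (after prepare), a plain directory, and a missing folder (after the failed activate). The following is a minimal sketch for telling them apart when collecting debugging information. The names osd_dir_state and MOUNTS_FILE are hypothetical and not part of ceph-volume; the sketch assumes a Linux host where /proc/mounts lists one mount per line as "device mountpoint fstype options ...", and that the path is passed without a trailing slash.

```shell
#!/bin/sh
# Hypothetical diagnostic helper, not part of ceph-volume.
# MOUNTS_FILE is overridable so the function can be exercised without root.
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

osd_dir_state() {
    # $1: OSD directory path, e.g. /var/lib/ceph/osd/ceph-1255
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo missing
        return 0
    fi
    # Field 2 is the mount point, field 3 the filesystem type.
    if awk -v d="$dir" '$2 == d && $3 == "tmpfs" { found = 1 } END { exit !found }' "$MOUNTS_FILE"; then
        echo tmpfs
    else
        echo plain-directory
    fi
}

# Usage on an affected node (not executed here):
#   osd_dir_state /var/lib/ceph/osd/ceph-1255
```

Running this before and after ceph-volume lvm activate would show at which step the folder changes state; it does not explain why the folder is removed.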
Creating the folder manually fixes the problem:
mkdir /var/lib/ceph/osd/ceph-1255
ceph-volume lvm activate --all
--> OSD ID 763 FSID 93cd5072-8097-4c32-b843-762e6c9bd46f process is active. Skipping activation
--> OSD ID 3 FSID ccf0a00b-b0e7-4963-b6f6-0e34342abb0a process is active. Skipping activation
--> OSD ID 613 FSID 81dea1c7-0ae8-4091-8339-024f8723dd29 process is active. Skipping activation
--> OSD ID 78 FSID 521fa96f-d0af-4106-b6d3-5e3048b8199e process is active. Skipping activation
--> OSD ID 1002 FSID e42e678d-334d-4337-8b7e-0c9fd9eaca6e process is active. Skipping activation
--> OSD ID 1169 FSID cfae82e0-8fc7-4ee1-965b-d85d3360ef7a process is active. Skipping activation
--> OSD ID 687 FSID 49442d45-75e5-47fc-bf26-2ccdef5fbaef process is active. Skipping activation
--> OSD ID 157 FSID 749d55ad-6315-4fdf-952f-25984be7552a process is active. Skipping activation
--> OSD ID 461 FSID 412673e6-21a1-44d8-8f86-9a4e7d53c206 process is active. Skipping activation
--> OSD ID 384 FSID b9b7ac14-c9e9-4bc7-b00b-6984adce0753 process is active. Skipping activation
--> OSD ID 841 FSID 6657df87-86e5-4d2c-962b-fafd5a1f9d30 process is active. Skipping activation
--> OSD ID 921 FSID 9627acbd-c73d-405b-8485-c25dbaa91c99 process is active. Skipping activation
--> OSD ID 1086 FSID 0151a4b8-de7b-401c-af25-5fb167875363 process is active. Skipping activation
--> OSD ID 539 FSID 5b94f35d-d09d-4a02-909c-6857684d1e41 process is active. Skipping activation
--> OSD ID 240 FSID ba3c9f42-45d4-4338-bd2b-b56dd8ea2ddb process is active. Skipping activation
--> Activating OSD ID 1255 FSID 4f054fea-ea30-4966-a886-801cb14b62f9
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1255
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-e7ca3aa6-4794-459c-9dcd-b8973f53b185/osd-block-4f054fea-ea30-4966-a886-801cb14b62f9 --path /var/lib/ceph/osd/ceph-1255 --no-mon-config
Running command: /usr/bin/ln -snf /dev/ceph-e7ca3aa6-4794-459c-9dcd-b8973f53b185/osd-block-4f054fea-ea30-4966-a886-801cb14b62f9 /var/lib/ceph/osd/ceph-1255/block
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-1255/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-16
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1255
Running command: /usr/bin/systemctl enable ceph-volume@lvm-1255-4f054fea-ea30-4966-a886-801cb14b62f9
Running command: /usr/bin/systemctl enable --runtime ceph-osd@1255
Running command: /usr/bin/systemctl start ceph-osd@1255
--> ceph-volume lvm activate successful for osd ID: 1255
--> OSD ID 314 FSID 7d72f481-bda9-4c5f-b9a8-36220ac8a461 process is active. Skipping activation

ls /var/lib/ceph/osd/ceph-1255
block ceph_fsid fsid keyring ready require_osd_release type whoami
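The manual workaround above (recreate the folder, then activate again) can be wrapped in a small helper. This is only a sketch of that mkdir step: ensure_osd_dir and OSD_BASE are hypothetical names, and the ceph-<id> folder naming is taken from the paths shown in this report (cluster name "ceph"). It does not address whatever removes the folder in the first place.

```shell
#!/bin/sh
# Hypothetical helper mirroring the manual mkdir workaround; not part of ceph-volume.
# OSD_BASE is overridable so the function can be tried outside /var/lib/ceph.
OSD_BASE="${OSD_BASE:-/var/lib/ceph/osd}"

ensure_osd_dir() {
    # $1: numeric OSD ID, e.g. 1255
    osd_id="$1"
    case "$osd_id" in
        ''|*[!0-9]*)
            echo "ensure_osd_dir: invalid OSD ID: '$osd_id'" >&2
            return 1
            ;;
    esac
    osd_dir="$OSD_BASE/ceph-$osd_id"
    if [ ! -d "$osd_dir" ]; then
        echo "recreating missing $osd_dir" >&2
        mkdir -p "$osd_dir" || return 1
    fi
    # Print the resulting path so callers can reuse it.
    printf '%s\n' "$osd_dir"
}

# Usage, as root on the affected node (not executed here):
#   ensure_osd_dir 1255 && ceph-volume lvm activate --all
```

The ID check is there so that a typo cannot create an unrelated folder under the OSD base path.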
ceph-volume create, which bundles the two-step process, suffers from the same issue.
Updated by Janek Bevendorff over 1 year ago
Looks like the problem is gone after a full reboot. No idea what was going on, but it was reproducible on all nodes.