Bug #55382
OSDs created from one node in a cluster
Description
OSDs are created on only one node in the cluster, even though all nodes have the same configuration.
Nodes List:
[root@bruuni006 examples]# kubectl get node
NAME        STATUS   ROLES                  AGE     VERSION
bruuni006   Ready    control-plane,master   4h37m   v1.23.5
bruuni007   Ready    <none>                 4h36m   v1.23.5
bruuni008   Ready    <none>                 4h35m   v1.23.5
bruuni010   Ready    <none>                 4h35m   v1.23.5
[root@bruuni006 examples]#

Node Configurations:

[root@bruuni006 examples]# lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda         8:0    0 447.1G  0 disk
└─sda1      8:1    0 447.1G  0 part /
sdb         8:16   0 894.3G  0 disk
sdc         8:32   0 894.3G  0 disk
sdd         8:48   0 894.3G  0 disk
sde         8:64   0   1.8T  0 disk
sdf         8:80   0   1.8T  0 disk
nvme0n1   259:0    0   1.5T  0 disk
[root@bruuni006 examples]#

[root@bruuni007 ubuntu]# lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda         8:0    0 447.1G  0 disk
└─sda1      8:1    0 447.1G  0 part /
sdb         8:16   0 894.3G  0 disk
sdc         8:32   0 894.3G  0 disk
sdd         8:48   0 894.3G  0 disk
sde         8:64   0   1.8T  0 disk
sdf         8:80   0   1.8T  0 disk
nvme0n1   259:0    0   1.5T  0 disk
[root@bruuni007 ubuntu]#

[root@bruuni010 ubuntu]# lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda         8:0    0 447.1G  0 disk
└─sda1      8:1    0 447.1G  0 part /
sdb         8:16   0 894.3G  0 disk
sdc         8:32   0 894.3G  0 disk
sdd         8:48   0 894.3G  0 disk
sde         8:64   0   1.8T  0 disk
sdf         8:80   0   1.8T  0 disk
nvme0n1   259:0    0   1.5T  0 disk
[root@bruuni010 ubuntu]#

CEPH Details:

[rook@rook-ceph-tools-d6d7c985c-ksjcs /]$ ceph -v
ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
[rook@rook-ceph-tools-d6d7c985c-ksjcs /]$
[rook@rook-ceph-tools-d6d7c985c-ksjcs /]$ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-1         6.84087  root default
-2         6.84087      host bruuni010
 0         1.81940          osd.0            up   1.00000  1.00000
 1         1.81940          osd.1            up   1.00000  1.00000
 2         1.45549          osd.2            up   1.00000  1.00000
 3         0.87329          osd.3            up   1.00000  1.00000
 4         0.87329          osd.4            up   1.00000  1.00000
[rook@rook-ceph-tools-d6d7c985c-ksjcs /]$
cluster.yaml parameters:
storage:
  useAllNodes: true
  useAllDevices: true
  #deviceFilter:
healthCheck:
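For reference only (this is not part of the reported configuration): should the device set ever need to be narrowed instead of using useAllDevices, Rook's deviceFilter takes a regular expression over device names. A hypothetical pattern:

storage:
  useAllNodes: true
  useAllDevices: false
  deviceFilter: "^sd[b-f]"  # hypothetical: match sdb..sdf only, skipping nvme0n1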
Updated by Radoslaw Zarzynski about 2 years ago
- Assignee set to Radoslaw Zarzynski
"All nodes have the same configuration."
I think this is a broken assumption, as there are differences in the internal state between the nodes.
[root@bruuni006 examples]# kubectl -n rook-ceph logs rook-ceph-osd-prepare-bruuni008-pnw8v provision
...
2022-04-19 10:58:00.082604 I | cephosd: skipping device "sda1" because it contains a filesystem "ext4"
2022-04-19 10:58:00.082608 I | cephosd: skipping device "sdb" because it contains a filesystem "LVM2_member"
2022-04-19 10:58:00.082611 I | cephosd: skipping device "sdc" because it contains a filesystem "LVM2_member"
2022-04-19 10:58:00.082614 I | cephosd: skipping device "sdd" because it contains a filesystem "LVM2_member"
2022-04-19 10:58:00.082617 I | cephosd: skipping device "sde" because it contains a filesystem "LVM2_member"
2022-04-19 10:58:00.082621 I | cephosd: skipping device "sdf" because it contains a filesystem "LVM2_member"
2022-04-19 10:58:00.082624 I | cephosd: skipping device "nvme0n1" because it contains a filesystem "LVM2_member"
2022-04-19 10:58:00.086779 I | cephosd: configuring osd devices: {"Entries":{}}
The devices on bruuni006 do not seem to have been wiped, while on bruuni010 everything is fine:
2022-04-19 10:58:11.343001 I | cephosd: device "sdb" is available.
...
2022-04-19 10:58:31.399093 I | cephosd: device "sdd" is available.
...
2022-04-19 10:58:39.524440 I | cephosd: device "sde" is available.
...
2022-04-19 10:58:47.575665 I | cephosd: device "sdf" is available.
...
2022-04-19 10:58:55.734890 I | cephosd: device "nvme0n1" is available.
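The signatures the prepare job reacts to can also be inspected directly on a node; a minimal check, assuming shell access (wipefs without options only lists signatures and erases nothing):

lsblk -f          # FSTYPE column shows "LVM2_member" for devices holding an LVM PV signature
wipefs /dev/sdb   # prints the exact signatures (offset, type) that ceph-volume detects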
Updated by Radoslaw Zarzynski about 2 years ago
- Status changed from New to Need More Info
According to the discussion we had in the morning:
1. The logs suggest the root cause is the presence of LVM volumes on bruuni006. This can be confirmed with the pvs / lvm commands (see the first sketch after this list).
2. If that's the case, those volumes could be the result of issues with zapping. While working on Rook / cephadm support I encountered them personally. Also, the Rook manual mentions this:
Ceph can leave LVM and device mapper data that can lock the disks, preventing the disks from being used again.
(...)
If disks are still reported locked, rebooting the node often helps clear LVM-related holds on disks.
To exclude such issues, a log from zapping (collected by e.g. the script CLI utility) is necessary (see the second sketch below).
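A minimal sketch for the check in point 1, assuming shell access on bruuni006 (the ceph-* names are what ceph-volume typically creates, mentioned here for illustration):

pvs   # physical volumes; PVs on sdb..sdf/nvme0n1 would match the "LVM2_member" signatures above
vgs   # volume groups; leftover ceph-* groups would indicate stale OSD data
lvs   # logical volumes inside those groups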
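And a sketch of how the zap log asked for in point 2 could be captured with the script utility. The device path and log name are placeholders, and the zap commands follow the general pattern from Rook's cleanup documentation rather than a prescribed procedure:

script zap-bruuni006.log    # start recording the terminal session to a file
sgdisk --zap-all /dev/sdb   # destroy GPT and MBR partition structures
wipefs -a /dev/sdb          # erase remaining filesystem/LVM signatures
dmsetup ls                  # look for leftover ceph-* device-mapper entries
exit                        # stop recording; the resulting log can be attached here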
Updated by Radoslaw Zarzynski about 2 years ago
According to the Rook-Crimson Monthly, knowing the Rook version would also be useful, as recent Rook releases don't use LVM by default anymore.
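One way to report it, assuming the operator runs under the default deployment name in the rook-ceph namespace (the image tag carries the Rook release):

kubectl -n rook-ceph get deploy rook-ceph-operator -o jsonpath='{.spec.template.spec.containers[0].image}'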