Bug #55382
open
OSDs created from only one node in a cluster
Added by Srinivasa Bharath Kanta about 2 years ago.
Updated about 2 years ago.
Description
OSDs are configured from only one node in the cluster, even though all nodes have the same configuration.
Node list:
[root@bruuni006 examples]# kubectl get node
NAME STATUS ROLES AGE VERSION
bruuni006 Ready control-plane,master 4h37m v1.23.5
bruuni007 Ready <none> 4h36m v1.23.5
bruuni008 Ready <none> 4h35m v1.23.5
bruuni010 Ready <none> 4h35m v1.23.5
[root@bruuni006 examples]#
*Node configurations:*
[root@bruuni006 examples]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 447.1G 0 disk
└─sda1 8:1 0 447.1G 0 part /
sdb 8:16 0 894.3G 0 disk
sdc 8:32 0 894.3G 0 disk
sdd 8:48 0 894.3G 0 disk
sde 8:64 0 1.8T 0 disk
sdf 8:80 0 1.8T 0 disk
nvme0n1 259:0 0 1.5T 0 disk
[root@bruuni006 examples]#
[root@bruuni007 ubuntu]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 447.1G 0 disk
└─sda1 8:1 0 447.1G 0 part /
sdb 8:16 0 894.3G 0 disk
sdc 8:32 0 894.3G 0 disk
sdd 8:48 0 894.3G 0 disk
sde 8:64 0 1.8T 0 disk
sdf 8:80 0 1.8T 0 disk
nvme0n1 259:0 0 1.5T 0 disk
[root@bruuni007 ubuntu]#
[root@bruuni010 ubuntu]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 447.1G 0 disk
└─sda1 8:1 0 447.1G 0 part /
sdb 8:16 0 894.3G 0 disk
sdc 8:32 0 894.3G 0 disk
sdd 8:48 0 894.3G 0 disk
sde 8:64 0 1.8T 0 disk
sdf 8:80 0 1.8T 0 disk
nvme0n1 259:0 0 1.5T 0 disk
[root@bruuni010 ubuntu]#
*Ceph details:*
[rook@rook-ceph-tools-d6d7c985c-ksjcs /]$ ceph -v
ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
[rook@rook-ceph-tools-d6d7c985c-ksjcs /]$
[rook@rook-ceph-tools-d6d7c985c-ksjcs /]$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 6.84087 root default
-2 6.84087 host bruuni010
0 1.81940 osd.0 up 1.00000 1.00000
1 1.81940 osd.1 up 1.00000 1.00000
2 1.45549 osd.2 up 1.00000 1.00000
3 0.87329 osd.3 up 1.00000 1.00000
4 0.87329 osd.4 up 1.00000 1.00000
[rook@rook-ceph-tools-d6d7c985c-ksjcs /]$
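The tree above shows all five OSDs landed under a single host, bruuni010. A quick way to summarize OSD placement from the same output (here fed in as text from this report; in the cluster you would pipe `ceph osd tree` into the function):

```shell
# Count OSDs under each host in `ceph osd tree` output.
count_osds_per_host() {
  awk '/host/ {host=$NF} /osd\./ {n[host]++} END {for (h in n) print h, n[h]}'
}

# Demo on the tree from this report:
count_osds_per_host <<'EOF'
-1 6.84087 root default
-2 6.84087 host bruuni010
0 1.81940 osd.0 up 1.00000 1.00000
1 1.81940 osd.1 up 1.00000 1.00000
2 1.45549 osd.2 up 1.00000 1.00000
3 0.87329 osd.3 up 1.00000 1.00000
4 0.87329 osd.4 up 1.00000 1.00000
EOF
# prints: bruuni010 5
```

With four storage-capable nodes and `useAllNodes: true`, the expectation would be one host entry per node here, not a single host holding everything.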
cluster.yaml parameters:
storage:
  useAllNodes: true
  useAllDevices: true
  #deviceFilter:
healthCheck:
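For comparison, if the intent were to restrict which disks Rook consumes rather than taking every clean device, the same storage section accepts a deviceFilter regex. A sketch (the pattern is an assumption matching the sdb–sdf and nvme0n1 names seen on these hosts):

```yaml
storage:
  useAllNodes: true
  useAllDevices: false
  # Regex against device names; would match sdb..sdf and nvme0n1 above.
  deviceFilter: "^sd[b-f]$|^nvme0n1$"
```

With `useAllDevices: true` as configured here, the filter is irrelevant; device availability (see the prepare-pod logs below) decides what becomes an OSD.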
- Assignee set to Radoslaw Zarzynski
> All nodes are having the same configurations.
I think this is a broken assumption as there are differences in the internal state between the nodes.
[root@bruuni006 examples]# kubectl -n rook-ceph logs rook-ceph-osd-prepare-bruuni008-pnw8v provision
...
2022-04-19 10:58:00.082604 I | cephosd: skipping device "sda1" because it contains a filesystem "ext4"
2022-04-19 10:58:00.082608 I | cephosd: skipping device "sdb" because it contains a filesystem "LVM2_member"
2022-04-19 10:58:00.082611 I | cephosd: skipping device "sdc" because it contains a filesystem "LVM2_member"
2022-04-19 10:58:00.082614 I | cephosd: skipping device "sdd" because it contains a filesystem "LVM2_member"
2022-04-19 10:58:00.082617 I | cephosd: skipping device "sde" because it contains a filesystem "LVM2_member"
2022-04-19 10:58:00.082621 I | cephosd: skipping device "sdf" because it contains a filesystem "LVM2_member"
2022-04-19 10:58:00.082624 I | cephosd: skipping device "nvme0n1" because it contains a filesystem "LVM2_member"
2022-04-19 10:58:00.086779 I | cephosd: configuring osd devices: {"Entries":{}}
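The skip reasons can be pulled out of a prepare-pod log with a small filter. A sketch (the pod name in the comment is the one from this report; substitute your own prepare pod):

```shell
# Summarize why the OSD prepare job skipped each device. In the cluster:
#   kubectl -n rook-ceph logs rook-ceph-osd-prepare-bruuni008-pnw8v provision | skip_summary
skip_summary() {
  grep -o 'skipping device.*'
}

# Demo on one line from the log above:
printf '%s\n' '2022-04-19 10:58:00.082608 I | cephosd: skipping device "sdb" because it contains a filesystem "LVM2_member"' \
  | skip_summary
```

A run of `LVM2_member` reasons like the one above means ceph-volume's LVM metadata is still on the disks, so Rook refuses to reuse them.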
The devices on bruuni006 seem not to have been wiped, while on bruuni010 everything is fine:
2022-04-19 10:58:11.343001 I | cephosd: device "sdb" is available.
...
2022-04-19 10:58:31.399093 I | cephosd: device "sdd" is available.
...
2022-04-19 10:58:39.524440 I | cephosd: device "sde" is available.
...
2022-04-19 10:58:47.575665 I | cephosd: device "sdf" is available.
...
2022-04-19 10:58:55.734890 I | cephosd: device "nvme0n1" is available.
- Description updated (diff)
- Status changed from New to Need More Info
According to the discussion we had in the morning:
1. The logs suggest the root cause is the presence of LVM volumes on bruuni006. This can be confirmed with the pvs / lvm commands.
2. If that's the case, those volumes could be a result of issues with zapping. While working on Rook / cephadm support I encountered such issues personally. Also, the Rook manual mentions this:
Ceph can leave LVM and device mapper data that can lock the disks, preventing the disks from being used again.
(...)
If disks are still reported locked, rebooting the node often helps clear LVM-related holds on disks.
To exclude such issues, a log from zapping (collected by e.g. the script CLI utility) is necessary.
According to the Rook-Crimson Monthly, knowing the Rook version would also be useful, since recent Rook releases no longer use LVM by default.
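As a sketch of the zap step being discussed: the commands below follow the Rook cleanup documentation for disks that previously hosted Ceph. The helper only prints the plan so it can be reviewed (and logged) before anything destructive runs; device name and helper name are illustrative:

```shell
# Print (not execute) the wipe commands for one disk, per the Rook teardown
# docs. Review the plan, then run the commands as root on the node.
zap_plan() {
  disk="$1"
  echo "sgdisk --zap-all $disk"
  echo "wipefs --all $disk"
  echo "dd if=/dev/zero of=$disk bs=1M count=100 oflag=direct,dsync"
  echo "partprobe $disk"
}

zap_plan /dev/sdb
```

If device-mapper entries still hold the disks after this, the Rook docs additionally suggest removing the leftover /dev/mapper/ceph-* entries with dmsetup, or rebooting the node as quoted above.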