Bug #24762
closed
"ceph-volume scan" does not detect cluster name different than 'ceph'
Description
Running "ceph-volume scan" on a data partition of an OSD configured with ceph-disk, where the cluster name is 'test', reports 'ceph' as the cluster name.
Note that both approaches, pointing at the mountpoint and pointing at the data partition, produce the same incorrect result:
With the mountpoint:
[root@ceph-osd0 ~]# ceph-volume simple scan /var/lib/ceph/osd/test-1/ --force
 stderr: lsblk: /var/lib/ceph/osd/test-1: not a block device
 stderr: lsblk: /var/lib/ceph/osd/test-1: not a block device
Running command: /usr/sbin/cryptsetup status /dev/sda1
 stderr: Device sda1 not found
--> OSD 1 got scanned and metadata persisted to file: /etc/ceph/osd/1-a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3.json
--> To take over managment of this scanned OSD, and disable ceph-disk and udev, run:
-->     ceph-volume simple activate 1 a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3
[root@ceph-osd0 ~]# cat /etc/ceph/osd/1-a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3.json
{
    "active": "ok",
    "ceph_fsid": "b4c56e01-ab71-4b24-93d0-adafd139847f",
    "cluster_name": "ceph",
    "data": {
        "path": "/dev/sda1",
        "uuid": "a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3"
    },
    "fsid": "a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3",
    "journal": {
        "path": "/dev/disk/by-partuuid/e9764491-22b6-4c16-8304-c9320c887d03",
        "uuid": "e9764491-22b6-4c16-8304-c9320c887d03"
    },
    "journal_uuid": "e9764491-22b6-4c16-8304-c9320c887d03",
    "keyring": "AQCksTNbKOjxLxAA/EKIN0Zc7Qt5Iux8TeNHlA==",
    "magic": "ceph osd volume v026",
    "ready": "ready",
    "systemd": "",
    "type": "filestore",
    "whoami": 1
}
With the device:
[root@ceph-osd0 ~]# ceph-volume simple scan /dev/sda1
Running command: /usr/sbin/cryptsetup status /dev/sda1
 stderr: Device sda1 not found
--> OSD 1 got scanned and metadata persisted to file: /etc/ceph/osd/1-a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3.json
--> To take over managment of this scanned OSD, and disable ceph-disk and udev, run:
-->     ceph-volume simple activate 1 a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3
[root@ceph-osd0 ~]# cat /etc/ceph/osd/1-a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3.json
{
    "active": "ok",
    "ceph_fsid": "b4c56e01-ab71-4b24-93d0-adafd139847f",
    "cluster_name": "ceph",
    "data": {
        "path": "/dev/sda1",
        "uuid": "a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3"
    },
    "fsid": "a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3",
    "journal": {
        "path": "/dev/disk/by-partuuid/e9764491-22b6-4c16-8304-c9320c887d03",
        "uuid": "e9764491-22b6-4c16-8304-c9320c887d03"
    },
    "journal_uuid": "e9764491-22b6-4c16-8304-c9320c887d03",
    "keyring": "AQCksTNbKOjxLxAA/EKIN0Zc7Qt5Iux8TeNHlA==",
    "magic": "ceph osd volume v026",
    "ready": "ready",
    "systemd": "",
    "type": "filestore",
    "whoami": 1
}
Updated by Sébastien Han almost 6 years ago
- Project changed from Ceph to ceph-volume
Updated by Alfredo Deza almost 6 years ago
- Status changed from New to 4
'scan' works by inspecting the data device and the contents of the OSD. I don't think there is anything in the OSD that tells you the name of the cluster, so 'scan' will default to 'ceph'.
However, a user who wants a different cluster name can pass it explicitly when scanning:
ceph-volume --cluster=test simple scan /dev/sda1
This is how we ensure custom names are supported in the ceph-ansible tests, which use 'test' as the cluster name.
An alternative way is for a user to manually add the following in the JSON:
{ ... "cluster_name": "test" ... }
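A minimal sketch of that manual edit, done programmatically (the file path and the reduced JSON contents here are illustrative; the real scan files live under /etc/ceph/osd/):

```python
import json
import tempfile

# Stand-in for a scan file such as /etc/ceph/osd/1-<uuid>.json
# (only a subset of the real keys is shown here).
scan = {"cluster_name": "ceph", "whoami": 1}

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(scan, f)
    path = f.name

# Load the persisted metadata, override the cluster name, write it back.
with open(path) as f:
    data = json.load(f)

data["cluster_name"] = "test"

with open(path, "w") as f:
    json.dump(data, f, indent=4)
```

The same edit can of course be made by hand in any text editor; 'simple activate' reads the JSON again, so the corrected name takes effect on the next activation.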
If you confirm that this works using `--cluster`, then I am going to close this, since that is the expected behavior.
Updated by Sébastien Han almost 6 years ago
My expectation for the "scan" subcommand is that it looks for everything that is relevant. I shouldn't need to pass '--cluster' when this can be discovered. A best-effort approach would be to look up any .conf file in /etc/ceph that matches the fsid present in the OSD data directory when a device is given. Alternatively, when passing a directory like /var/lib/ceph/osd/toto-0, it seems pretty obvious that the cluster name is 'toto'.
What do you think?
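The mountpoint half of this proposal could be sketched roughly as follows. This is a hypothetical helper, not ceph-volume code, and it assumes the ceph-disk mountpoint convention /var/lib/ceph/osd/<cluster>-<id>:

```python
import os

def cluster_name_from_mountpoint(path):
    """Best-effort guess of the cluster name from a ceph-disk style
    mountpoint such as /var/lib/ceph/osd/toto-0 (hypothetical helper)."""
    base = os.path.basename(os.path.normpath(path))
    # Split "<cluster>-<id>" on the last dash; only trust the result
    # when the trailing part really looks like an OSD id.
    cluster, sep, osd_id = base.rpartition("-")
    if sep and cluster and osd_id.isdigit():
        return cluster
    return "ceph"  # fall back to the default cluster name
```

For example, a mountpoint of /var/lib/ceph/osd/toto-0 would yield 'toto', while a path that does not follow the convention would fall back to 'ceph'.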
Updated by Alfredo Deza almost 6 years ago
We initially thought about poking into `/etc/ceph/`, but the problem is that we can't be sure what we would find: there could be, say, two *.conf files in there, or an incorrect one after re-installing the OS, or the drives may have been moved to a new server where the cluster name is different.
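The ambiguity can be made concrete with a small sketch (not ceph-volume code; a temporary directory stands in for /etc/ceph):

```python
import os
import tempfile

# Simulate an /etc/ceph with two conf files, e.g. after a reinstall
# or after moving drives between clusters.
confdir = tempfile.mkdtemp()
for name in ("ceph.conf", "test.conf"):
    open(os.path.join(confdir, name), "w").close()

# Derive candidate cluster names from the *.conf files present.
candidates = sorted(f[:-len(".conf")] for f in os.listdir(confdir)
                    if f.endswith(".conf"))
# Two candidates remain, so there is no single unambiguous cluster name.
```

With more than one candidate, any choice the tool makes is a guess, which is exactly the failure mode being avoided.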
The idea was to report accurately on the OSD only, and let the external configuration be corrected/matched to make it run.
On the path side of things, you are right: we could infer the cluster name from the name of the mounted directory. But we also allow scanning an unmounted device (in which case we mount it temporarily), so that detection wouldn't work there, nor when the OSD is mounted in a non-standard location.
I would say that most (all?) of the Ceph tooling requires you to pass --cluster and does not do any sort of heuristics to determine what it may be (they all fall back to 'ceph').
My concern is that there is too much room for error on our side: getting it right depends on configuration outside of the OSD, and a 'best effort' can cause problems if users rely on it and ceph-volume can't deliver. We knew things might be missing, which is why the JSON can be edited, but then again, in this case we allow --cluster.
Updated by Alfredo Deza over 5 years ago
- Status changed from 4 to Rejected
I am going to close this since we can't reliably do a "best effort". As explained, we can't guarantee what the cluster name is. When a user signs on to the idea of having a custom cluster name, it is absolutely required to pass --cluster everywhere in the Ceph tooling, otherwise things will not work.
No other tool in Ceph does a "best effort" to figure out the cluster name; it is either passed in or defaults to 'ceph'.