Bug #24762

closed

"ceph-volume scan" does not detect cluster name different than 'ceph'

Added by Sébastien Han almost 6 years ago. Updated over 5 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Running "ceph-volume scan" on a data partition of an OSD configured with ceph-disk where the cluster name is 'test' reports 'ceph' as a cluster name.
Note that both approaches pointing to the mountpoint AND the data partition provide the same error:

With the mountpoint:

[root@ceph-osd0 ~]# ceph-volume simple scan /var/lib/ceph/osd/test-1/ --force
 stderr: lsblk: /var/lib/ceph/osd/test-1: not a block device
 stderr: lsblk: /var/lib/ceph/osd/test-1: not a block device
Running command: /usr/sbin/cryptsetup status /dev/sda1
 stderr: Device sda1 not found
--> OSD 1 got scanned and metadata persisted to file: /etc/ceph/osd/1-a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3.json
--> To take over management of this scanned OSD, and disable ceph-disk and udev, run:
-->     ceph-volume simple activate 1 a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3
[root@ceph-osd0 ~]# cat /etc/ceph/osd/1-a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3.json
{
    "active": "ok",
    "ceph_fsid": "b4c56e01-ab71-4b24-93d0-adafd139847f",
    "cluster_name": "ceph",
    "data": {
        "path": "/dev/sda1",
        "uuid": "a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3" 
    },
    "fsid": "a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3",
    "journal": {
        "path": "/dev/disk/by-partuuid/e9764491-22b6-4c16-8304-c9320c887d03",
        "uuid": "e9764491-22b6-4c16-8304-c9320c887d03" 
    },
    "journal_uuid": "e9764491-22b6-4c16-8304-c9320c887d03",
    "keyring": "AQCksTNbKOjxLxAA/EKIN0Zc7Qt5Iux8TeNHlA==",
    "magic": "ceph osd volume v026",
    "ready": "ready",
    "systemd": "",
    "type": "filestore",
    "whoami": 1

With the device:

[root@ceph-osd0 ~]# ceph-volume simple scan /dev/sda1
Running command: /usr/sbin/cryptsetup status /dev/sda1
 stderr: Device sda1 not found
--> OSD 1 got scanned and metadata persisted to file: /etc/ceph/osd/1-a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3.json
--> To take over management of this scanned OSD, and disable ceph-disk and udev, run:
-->     ceph-volume simple activate 1 a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3
[root@ceph-osd0 ~]# cat /etc/ceph/osd/1-a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3.json
{
    "active": "ok",
    "ceph_fsid": "b4c56e01-ab71-4b24-93d0-adafd139847f",
    "cluster_name": "ceph",
    "data": {
        "path": "/dev/sda1",
        "uuid": "a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3" 
    },
    "fsid": "a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3",
    "journal": {
        "path": "/dev/disk/by-partuuid/e9764491-22b6-4c16-8304-c9320c887d03",
        "uuid": "e9764491-22b6-4c16-8304-c9320c887d03" 
    },
    "journal_uuid": "e9764491-22b6-4c16-8304-c9320c887d03",
    "keyring": "AQCksTNbKOjxLxAA/EKIN0Zc7Qt5Iux8TeNHlA==",
    "magic": "ceph osd volume v026",
    "ready": "ready",
    "systemd": "",
    "type": "filestore",
    "whoami": 1
Actions #1

Updated by Sébastien Han almost 6 years ago

  • Project changed from Ceph to ceph-volume
Actions #2

Updated by Alfredo Deza almost 6 years ago

  • Status changed from New to 4

The way 'scan' works is by inspecting the data device and the contents of the OSD. I don't think there is anything in the OSD that tells you the name of the cluster, so
'scan' will default to ceph.
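
For illustration only (this is not ceph-volume's code), a minimal Python sketch of what a scan can actually read from the data directory; the file names mirror the keys in the JSON above, and none of them records a cluster name:

# Hypothetical illustration, not ceph-volume code: dump the per-OSD files
# that ceph-disk writes into the data directory. None of them stores the
# cluster name, so a scan can only assume the default 'ceph'.
import os

osd_dir = "/var/lib/ceph/osd/test-1"  # mountpoint taken from the report above

for name in ("whoami", "fsid", "ceph_fsid", "magic", "journal_uuid", "type"):
    path = os.path.join(osd_dir, name)
    if os.path.exists(path):
        with open(path) as f:
            print(name, "=", f.read().strip())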

However, it is up to the user to ask for a different cluster name when scanning:

ceph-volume --cluster=test simple scan /dev/sda1

This is how we ensure support in the ceph-ansible tests, which use 'test' as the cluster name.

An alternative way is for a user to manually add the following in the JSON:

{
...
    "cluster_name": "test" 
...
}
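
For example, a small Python sketch of that manual edit; the path is the one produced by the scan in this report, and this is just an illustration, not a ceph-volume command:

# Patch the persisted scan metadata so activation uses the 'test' cluster.
# Purely illustrative; equivalent to editing the file by hand.
import json

path = "/etc/ceph/osd/1-a8cfc3cc-1294-4eaa-9e2c-1c3ec93062a3.json"

with open(path) as f:
    osd = json.load(f)

osd["cluster_name"] = "test"  # override the default 'ceph'

with open(path, "w") as f:
    json.dump(osd, f, indent=4, sort_keys=True)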

If you confirm that this works using `--cluster`, I am going to close this, since that is the expected behavior.

Actions #3

Updated by Sébastien Han almost 6 years ago

My expectation for the "scan" subcommand is that it looks for everything that is relevant. I shouldn't particularly need to pass '--cluster' when this can be discovered. A best-effort approach would be to look up any .conf file in /etc/ceph that matches the fsid present in the OSD data directory when a device is given. Alternatively, when passing a directory such as /var/lib/ceph/osd/toto-0, it seems pretty obvious that the cluster name is 'toto'.

What do you think?
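
To make the proposal concrete, a rough Python sketch of that best-effort lookup (assumptions: the cluster fsid lives in the [global] section of each /etc/ceph/*.conf, and the data directory is named <cluster>-<id>; nothing like this exists in ceph-volume today):

# Best-effort cluster name discovery as proposed above; illustrative only.
import configparser
import glob
import os
import re


def cluster_from_mountpoint(path):
    """/var/lib/ceph/osd/toto-0 -> 'toto', or None if the name doesn't match."""
    m = re.match(r"(.+)-\d+$", os.path.basename(os.path.normpath(path)))
    return m.group(1) if m else None


def cluster_from_conf(ceph_fsid):
    """Return the /etc/ceph/<cluster>.conf name whose fsid matches the OSD's ceph_fsid."""
    for conf in glob.glob("/etc/ceph/*.conf"):
        parser = configparser.ConfigParser(strict=False)
        parser.read(conf)
        if parser.get("global", "fsid", fallback=None) == ceph_fsid:
            return os.path.splitext(os.path.basename(conf))[0]
    return None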

Actions #4

Updated by Alfredo Deza almost 6 years ago

We initially thought about poking into `/etc/ceph/`, but the problem is that we can't be sure: there may be, say, two *.conf files in there, or the configuration may be incorrect after re-installing the OS or after moving the drives to a new server where the cluster name is different.

The idea was to report accurately on the OSD only, and let the external configuration be corrected/matched to make it run.

On the path side of things, you are right: we could infer the cluster name from the name of the mounted directory. But we also allow using a device that is unmounted (in which case we mount it temporarily), so that detection wouldn't work there, nor would it work if the OSD is mounted in a non-standard location.

I would say that most (all?) of the Ceph tooling requires you to pass --cluster and does not do any sort of heuristics to determine what it might be (they all fall back to 'ceph').

My concern is that there is too much room for us to get it wrong: it depends on configuration outside of the OSD, and the 'best effort' can cause problems if users rely on it and ceph-volume can't deliver. We knew we might miss things, which is why the JSON can be edited, but then again, in this case we allow --cluster.

Actions #5

Updated by Alfredo Deza over 5 years ago

  • Status changed from 4 to Rejected

I am going to close this since we can't reliably do a "best effort". As explained, we can't guarantee what the cluster name is. When a user signs on to the idea of having a custom cluster name, it is absolutely required to pass --cluster everywhere in the Ceph tooling; otherwise things will not work.

No other tool in Ceph does a "best effort" to try and figure out what the cluster name is; either it is passed in or it defaults to 'ceph'.
