Feature #24099
openosd: Improve workflow when creating OSD on raw block device if there was bluestore data on it before
Description
On Ceph Luminous, when creating a new bluestore OSD on a block device with
ceph-osd -i 0 --mkfs --osd-uuid aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa --osd-objectstore bluestore
after preparing the data directory like this:
# ls -lah /var/lib/ceph/osd/ceph-0
total 12K
drwxr-xr-x 2 ceph nogroup 4.0K May 11 00:36 .
drwxr-xr-x 3 ceph ceph    4.0K May 11 00:36 ..
lrwxrwxrwx 1 ceph nogroup   10 May 11 00:36 block -> /dev/md125
lrwxrwxrwx 1 ceph nogroup   10 May 11 00:36 block.db -> /dev/md127
-rw------- 1 ceph nogroup   56 May 11 00:38 keyring
and if there's already some `bluestore block device ...` data on `/dev/md125` from some past use of that device in Ceph (check e.g. with `less -f /dev/md125`), then `ceph-osd` will fail with:
2018-05-11 14:32:10.369838 7f9a198aae80 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_fsid (2) No such file or directory
2018-05-11 14:32:10.369844 7f9a198aae80 -1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs fsck found fatal error: (2) No such file or directory
2018-05-11 14:32:10.369845 7f9a198aae80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (2) No such file or directory
2018-05-11 14:32:10.369879 7f9a198aae80 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (2) No such file or directory
The issue here is that Ceph finds the old bluestore metadata on the device. With increased logging, this becomes visible:
2018-05-11 14:32:10.369809 7f9a198aae80 10 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label got bdev(osd_uuid aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa, size 0x91877d00000, btime 2018-05-09 00:14:56.959765, desc main, 7 meta)
2018-05-11 14:32:10.369830 7f9a198aae80 1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs already created
2018-05-11 14:32:10.369830 7f9a198aae80 1 bluestore(/var/lib/ceph/osd/ceph-0) _fsck repair (shallow) start
2018-05-11 14:32:10.369838 7f9a198aae80 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_fsid (2) No such file or directory
2018-05-11 14:32:10.369844 7f9a198aae80 -1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs fsck found fatal error: (2) No such file or directory
2018-05-11 14:32:10.369845 7f9a198aae80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (2) No such file or directory
2018-05-11 14:32:10.369879 7f9a198aae80 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (2) No such file or directory
Because ceph sees this metadata, it thinks that the device is already an OSD, and will start `fsck` (`BlueStore::mkfs()` will `read_meta("mkfs_done", &done)` and call `fsck()`).
But the fsck() will fail because the contents of the data directory are incomplete (as shown above).
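To make the failure path easier to follow, here is a toy model of the control flow described above. It is only a sketch, not Ceph's actual code: the function name `mkfs_model`, the dict used for the label metadata, and the assumption that the check fails because an `fsid` file is missing from the data directory (suggested by the `_open_fsid` error) are all mine.

```python
import errno


def mkfs_model(label_meta, data_dir_files):
    """Toy model of the mkfs path described in this report (hypothetical)."""
    # A leftover label with "mkfs_done" set makes the store look
    # as if it had already been created.
    if label_meta.get("mkfs_done"):
        # The store is then checked instead of created. Assumption: the
        # check needs an "fsid" file, which a freshly prepared data
        # directory (block, block.db, keyring) does not contain.
        if "fsid" not in data_dir_files:
            return -errno.ENOENT
        return 0
    # No previous label: a new store would be created here.
    return 0
```

With the data directory from this report, `mkfs_model({"mkfs_done": "yes"}, {"block", "block.db", "keyring"})` returns `-errno.ENOENT`, which corresponds to the "(2) No such file or directory" in the log.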
A work-around for this is `ceph-volume lvm zap /var/lib/ceph/osd/ceph-0/block`.
However, the current error message does not make the cause obvious.
It would be great if Ceph could:
- Point out that it found existing data on the OSD, and possibly suggest using `ceph-volume lvm zap` if that's what the user desires (this is not obvious at the moment, because that command has "lvm" in its name and no LVM is in use in this case)
- Figure out that if all of the files in the data directory are missing, it's probably not sensible to start an fsck
- Provide a flag to ceph-osd to force creating a new bluestore, no matter what's already on the disk, to make it easier to script custom ceph deployments
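Until something like this exists, a deployment script could run a pre-flight check of its own before calling `ceph-osd --mkfs`. The following is a minimal sketch, assuming that the leftover label sits at the very start of the device and begins with the text `bluestore block device`, as seen with `less -f` above. The helper name `has_bluestore_label` is hypothetical and not part of any Ceph tool.

```python
# Assumption: the text observed at the start of a previously used device.
BLUESTORE_MAGIC = b"bluestore block device"


def has_bluestore_label(device_path, magic=BLUESTORE_MAGIC):
    """Return True if the device (or file) starts with the label text."""
    # Reading a raw block device usually requires root privileges.
    with open(device_path, "rb") as dev:
        return dev.read(len(magic)) == magic
```

A script could then call `has_bluestore_label("/dev/md125")` and, if it returns `True`, either stop with a clear message or zap the device first, depending on what the operator wants.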
Thanks!