Feature #24099

open

osd: Improve workflow when creating OSD on raw block device if there was bluestore data on it before

Added by Niklas Hambuechen almost 6 years ago. Updated 11 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Component(RADOS):
OSD
Pull request ID:

Description

On Ceph Luminous, when creating a new bluestore OSD on a block device with

ceph-osd -i 0 --mkfs --osd-uuid aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa --osd-objectstore bluestore

after preparing the data directory like this:

# ls -lah /var/lib/ceph/osd/ceph-0
total 12K
drwxr-xr-x 2 ceph nogroup 4.0K May 11 00:36 .
drwxr-xr-x 3 ceph ceph    4.0K May 11 00:36 ..
lrwxrwxrwx 1 ceph nogroup   10 May 11 00:36 block -> /dev/md125
lrwxrwxrwx 1 ceph nogroup   10 May 11 00:36 block.db -> /dev/md127
-rw------- 1 ceph nogroup   56 May 11 00:38 keyring

and if there's already some `bluestore block device ...` data on `/dev/md125` from some past use of that device in Ceph (check e.g. with `less -f /dev/md125`), then `ceph-osd` will fail with:

2018-05-11 14:32:10.369838 7f9a198aae80 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_fsid (2) No such file or directory
2018-05-11 14:32:10.369844 7f9a198aae80 -1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs fsck found fatal error: (2) No such file or directory
2018-05-11 14:32:10.369845 7f9a198aae80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (2) No such file or directory
2018-05-11 14:32:10.369879 7f9a198aae80 -1  ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (2) No such file or directory

The issue here is that ceph sees the old metadata that is still on the device. Increased logging shows this:

2018-05-11 14:32:10.369809 7f9a198aae80 10 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label got bdev(osd_uuid aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa, size 0x91877d00000, btime 2018-05-09 00:14:56.959765, desc main, 7 meta)
2018-05-11 14:32:10.369830 7f9a198aae80  1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs already created
2018-05-11 14:32:10.369830 7f9a198aae80  1 bluestore(/var/lib/ceph/osd/ceph-0) _fsck repair (shallow) start
2018-05-11 14:32:10.369838 7f9a198aae80 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_fsid (2) No such file or directory
2018-05-11 14:32:10.369844 7f9a198aae80 -1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs fsck found fatal error: (2) No such file or directory
2018-05-11 14:32:10.369845 7f9a198aae80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (2) No such file or directory
2018-05-11 14:32:10.369879 7f9a198aae80 -1  ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (2) No such file or directory

Because ceph sees this metadata, it thinks that the device is already an OSD, and will start `fsck` (`BlueStore::mkfs()` will `read_meta("mkfs_done", &done)` and call `fsck()`).

But the fsck() will fail because the contents of the data directory are incomplete (as shown above).
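The control flow described above can be illustrated with a small toy model. This is only a sketch of the reported behaviour, not Ceph's actual code: `mkfs_model` and `fsck_model` are hypothetical names, the label metadata is modelled as a plain dict, and the assumption that the first missing file is `fsid` is inferred from the `_open_fsid` log line.

```python
import errno
import os


def fsck_model(osd_dir):
    """Toy stand-in for fsck: fail if the data directory is incomplete."""
    # Assumption: the first file fsck needs is `fsid` (inferred from the
    # `_open_fsid` line in the log); the real check may involve more files.
    if not os.path.exists(os.path.join(osd_dir, "fsid")):
        return -errno.ENOENT
    return 0


def mkfs_model(label_meta, osd_dir):
    """Toy stand-in for mkfs; returns 0 or a negative errno."""
    if "mkfs_done" in label_meta:
        # Matches the "mkfs already created" log line: the stale label
        # makes the device look like a finished OSD, so fsck runs instead
        # of a fresh mkfs.
        return fsck_model(osd_dir)
    # Fresh device: a real mkfs would create the object store here.
    return 0
```

With a stale label and a data directory that only contains `block`, `block.db` and `keyring`, the model returns `-ENOENT`, which corresponds to the `(2) No such file or directory` in the log.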

A work-around for this is `ceph-volume lvm zap /var/lib/ceph/osd/ceph-0/block`.

But the current error message does not make it obvious that leftover data on the device is the cause, or that zapping the device is the fix.

It would be great if Ceph could:

  • Point out that it found existing data on the OSD, and possibly suggest using `ceph-volume lvm zap` if that's what the user desires (that command is not an obvious choice here because it has "lvm" in the name and no LVM is in use in this case)
  • Figure out that if all of the files in the data directory are missing, it's probably not sensible to start an fsck
  • Provide a flag to ceph-osd to force creating a new bluestore, no matter what's already on the disk, to make it easier to script custom ceph deployments
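Until something like this exists in `ceph-osd`, a deployment script could run a pre-flight check along the lines of the first two points. The following is a minimal sketch, not part of Ceph: the function names are hypothetical, it assumes the leftover label starts with the text `bluestore block device` (the same text visible with `less -f`), and it assumes a complete data directory would contain an `fsid` file.

```python
import os

# Text expected at the very start of a device that carries a BlueStore label.
BLUESTORE_MAGIC = b"bluestore block device"


def has_bluestore_label(block_path):
    """Return True if the device (or file) starts with the BlueStore signature."""
    with open(block_path, "rb") as dev:
        return dev.read(len(BLUESTORE_MAGIC)) == BLUESTORE_MAGIC


def preflight(osd_dir):
    """Return a warning if mkfs would likely hit the stale-label case, else None."""
    block = os.path.join(osd_dir, "block")
    if not has_bluestore_label(block):
        return None  # no old label, mkfs should start from scratch
    # Assumption: a data directory that belongs to the label has an `fsid` file.
    if os.path.exists(os.path.join(osd_dir, "fsid")):
        return None  # looks like an existing, complete OSD
    return (
        "found an existing BlueStore label on %s but no fsid file in %s; "
        "if the old data is not needed, wipe the device first, "
        "e.g. with: ceph-volume lvm zap %s"
        % (os.path.realpath(block), osd_dir, block)
    )
```

A script would call `preflight("/var/lib/ceph/osd/ceph-0")` before `ceph-osd --mkfs` and abort with the returned message if it is not `None`. Reading a raw block device normally requires root.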

Thanks!
