Feature #24099

osd: Improve workflow when creating OSD on raw block device if there was bluestore data on it before

Added by Niklas Hambuechen about 3 years ago. Updated about 1 year ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Component(RADOS):
OSD
Pull request ID:

Description

On Ceph Luminous, when creating a new bluestore OSD on a block device

ceph-osd -i 0 --mkfs --osd-uuid aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa --osd-objectstore bluestore

after preparing the data directory like this:

# ls -lah /var/lib/ceph/osd/ceph-0
total 12K
drwxr-xr-x 2 ceph nogroup 4.0K May 11 00:36 .
drwxr-xr-x 3 ceph ceph    4.0K May 11 00:36 ..
lrwxrwxrwx 1 ceph nogroup   10 May 11 00:36 block -> /dev/md125
lrwxrwxrwx 1 ceph nogroup   10 May 11 00:36 block.db -> /dev/md127
-rw------- 1 ceph nogroup   56 May 11 00:38 keyring

and if `/dev/md125` still holds some `bluestore block device ...` data from a previous use of that device in Ceph (check, e.g., with `less -f /dev/md125`), then `ceph-osd` will fail with:

2018-05-11 14:32:10.369838 7f9a198aae80 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_fsid (2) No such file or directory
2018-05-11 14:32:10.369844 7f9a198aae80 -1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs fsck found fatal error: (2) No such file or directory
2018-05-11 14:32:10.369845 7f9a198aae80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (2) No such file or directory
2018-05-11 14:32:10.369879 7f9a198aae80 -1  ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (2) No such file or directory

The issue is that Ceph finds the old metadata on the device; increased logging shows this:

2018-05-11 14:32:10.369809 7f9a198aae80 10 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label got bdev(osd_uuid aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa, size 0x91877d00000, btime 2018-05-09 00:14:56.959765, desc main, 7 meta)
2018-05-11 14:32:10.369830 7f9a198aae80  1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs already created
2018-05-11 14:32:10.369830 7f9a198aae80  1 bluestore(/var/lib/ceph/osd/ceph-0) _fsck repair (shallow) start
2018-05-11 14:32:10.369838 7f9a198aae80 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_fsid (2) No such file or directory
2018-05-11 14:32:10.369844 7f9a198aae80 -1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs fsck found fatal error: (2) No such file or directory
2018-05-11 14:32:10.369845 7f9a198aae80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (2) No such file or directory
2018-05-11 14:32:10.369879 7f9a198aae80 -1  ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (2) No such file or directory
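The `_read_bdev_label` line shows Ceph reading a label from the start of the device. To check for such leftovers without paging through the raw device in `less`, a small script can look for that label directly. This is a minimal sketch that assumes the label begins at byte 0 with the text `bluestore block device` plus a newline, immediately followed by the 36-character OSD UUID (consistent with the `osd_uuid` in the log, but not taken from Ceph's source); `read_bluestore_uuid` is a hypothetical helper name.

```python
import sys

# Assumption: a BlueStore device label starts at byte 0 with this ASCII
# string, followed by the textual OSD UUID and a newline.
LABEL_MAGIC = b"bluestore block device\n"
UUID_LEN = 36  # length of a canonical textual UUID


def read_bluestore_uuid(path):
    """Return the OSD UUID if `path` starts with a BlueStore label, else None."""
    with open(path, "rb") as f:
        head = f.read(len(LABEL_MAGIC) + UUID_LEN + 1)
    if not head.startswith(LABEL_MAGIC):
        return None
    raw = head[len(LABEL_MAGIC):len(LABEL_MAGIC) + UUID_LEN]
    return raw.decode("ascii", errors="replace")


if __name__ == "__main__":
    for dev in sys.argv[1:]:
        uuid = read_bluestore_uuid(dev)
        if uuid is None:
            print(f"{dev}: no BlueStore label found")
        else:
            print(f"{dev}: existing BlueStore label, osd_uuid {uuid}")
```

Run it against the device (or the `block` symlink in the data directory); it only needs read access.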

Because Ceph sees this metadata, it assumes the device is already an OSD and starts an `fsck` (`BlueStore::mkfs()` calls `read_meta("mkfs_done", &done)` and, finding it set, calls `fsck()`).

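But `fsck()` then fails because the data directory is incomplete (see the listing above): for example, it has no `fsid` file, which is what the `_open_fsid (2) No such file or directory` error refers to.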
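A quick way to see how incomplete a data directory is: list which expected metadata files are absent. In this minimal sketch only `fsid` is grounded in the log above; the other names in `EXPECTED_FILES` are an illustrative assumption, and `missing_osd_files` is a hypothetical helper.

```python
import os

# Only "fsid" is confirmed by the log (_open_fsid fails with ENOENT);
# the remaining names are an illustrative assumption.
EXPECTED_FILES = ("fsid", "type", "whoami", "ceph_fsid", "magic")


def missing_osd_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected OSD metadata files that are absent from data_dir."""
    return [name for name in expected
            if not os.path.exists(os.path.join(data_dir, name))]
```

For the directory listed above (only `block`, `block.db` and `keyring`), `missing_osd_files("/var/lib/ceph/osd/ceph-0")` would report every name in the list.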

A workaround for this is `ceph-volume lvm zap /var/lib/ceph/osd/ceph-0/block`.
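For a manual deployment, the same effect can be had by hand by overwriting the start of the device. A minimal sketch, assuming that zeroing the first 10 MiB is enough to remove the BlueStore label (the 10 MiB figure is an assumption, chosen generously because the label sits at the very start); `wipe_head` is a hypothetical helper. It only opens the device for writing, so it needs write permission on the device rather than root. It destroys data, so double-check the path first.

```python
import os

# Assumption: the BlueStore label lives at the start of the device, so
# zeroing the first few MiB removes it. This touches only the given path.
WIPE_BYTES = 10 * 1024 * 1024


def wipe_head(path, nbytes=WIPE_BYTES, chunk=1024 * 1024):
    """Overwrite the first nbytes of path with zeros. DESTROYS DATA."""
    zeros = bytes(chunk)
    written = 0
    with open(path, "r+b") as f:  # write in place, no truncation
        while written < nbytes:
            n = min(chunk, nbytes - written)
            f.write(zeros[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # make sure the zeros reach the device
    return written
```

The `block.db` device (`/dev/md127` here) may carry its own label and need the same treatment; that, too, is an assumption.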

But the current error message does not make this cause obvious.

It would be great if Ceph could:

  • Point out that it found existing data on the OSD, and possibly suggest using `ceph-volume lvm zap` if that's what the user wants (this is not obvious, because that command has "lvm" in its name and no LVM is in use in this case)
  • Recognize that if all of the files in the data directory are missing, starting an fsck is probably not sensible
  • Provide a flag to ceph-osd to force creating a new bluestore, no matter what's already on the disk, to make it easier to script custom ceph deployments
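The three requests above can be sketched as a single pre-mkfs decision. This is a hypothetical illustration, not Ceph code: `precheck_mkfs` and its `force` parameter are invented names, the label signature is the same assumption as earlier, and "all files missing" is simplified to "no `fsid` file", since that is the file the failing `fsck` needs according to the log.

```python
import os

LABEL_MAGIC = b"bluestore block device\n"  # assumed label signature


def has_bluestore_label(path):
    """True if path starts with the (assumed) BlueStore label signature."""
    with open(path, "rb") as f:
        return f.read(len(LABEL_MAGIC)) == LABEL_MAGIC


def precheck_mkfs(data_dir, force=False):
    """Hypothetical check run before mkfs.

    Returns "mkfs" to create a fresh store, "fsck" to check what looks like
    a complete existing OSD, or raises RuntimeError with an actionable hint.
    """
    block = os.path.join(data_dir, "block")
    if force or not has_bluestore_label(block):
        return "mkfs"  # request 3: forcing overwrites whatever is there
    if not os.path.exists(os.path.join(data_dir, "fsid")):
        # Requests 1 and 2: stale label but no OSD metadata, so explain
        # instead of starting an fsck that can only fail with ENOENT.
        raise RuntimeError(
            f"{block} already contains a BlueStore label, but {data_dir} "
            "has no fsid file; this looks like data from a previous OSD. "
            "Wipe the device (for example with `ceph-volume lvm zap`) "
            "before creating a new OSD on it.")
    return "fsck"
```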

Thanks!

History

#1 Updated by Niklas Hambuechen about 3 years ago

Another related issue I found is that zapping requires root, even when the user executing it already has write permissions on the block device and thus could do the zap without problems:

ceph-volume lvm zap /var/lib/ceph/osd/ceph-0/block
-->  SuperUserError: This command needs to be executed with sudo or as root

#2 Updated by Greg Farnum about 3 years ago

  • Project changed from Ceph to ceph-volume
  • Category deleted (ceph cli)

#3 Updated by Alfredo Deza about 3 years ago

  • Project changed from ceph-volume to Ceph

This is not a ceph-volume issue; the description doesn't point to a ceph-volume operation, but rather to a manual deployment of an OSD.

"zapping" or using anything with `ceph-volume lvm` might work only if logical volumes are used. Super user permissions are needed because LVM commands require it.

Moving this back to the Ceph backlog.

#4 Updated by John Spray about 3 years ago

Point out that it found existing data on the OSD, and possibly suggest using `ceph-volume lvm zap` if that's what the user desires (this isn't quite obvious already because that command has "lvm" in the name and no LVM is in use in this case)

This seems like an odd idea -- if someone is doing OSD creation by hand, why would they want to be pointed to a ceph-volume command for zapping? Surely they're in a position to zap the OSD by hand too?

Provide a flag to ceph-osd to force creating a new bluestore, no matter what's already on the disk, to make it easier to script custom ceph deployments

I'm not sure what the motivation is here: if you're using ceph-volume you already have a zap operation. If you're not using ceph-volume, you can easily wipe a disk by hand without needing ceph-osd to do it for you. I'm left unconvinced that we need overwrite/zap functionality in ceph-osd -- what am I missing?

#5 Updated by Patrick Donnelly about 2 years ago

  • Project changed from Ceph to RADOS
  • Subject changed from ceph-osd: Improve workflow when creating OSD on raw block device if there was bluestore data on it before to osd: Improve workflow when creating OSD on raw block device if there was bluestore data on it before
  • Start date deleted (05/11/2018)
  • Affected Versions deleted (v12.2.5)
  • Component(RADOS) OSD added

#6 Updated by Марк Коренберг over 1 year ago

Triggered the same! WTF?

#7 Updated by Niklas Hambuechen about 1 year ago

John Spray wrote:

This seems like an odd idea -- if someone is doing OSD creation by hand, why would they want to be pointed to a ceph-volume command for zapping? Surely they're in a position to zap the OSD by hand too?

Yes, with "and possibly suggest using `ceph-volume lvm zap` if that's what the user desires" I was not suggesting that the user needs to use that tool; I meant it only as a hint that the user needs to do the equivalent of what that tool does (and may optionally use that tool for it, if that's convenient to them).

I would certainly find that a more useful message than the current `No such file or directory`.
