Bug #37587

udev is not declared as a dependency, fails to deploy OSDs

Added by Kevin Lang over 5 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: build
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-deploy
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

I have a cluster with several servers and hard drives available and have tested different Ceph versions on different operating systems, mostly following the quick start guide on the Ceph main page.

On "Ubuntu 16.04" with ceph-version "mimic" I had no real problems to prepare the servers with a Ceph user account, deploying a Ceph Storage Cluster and to create a Filesystem. Everything was working fine.

However, when I tried to deploy everything on Ubuntu 18.04 I got stuck on one important part: the deployment of the OSDs. When I tried to deploy one of my hard drives, the following happened:

admin@server001:~/ceph-cluster$ ceph-deploy osd create --data /dev/sda server001
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/admin/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /usr/bin/ceph-deploy osd create --data /dev/sda server001
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  bluestore                     : None
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f512280c1b8>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  fs_type                       : xfs
[ceph_deploy.cli][INFO  ]  block_wal                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  journal                       : None
[ceph_deploy.cli][INFO  ]  subcommand                    : create
[ceph_deploy.cli][INFO  ]  host                          : server001
[ceph_deploy.cli][INFO  ]  filestore                     : None
[ceph_deploy.cli][INFO  ]  func                          : <function osd at 0x7f5122c74a28>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  zap_disk                      : False
[ceph_deploy.cli][INFO  ]  data                          : /dev/sda
[ceph_deploy.cli][INFO  ]  block_db                      : None
[ceph_deploy.cli][INFO  ]  dmcrypt                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  dmcrypt_key_dir               : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  debug                         : False
[ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/sda
[server001][DEBUG ] connection detected need for sudo
[server001][DEBUG ] connected to host: server001 
[server001][DEBUG ] detect platform information from remote host
[server001][DEBUG ] detect machine type
[server001][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 18.04 bionic
[ceph_deploy.osd][DEBUG ] Deploying osd to server001
[server001][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[server001][WARNIN] osd keyring does not exist yet, creating one
[server001][DEBUG ] create a keyring file
[server001][DEBUG ] find the location of an executable
[server001][INFO  ] Running command: sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sda
[server001][WARNIN] No data was received after 300 seconds, disconnecting...
[server001][INFO  ] checking OSD status...
[server001][DEBUG ] find the location of an executable
[server001][INFO  ] Running command: sudo /usr/bin/ceph --cluster=ceph osd stat --format=json
[server001][WARNIN] there is 1 OSD down
[server001][WARNIN] there is 1 OSD out
[ceph_deploy.osd][DEBUG ] Host server001 is now ready for osd use.
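
The 300-second timeout above is where the /usr/sbin/ceph-volume call apparently hangs, which would fit the diagnosis in the title (udev not being pulled in as a package dependency). As a minimal check on the affected node, assuming standard Ubuntu tooling and that installing udev by hand is only a workaround rather than the packaging fix:

admin@server001:~$ dpkg -l udev                 # verify whether the udev package is installed at all
admin@server001:~$ sudo apt-get install udev    # manual workaround before retrying the OSD create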

This is quite problematic. Not only does it seem that the deployment crashed somewhere in the middle, but the hard drive also got a label and 'vgdisplay' no longer works properly. When I ran the command, nothing happened for a while, so I pressed Ctrl+C to cancel it, and the following error message appeared:

admin@server001:~/ceph-cluster$ sudo vgdisplay
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
^C  Interrupted...
  Giving up waiting for lock.
  Can't get lock for ceph-de49ed17-c116-41f6-82b7-f1caa9c13cd8
  Cannot process volume group ceph-de49ed17-c116-41f6-82b7-f1caa9c13cd8

It seems this corrupts the hard drives and makes them difficult to read.
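
For anyone left in this state, a possible cleanup before retrying is sketched below. It assumes the stuck lock comes from the half-created ceph-* volume group shown in the vgdisplay output, and that the lvmetad service name is the Ubuntu 18.04 default; note that --destroy wipes the disk:

admin@server001:~$ sudo systemctl restart lvm2-lvmetad.service   # assumption: restarting lvmetad clears the "Failed to connect to lvmetad" fallback
admin@server001:~$ sudo ceph-volume lvm zap /dev/sda --destroy   # remove the leftover LVM volumes and wipe /dev/sda before retrying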
