Bug #37587

open

udev is not declared as a dependency, fails to deploy OSDs

Added by Kevin Lang over 5 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: build
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-deploy
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

I have a cluster with some servers and hard drives available and have tested Ceph in different versions on different operating systems. I mostly followed the quick start guide on the main Ceph page.

On "Ubuntu 16.04" with ceph-version "mimic" I had no real problems to prepare the servers with a Ceph user account, deploying a Ceph Storage Cluster and to create a Filesystem. Everything was working fine.

However, when I tried to deploy everything on Ubuntu 18.04 I got stuck on one important part: the deployment of the OSDs. When I tried to deploy one of my hard drives, the following happened:

admin@server001:~/ceph-cluster$ ceph-deploy osd create --data /dev/sda server001
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/admin/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /usr/bin/ceph-deploy osd create --data /dev/sda server001
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  bluestore                     : None
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f512280c1b8>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  fs_type                       : xfs
[ceph_deploy.cli][INFO  ]  block_wal                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  journal                       : None
[ceph_deploy.cli][INFO  ]  subcommand                    : create
[ceph_deploy.cli][INFO  ]  host                          : server001
[ceph_deploy.cli][INFO  ]  filestore                     : None
[ceph_deploy.cli][INFO  ]  func                          : <function osd at 0x7f5122c74a28>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  zap_disk                      : False
[ceph_deploy.cli][INFO  ]  data                          : /dev/sda
[ceph_deploy.cli][INFO  ]  block_db                      : None
[ceph_deploy.cli][INFO  ]  dmcrypt                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  dmcrypt_key_dir               : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  debug                         : False
[ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/sda
[server001][DEBUG ] connection detected need for sudo
[server001][DEBUG ] connected to host: server001 
[server001][DEBUG ] detect platform information from remote host
[server001][DEBUG ] detect machine type
[server001][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 18.04 bionic
[ceph_deploy.osd][DEBUG ] Deploying osd to server001
[server001][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[server001][WARNIN] osd keyring does not exist yet, creating one
[server001][DEBUG ] create a keyring file
[server001][DEBUG ] find the location of an executable
[server001][INFO  ] Running command: sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sda
[server001][WARNIN] No data was received after 300 seconds, disconnecting...
[server001][INFO  ] checking OSD status...
[server001][DEBUG ] find the location of an executable
[server001][INFO  ] Running command: sudo /usr/bin/ceph --cluster=ceph osd stat --format=json
[server001][WARNIN] there is 1 OSD down
[server001][WARNIN] there is 1 OSD out
[ceph_deploy.osd][DEBUG ] Host server001 is now ready for osd use.

This is quite problematic. Not only does the deployment appear to have crashed somewhere in the middle, the hard drive also got a label, and 'vgdisplay' no longer works properly. When I ran the command and nothing happened for a while, I pressed Ctrl+C to cancel it, and the following error message appeared:

admin@server001:~/ceph-cluster$ sudo vgdisplay
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
^C  Interrupted...
  Giving up waiting for lock.
  Can't get lock for ceph-de49ed17-c116-41f6-82b7-f1caa9c13cd8
  Cannot process volume group ceph-de49ed17-c116-41f6-82b7-f1caa9c13cd8

It seems this corrupts the hard drives and makes them difficult to read.

Actions #1

Updated by Kevin Lang over 5 years ago

Okay, after some analysis the problem was found:
Device-mapper was configured for udev and waited for a daemon that was not there. The logical volume was created by ceph-deploy, but lvcreate waited for device-mapper to register the devices in /dev/mapper. That was the cause of the deadlock.
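
For anyone hitting the same hang: one way to see whether LVM/device-mapper on a node expects udev synchronization, and whether a udev daemon is actually there to answer, is something like the following (a rough sketch using standard LVM and shell tools; output wording may differ by version):

sudo lvmconfig activation/udev_sync activation/udev_rules   # 1/1 means LVM waits for udev to create the device nodes
pidof systemd-udevd || echo "no udev daemon running"        # no PID here is exactly the deadlock condition described above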

Actions #2

Updated by Alfredo Deza over 5 years ago

  • Status changed from New to 4

Would you mind expanding on how exactly "device-mapper was configured for udev"? Is this something that we can prevent or warn against? If you could add some details, I think that would help us provide a better experience.

Actions #3

Updated by Janek Bevendorff over 5 years ago

With udev-enabled LVM (the default in most modern Linux distributions these days), you need the udev package installed. Otherwise the device mapper will not create the logical devices, because the udev rules needed for that are missing. Instead, it will just hang in a semop() call waiting for the devices to be created, which never happens. You can create them manually with either `dmsetup create` or `vgscan --mknodes`, but automatic setup via udev will simply hang forever, causing `lvcreate` and thus `ceph-deploy osd create` to fail with a timeout error.
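
To make the manual workaround concrete, something along these lines unblocks the hang by creating the device nodes that udev would normally create (a sketch only; it does not fix the underlying missing-udev problem):

sudo vgscan --mknodes     # create the missing /dev/<vg>/<lv> and /dev/mapper nodes by hand
sudo vgchange -ay         # re-activate the volume groups so the logical volumes become usable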

Actions #4

Updated by Alfredo Deza over 5 years ago

Ceph already requires the udev package; how would it be possible to be in a situation where Ceph is installed without udev? Or is it possible to have udev installed but somehow not configured to work with device-mapper?

If udev is installed, is there a way to detect this as a potential configuration problem?
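
For what it's worth, a preflight check roughly like the one below could be run on the target host before ceph-volume calls lvcreate; this is only a sketch of the idea, not an existing ceph-deploy or ceph-volume option:

# bail out early instead of hanging in lvcreate if the udev daemon is absent
if ! systemctl is-active --quiet systemd-udevd; then
    echo "systemd-udevd is not running; LVM activation would hang" >&2
    exit 1
fi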

Actions #5

Updated by Janek Bevendorff over 5 years ago

In our setup, udev was not installed. It may have been pulled in on the admin node, but the target nodes we wanted to deploy our OSDs on were clearly missing the udev package. I would assume udev comes pre-installed on most "full" systems, but if you use minimalist PXE boot images, it may not be there. Installing udev and activating the service via `systemctl start systemd-udevd` (or rebooting) fixed the problem. IMHO, device mapper should pull in udev as a hard dependency if deadlocks like this can happen, but it seems that you can have a udev-less system with udev-enabled device mapper, at least on Ubuntu 18.04.
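
Concretely, the fix described above amounts to something like the following on each affected OSD node (package and unit names as on Ubuntu 18.04):

sudo apt-get install udev           # pulls in the udev rules that device-mapper waits for
sudo systemctl start systemd-udevd  # or reboot, so the daemon picks up device events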

Actions #6

Updated by Alfredo Deza over 5 years ago

  • Project changed from 18 to Ceph
  • Subject changed from In Ubuntu 18.04 the Ceph Deploy command for installing OSDs does not work to udev is not declared as a dependency, fails to deploy OSDs
  • Category set to build
  • Status changed from 4 to 12

Just confirmed that Ceph is not pulling in udev. This is a packaging bug for sure: Ceph has to pull in udev, as it relies on it. Moving this issue to the Ceph project.

I'm sure this is not only an Ubuntu thing but an RPM one as well, since `udev` is listed only as a `BuildRequires`, even though the package requires it to function properly.
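
One quick way to confirm the missing runtime dependency on both package formats (a sketch; the exact binary package to check may vary):

apt-cache depends ceph | grep -i udev     # Debian/Ubuntu: check whether udev shows up as a runtime dependency
rpm -q --requires ceph | grep -i udev     # RPM: check whether udev appears among the runtime Requires, not just BuildRequires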

Actions #7

Updated by Alfredo Deza over 5 years ago

Using a brand-new Ubuntu 18.04 install, remove udev:

root@node1:/home/vagrant# dpkg -l udev
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                                  Version                         Architecture                    Description
+++-=====================================================-===============================-===============================-===============================================================================================================
ii  udev                                                  237-3ubuntu10.9                 amd64                           /dev/ and hotplug management daemon
root@node1:/home/vagrant# apt-get remove --purge udev
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
  initramfs-tools* initramfs-tools-core* plymouth* plymouth-theme-ubuntu-text* ubuntu-minimal* udev*
0 upgraded, 0 newly installed, 6 to remove and 0 not upgraded.
After this operation, 8,879 kB disk space will be freed.
Do you want to continue? [Y/n] Y
(Reading database ... 63199 files and directories currently installed.)
Removing ubuntu-minimal (1.417) ...
Removing initramfs-tools (0.130ubuntu3.5) ...
Removing initramfs-tools-core (0.130ubuntu3.5) ...
Removing plymouth-theme-ubuntu-text (0.9.3-1ubuntu7.18.04.1) ...
Removing plymouth (0.9.3-1ubuntu7.18.04.1) ...
Removing udev (237-3ubuntu10.9) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
(Reading database ... 62952 files and directories currently installed.)
Purging configuration files for initramfs-tools-core (0.130ubuntu3.5) ...
dpkg: warning: while removing initramfs-tools-core, directory '/var/lib/initramfs-tools' not empty so not removed
Purging configuration files for plymouth-theme-ubuntu-text (0.9.3-1ubuntu7.18.04.1) ...
Purging configuration files for initramfs-tools (0.130ubuntu3.5) ...
Purging configuration files for plymouth (0.9.3-1ubuntu7.18.04.1) ...
Purging configuration files for udev (237-3ubuntu10.9) ...
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for systemd (237-3ubuntu10.9) ...

Install the latest mimic release:

ceph-deploy install --release=mimic node1
[ceph_deploy.conf][DEBUG ] found configuration file at: /Users/alfredo/tmp/foo/cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.2): /Users/alfredo/.virtualenvs/ceph-deploy3/bin/ceph-deploy install --release=mimic node1
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  stable                        : None
[ceph_deploy.cli][INFO  ]  release                       : mimic
[ceph_deploy.cli][INFO  ]  testing                       : None
[ceph_deploy.cli][INFO  ]  dev                           : master
[ceph_deploy.cli][INFO  ]  dev_commit                    : None
[ceph_deploy.cli][INFO  ]  install_mon                   : False
[ceph_deploy.cli][INFO  ]  install_mgr                   : False
[ceph_deploy.cli][INFO  ]  install_mds                   : False
[ceph_deploy.cli][INFO  ]  install_rgw                   : False
[ceph_deploy.cli][INFO  ]  install_osd                   : False
[ceph_deploy.cli][INFO  ]  install_tests                 : False
[ceph_deploy.cli][INFO  ]  install_common                : False
[ceph_deploy.cli][INFO  ]  install_all                   : False
[ceph_deploy.cli][INFO  ]  adjust_repos                  : True
[ceph_deploy.cli][INFO  ]  repo                          : False
[ceph_deploy.cli][INFO  ]  host                          : ['node1']
[ceph_deploy.cli][INFO  ]  local_mirror                  : None
[ceph_deploy.cli][INFO  ]  repo_url                      : None
[ceph_deploy.cli][INFO  ]  gpg_url                       : None
[ceph_deploy.cli][INFO  ]  nogpgcheck                    : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf object at 0x1053975f8>
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.cli][INFO  ]  version_kind                  : stable
[ceph_deploy.cli][INFO  ]  func                          : <function install at 0x1052f6840>
[ceph_deploy.install][DEBUG ] Installing stable version mimic on cluster ceph hosts node1
[ceph_deploy.install][DEBUG ] Detecting platform for host node1 ...
[node1][DEBUG ] connection detected need for sudo
[node1][DEBUG ] connected to host: node1
[ceph_deploy.install][INFO  ] Distro info: Ubuntu 18.04 bionic
[node1][INFO  ] installing Ceph on node1
[node1][INFO  ] Running command: sudo env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q update
...
[node1][DEBUG ] Setting up ceph (13.2.2-1bionic) ...
[node1][DEBUG ] Processing triggers for libc-bin (2.27-3ubuntu1) ...
[node1][DEBUG ] Processing triggers for ureadahead (0.100.0-20) ...
[node1][DEBUG ] Processing triggers for systemd (237-3ubuntu10.9) ...
[node1][INFO  ] Running command: sudo ceph --version
[node1][DEBUG ] ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)

udev is not installed:

root@node1:/home/vagrant# dpkg -l udev
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                                  Version                         Architecture                    Description
+++-=====================================================-===============================-===============================-===============================================================================================================
un  udev                                                  <none>                          <none>                          (no description available)
root@node1:/home/vagrant# systemctl status systemd-udevd
Unit systemd-udevd.service could not be found.
root@node1:/home/vagrant# which ceph
/usr/bin/ceph
root@node1:/home/vagrant# ceph --version
ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
root@node1:/home/vagrant# apt install udev
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  udev
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 1,101 kB of archives.
After this operation, 7,904 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 udev amd64 237-3ubuntu10.9 [1,101 kB]
Fetched 1,101 kB in 1s (1,192 kB/s)
Selecting previously unselected package udev.
(Reading database ... 66860 files and directories currently installed.)
Preparing to unpack .../udev_237-3ubuntu10.9_amd64.deb ...
Unpacking udev (237-3ubuntu10.9) ...
Processing triggers for ureadahead (0.100.0-20) ...
Setting up udev (237-3ubuntu10.9) ...
Processing triggers for systemd (237-3ubuntu10.9) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for ureadahead (0.100.0-20) ...
root@node1:/home/vagrant# systemctl status systemd-udevd
● systemd-udevd.service - udev Kernel Device Manager
   Loaded: loaded (/lib/systemd/system/systemd-udevd.service; static; vendor preset: enabled)
   Active: active (running) since Mon 2018-12-17 14:43:04 UTC; 4s ago
     Docs: man:systemd-udevd.service(8)
           man:udev(7)
 Main PID: 4949 (systemd-udevd)
   Status: "Processing with 12 children at max" 
    Tasks: 1
   CGroup: /system.slice/systemd-udevd.service
           └─4949 /lib/systemd/systemd-udevd

Dec 17 14:43:04 node1 systemd[1]: Starting udev Kernel Device Manager...
Dec 17 14:43:04 node1 systemd-udevd[4949]: Network interface NamePolicy= disabled on kernel command line, ignoring.
Dec 17 14:43:04 node1 systemd[1]: Started udev Kernel Device Manager.

Actions #8

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New