Bug #12726
ceph-deploy suite fails on CentOS 7
% Done: 0%
Description
http://pulpito.ceph.com/loic-2015-08-18_23:30:39-ceph-deploy-master---basic-vps/1020940/
2015-08-18T14:41:11.490 INFO:tasks.ceph_deploy:Ceph health: HEALTH_ERR 64 pgs stuck inactive; 64 pgs stuck unclean; no osds
2015-08-18T14:41:21.490 INFO:teuthology.orchestra.run.vpm084:Running: 'cd /home/ubuntu/cephtest && sudo ceph health'
It looks like ceph-deploy gives up after preparing /dev/vdb and does not try to prepare/activate /dev/vdc.
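For context, the task is expected to prepare and activate both data disks on each node; a minimal sketch of the equivalent ceph-deploy invocations (hostname and device names here are illustrative, not taken verbatim from the suite):

    ceph-deploy osd prepare vpm084:/dev/vdb
    ceph-deploy osd activate vpm084:/dev/vdb1
    ceph-deploy osd prepare vpm084:/dev/vdc
    ceph-deploy osd activate vpm084:/dev/vdc1

In the run above, processing stops after the /dev/vdb step.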
History
#1 Updated by Loïc Dachary over 8 years ago
teuthology-suite -l1 --verbose --suite ceph-deploy --suite-branch master --email loic@dachary.org --filter=centos_7 --ceph next --machine-type vps
#2 Updated by Loïc Dachary over 8 years ago
- Priority changed from Normal to Urgent
It looks like CentOS 7 was added late July 2015 (http://pulpito.ceph.com/teuthology-2015-07-19_02:10:02-ceph-deploy-next-distro-basic-vps/) and the suite has never run successfully on it.
#3 Updated by Loïc Dachary over 8 years ago
It also fails without dmcrypt: http://pulpito.ceph.com/loic-2015-08-20_01:57:10-ceph-deploy-master---basic-vps/1022951/
teuthology-suite --priority 50 --filter='ceph-deploy/basic/{ceph-deploy-overrides/disable_diff_journal_disk.yaml config_options/cephdeploy_conf.yaml distros/centos_7.0.yaml tasks/ceph-deploy_hello_world.yaml}' --verbose --suite ceph-deploy --suite-branch master --email loic@dachary.org --ceph master --machine-type vps
The OSD on one machine was properly mounted.
#4 Updated by Loïc Dachary over 8 years ago
Caught this before the VM was shut down.
Aug 19 17:10:30 vpm045 kernel: vdb:
Aug 19 17:10:31 vpm045 kernel: vdb: vdb2
Aug 19 17:10:31 vpm045 systemd: Starting system-ceph\x2ddisk\x2dactivate\x2djournal.slice.
Aug 19 17:10:31 vpm045 systemd: Created slice system-ceph\x2ddisk\x2dactivate\x2djournal.slice.
Aug 19 17:10:31 vpm045 systemd: Starting Ceph disk journal activation: /dev/vdb2...
Aug 19 17:10:31 vpm045 ceph-disk: HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
Aug 19 17:10:31 vpm045 ceph-disk: error: /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: No such file or directory
Aug 19 17:10:31 vpm045 ceph-disk: ceph-disk: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
Aug 19 17:10:31 vpm045 systemd: ceph-disk-activate-journal@-dev-vdb2.service: main process exited, code=exited, status=1/FAILURE
Aug 19 17:10:31 vpm045 systemd: Failed to start Ceph disk journal activation: /dev/vdb2.
Aug 19 17:10:31 vpm045 systemd: Unit ceph-disk-activate-journal@-dev-vdb2.service entered failed state.
Aug 19 17:10:32 vpm045 kernel: vdb: vdb1 vdb2
Aug 19 17:10:32 vpm045 systemd: Starting Ceph disk journal activation: /dev/vdb2...
Aug 19 17:10:32 vpm045 ceph-disk: HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
Aug 19 17:10:32 vpm045 ceph-disk: error: /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: No such file or directory
Aug 19 17:10:32 vpm045 ceph-disk: ceph-disk: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
Aug 19 17:10:32 vpm045 systemd: ceph-disk-activate-journal@-dev-vdb2.service: main process exited, code=exited, status=1/FAILURE
Aug 19 17:10:32 vpm045 systemd: Failed to start Ceph disk journal activation: /dev/vdb2.
Aug 19 17:10:32 vpm045 systemd: Unit ceph-disk-activate-journal@-dev-vdb2.service entered failed state.
Aug 19 17:10:34 vpm045 kernel: SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
Aug 19 17:10:34 vpm045 kernel: XFS (vdb1): Mounting Filesystem
Aug 19 17:10:34 vpm045 kernel: XFS (vdb1): Ending clean mount
Aug 19 17:10:35 vpm045 kernel: vdb: vdb1 vdb2
Aug 19 17:10:35 vpm045 systemd: Starting system-ceph\x2ddisk\x2dactivate.slice.
Aug 19 17:10:35 vpm045 systemd: Created slice system-ceph\x2ddisk\x2dactivate.slice.
Aug 19 17:10:35 vpm045 systemd: Starting Ceph disk activation: /dev/vdb1...
Aug 19 17:10:35 vpm045 systemd: Starting Ceph disk journal activation: /dev/vdb2...
Aug 19 17:10:35 vpm045 kernel: XFS (vdb1): Mounting Filesystem
Aug 19 17:10:35 vpm045 kernel: XFS (vdb1): Ending clean mount
...
Aug 19 17:10:36 vpm045 systemd: Reloading.
Aug 19 17:10:36 vpm045 systemd: Starting system-ceph\x2dosd.slice.
Aug 19 17:10:36 vpm045 systemd: Created slice system-ceph\x2dosd.slice.
Aug 19 17:10:36 vpm045 systemd: Starting Ceph object storage daemon...
...
Aug 19 17:10:37 vpm045 ceph-osd-prestart.sh: create-or-move updating item name 'osd.0' weight 0.1903 at location {host=vpm045,root=default} to crush map
Aug 19 17:10:37 vpm045 ceph-osd-prestart.sh: 2015-08-19 17:10:37.564182 7f7b55ced700 1 -- 10.214.130.45:0/1019701 mark_down 0x7f7b5005b240 -- 0x7f7b50061550
Aug 19 17:10:37 vpm045 ceph-osd-prestart.sh: 2015-08-19 17:10:37.564252 7f7b55ced700 1 -- 10.214.130.45:0/1019701 mark_down_all
Aug 19 17:10:37 vpm045 ceph-osd-prestart.sh: 2015-08-19 17:10:37.566511 7f7b55ced700 1 -- 10.214.130.45:0/1019701 shutdown complete.
Aug 19 17:10:37 vpm045 systemd: Started Ceph object storage daemon.
Aug 19 17:10:37 vpm045 systemd: Started Ceph disk activation: /dev/vdb1.
Aug 19 17:10:37 vpm045 ceph-osd: starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
Aug 19 17:10:37 vpm045 ceph-disk: HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
Aug 19 17:10:37 vpm045 ceph-osd: HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
Aug 19 17:10:37 vpm045 ceph-osd: HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
Aug 19 17:10:37 vpm045 systemd: Reloading.
Aug 19 17:10:37 vpm045 ceph-osd: 2015-08-19 17:10:37.729604 7ffd652cd900 -1 osd.0 0 log_to_monitors {default=true}
Aug 19 17:10:37 vpm045 systemd: Started Ceph object storage daemon.
Aug 19 17:10:37 vpm045 systemd: Started Ceph disk journal activation: /dev/vdb2.
Aug 19 17:10:46 vpm045 ceph-mds: 2015-08-19 17:10:46.699468 7effed73a780 -1 mds.vpm045 *** no OSDs are up as of ep
[ubuntu@vpm045 ~]$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/vda1      103144568 2790968 100337216   3% /
devtmpfs          885196       0    885196   0% /dev
tmpfs             890436       0    890436   0% /dev/shm
tmpfs             890436    8528    881908   1% /run
tmpfs             890436       0    890436   0% /sys/fs/cgroup
/dev/vdb1      204371440   34092 204337348   1% /var/lib/ceph/osd/ceph-0
The VM was destroyed before I could investigate further.
#5 Updated by Loïc Dachary over 8 years ago
Similar failure using just one host http://149.202.170.149:8081/ubuntu-2015-08-19_23:29:05-ceph-disk-master---basic-openstack/26/
#6 Updated by Sage Weil over 8 years ago
Looks like ceph-deploy is triggering the systemd path instead of the sysvinit path here.
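A quick way to confirm which path was taken on the node (a sketch; commands assumed to be present on a stock CentOS 7 install):

    ps -p 1 -o comm=                      # prints "systemd" on CentOS 7
    systemctl list-units 'ceph*' --all    # systemd path: ceph-disk/ceph-osd units show up here
    ls /etc/init.d/ceph                   # sysvinit path would go through this script instead

The journal excerpt in note #4 shows ceph-disk-activate-journal@ and ceph-osd units being started, i.e. the systemd path.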
#7 Updated by Loïc Dachary over 8 years ago
Yes, this is related to http://tracker.ceph.com/issues/12786 and was introduced by https://github.com/ceph/ceph-deploy/commit/0fbc7bd57e3fcef2ad700fc2cdf208f6befd999c
#8 Updated by Travis Rhoden over 8 years ago
I was pointed at: http://qa-proxy.ceph.com/teuthology/loic-2015-08-20_01:57:10-ceph-deploy-master---basic-vps/1022951/teuthology.log
In this particular log, it's running 9.0.2 plus the latest systemd changes, so ceph-deploy is starting the daemons with systemd. The whole task appears to be successful until the end:
2015-08-19T17:11:15.273 INFO:teuthology.orchestra.run.vpm039:Running: 'sudo stop ceph-all || sudo service ceph stop'
2015-08-19T17:11:15.384 INFO:teuthology.orchestra.run.vpm039.stderr:sudo: stop: command not found
2015-08-19T17:11:15.403 INFO:teuthology.orchestra.run.vpm039.stderr:Redirecting to /bin/systemctl stop ceph.service
2015-08-19T17:11:15.404 INFO:teuthology.orchestra.run.vpm039.stderr:Failed to issue method call: Unit ceph.service not loaded.
This is what fails the task.
I didn't think there was a ceph.service anymore; it's ceph.target, right? And then there are individual ceph-mon@ and ceph-osd@ services?
So I think Teuthology needs to be a bit smarter about shutting down Ceph on systemd hosts. Maybe something like:
'sudo systemctl stop ceph.target || sudo stop ceph-all || sudo service ceph stop'
Not sure if we can 'stop' ceph.target. We might have to stop things like ceph-mon@.service, ceph-osd@.service, and ceph-mds@.service instead. I'm hoping ceph.target is a top-level unit that will take care of everything, but I don't know enough systemd to know whether that's how it works.
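A minimal sketch of what that fallback chain could look like in teuthology (unit and target names as discussed above; the glob form assumes systemctl pattern matching, which CentOS 7's systemd supports):

    sudo systemctl stop ceph.target \
        || sudo stop ceph-all \
        || sudo service ceph stop

    # If stopping the target does not propagate to the instantiated units,
    # stop them explicitly:
    sudo systemctl stop 'ceph-mon@*.service' 'ceph-osd@*.service' 'ceph-mds@*.service'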
#9 Updated by Zack Cerza over 8 years ago
Why is it 'ceph.target'?
I'm really hoping that we can implement a top-level systemd unit that will allow users to start and stop Ceph the same way they have in the past.
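For stopping the whole cluster through one unit to work, the per-daemon units have to be tied to the target; a sketch of that relationship (assumed unit contents, not the files actually shipped at the time):

    # ceph-osd@.service (sketch)
    [Unit]
    Description=Ceph object storage daemon osd.%i
    PartOf=ceph.target            # stop/restart of ceph.target propagates to every osd instance

    [Service]
    ExecStart=/usr/bin/ceph-osd -f --id %i

    [Install]
    WantedBy=ceph.target          # enabling an instance hooks it under the target

    # ceph.target (sketch)
    [Unit]
    Description=All Ceph daemons and services

    [Install]
    WantedBy=multi-user.target

With PartOf= in place, 'systemctl stop ceph.target' stops every enabled ceph-mon@, ceph-osd@ and ceph-mds@ instance, which is the behaviour being asked for here.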
#10 Updated by Zack Cerza over 8 years ago
These links are mainly for my own memory :)
https://github.com/ceph/ceph-deploy/commit/81266f79c21ffad272f1a1d397ad0012c2b31a57
https://github.com/ceph/ceph/pull/5446/commits
#11 Updated by Alfredo Deza over 7 years ago
- Status changed from 12 to Closed