Bug #12726

ceph-deploy suite fails on CentOS 7

Added by Loïc Dachary over 8 years ago. Updated over 7 years ago.

Status:
Closed
Priority:
Urgent
Assignee:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/loic-2015-08-18_23:30:39-ceph-deploy-master---basic-vps/1020940/

2015-08-18T14:41:11.490 INFO:tasks.ceph_deploy:Ceph health: HEALTH_ERR 64 pgs stuck inactive; 64 pgs stuck unclean; no osds
2015-08-18T14:41:21.490 INFO:teuthology.orchestra.run.vpm084:Running: 'cd /home/ubuntu/cephtest && sudo ceph health'

It looks like ceph-deploy gives up after preparing /dev/vdb and does not try to prepare/activate /dev/vdc.
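
For reference, the per-disk workflow the suite is expected to drive looks roughly like this (a sketch only, assuming the standard ceph-deploy osd prepare/activate commands of that era; the host name comes from the log above and the partition names are inferred, not confirmed):

ceph-deploy osd prepare vpm084:/dev/vdb
ceph-deploy osd activate vpm084:/dev/vdb1
ceph-deploy osd prepare vpm084:/dev/vdc
ceph-deploy osd activate vpm084:/dev/vdc1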


Related issues

Related to Ceph-deploy - Bug #12786: ceph-deploy disagrees with ceph on the centos init system Resolved 08/26/2015
Related to Ceph-deploy - Bug #12557: systemd: osd create should enable ceph.target Resolved 07/31/2015

History

#1 Updated by Loïc Dachary over 8 years ago

teuthology-suite -l1 --verbose --suite ceph-deploy --suite-branch master --email --filter=centos_7 --ceph next --machine-type vps

#2 Updated by Loïc Dachary over 8 years ago

  • Priority changed from Normal to Urgent

It looks like CentOS 7 was added in late July 2015 (http://pulpito.ceph.com/teuthology-2015-07-19_02:10:02-ceph-deploy-next-distro-basic-vps/) and the suite has never run successfully on it.

#3 Updated by Loïc Dachary over 8 years ago

It also fails without dmcrypt: http://pulpito.ceph.com/loic-2015-08-20_01:57:10-ceph-deploy-master---basic-vps/1022951/

teuthology-suite --priority 50 --filter='ceph-deploy/basic/{ceph-deploy-overrides/disable_diff_journal_disk.yaml config_options/cephdeploy_conf.yaml distros/centos_7.0.yaml tasks/ceph-deploy_hello_world.yaml}' --verbose --suite ceph-deploy --suite-branch master --email loic@dachary.org --ceph master --machine-type vps

The OSD on one machine was properly mounted.

#4 Updated by Loïc Dachary over 8 years ago

Caught this before the VM was shut down.

Aug 19 17:10:30 vpm045 kernel: vdb:
Aug 19 17:10:31 vpm045 kernel: vdb: vdb2
Aug 19 17:10:31 vpm045 systemd: Starting system-ceph\x2ddisk\x2dactivate\x2djournal.slice.
Aug 19 17:10:31 vpm045 systemd: Created slice system-ceph\x2ddisk\x2dactivate\x2djournal.slice.
Aug 19 17:10:31 vpm045 systemd: Starting Ceph disk journal activation: /dev/vdb2...
Aug 19 17:10:31 vpm045 ceph-disk: HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
Aug 19 17:10:31 vpm045 ceph-disk: error: /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: No such file or directory
Aug 19 17:10:31 vpm045 ceph-disk: ceph-disk: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
Aug 19 17:10:31 vpm045 systemd: ceph-disk-activate-journal@-dev-vdb2.service: main process exited, code=exited, status=1/FAILURE
Aug 19 17:10:31 vpm045 systemd: Failed to start Ceph disk journal activation: /dev/vdb2.
Aug 19 17:10:31 vpm045 systemd: Unit ceph-disk-activate-journal@-dev-vdb2.service entered failed state.
Aug 19 17:10:32 vpm045 kernel: vdb: vdb1 vdb2
Aug 19 17:10:32 vpm045 systemd: Starting Ceph disk journal activation: /dev/vdb2...
Aug 19 17:10:32 vpm045 ceph-disk: HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
Aug 19 17:10:32 vpm045 ceph-disk: error: /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: No such file or directory
Aug 19 17:10:32 vpm045 ceph-disk: ceph-disk: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
Aug 19 17:10:32 vpm045 systemd: ceph-disk-activate-journal@-dev-vdb2.service: main process exited, code=exited, status=1/FAILURE
Aug 19 17:10:32 vpm045 systemd: Failed to start Ceph disk journal activation: /dev/vdb2.
Aug 19 17:10:32 vpm045 systemd: Unit ceph-disk-activate-journal@-dev-vdb2.service entered failed state.
Aug 19 17:10:34 vpm045 kernel: SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
Aug 19 17:10:34 vpm045 kernel: XFS (vdb1): Mounting Filesystem
Aug 19 17:10:34 vpm045 kernel: XFS (vdb1): Ending clean mount
Aug 19 17:10:35 vpm045 kernel: vdb: vdb1 vdb2
Aug 19 17:10:35 vpm045 systemd: Starting system-ceph\x2ddisk\x2dactivate.slice.
Aug 19 17:10:35 vpm045 systemd: Created slice system-ceph\x2ddisk\x2dactivate.slice.
Aug 19 17:10:35 vpm045 systemd: Starting Ceph disk activation: /dev/vdb1...
Aug 19 17:10:35 vpm045 systemd: Starting Ceph disk journal activation: /dev/vdb2...
Aug 19 17:10:35 vpm045 kernel: XFS (vdb1): Mounting Filesystem
Aug 19 17:10:35 vpm045 kernel: XFS (vdb1): Ending clean mount
...
Aug 19 17:10:36 vpm045 systemd: Reloading.
Aug 19 17:10:36 vpm045 systemd: Starting system-ceph\x2dosd.slice.
Aug 19 17:10:36 vpm045 systemd: Created slice system-ceph\x2dosd.slice.
Aug 19 17:10:36 vpm045 systemd: Starting Ceph object storage daemon...
...
Aug 19 17:10:37 vpm045 ceph-osd-prestart.sh: create-or-move updating item name 'osd.0' weight 0.1903 at location {host=vpm045,root=default} to crush map
Aug 19 17:10:37 vpm045 ceph-osd-prestart.sh: 2015-08-19 17:10:37.564182 7f7b55ced700  1 -- 10.214.130.45:0/1019701 mark_down 0x7f7b5005b240 -- 0x7f7b50061550
Aug 19 17:10:37 vpm045 ceph-osd-prestart.sh: 2015-08-19 17:10:37.564252 7f7b55ced700  1 -- 10.214.130.45:0/1019701 mark_down_all
Aug 19 17:10:37 vpm045 ceph-osd-prestart.sh: 2015-08-19 17:10:37.566511 7f7b55ced700  1 -- 10.214.130.45:0/1019701 shutdown complete.
Aug 19 17:10:37 vpm045 systemd: Started Ceph object storage daemon.
Aug 19 17:10:37 vpm045 systemd: Started Ceph disk activation: /dev/vdb1.
Aug 19 17:10:37 vpm045 ceph-osd: starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
Aug 19 17:10:37 vpm045 ceph-disk: HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
Aug 19 17:10:37 vpm045 ceph-osd: HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
Aug 19 17:10:37 vpm045 ceph-osd: HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
Aug 19 17:10:37 vpm045 systemd: Reloading.
Aug 19 17:10:37 vpm045 ceph-osd: 2015-08-19 17:10:37.729604 7ffd652cd900 -1 osd.0 0 log_to_monitors {default=true}
Aug 19 17:10:37 vpm045 systemd: Started Ceph object storage daemon.
Aug 19 17:10:37 vpm045 systemd: Started Ceph disk journal activation: /dev/vdb2.
Aug 19 17:10:46 vpm045 ceph-mds: 2015-08-19 17:10:46.699468 7effed73a780 -1 mds.vpm045 *** no OSDs are up as of ep

[ubuntu@vpm045 ~]$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/vda1      103144568 2790968 100337216   3% /
devtmpfs          885196       0    885196   0% /dev
tmpfs             890436       0    890436   0% /dev/shm
tmpfs             890436    8528    881908   1% /run
tmpfs             890436       0    890436   0% /sys/fs/cgroup
/dev/vdb1      204371440   34092 204337348   1% /var/lib/ceph/osd/ceph-0

The VM was destroyed before I could investigate further.

#6 Updated by Sage Weil over 8 years ago

Looks like ceph-deploy is triggering the systemd path instead of the sysvinit path here.
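
A quick way to confirm which init path a node is actually on (a sketch, not taken from this run's logs):

cat /proc/1/comm                     # prints "systemd" on a CentOS 7 node
systemctl list-units 'ceph*' --all   # lists whatever ceph units systemd currently knows about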

#8 Updated by Travis Rhoden over 8 years ago

I was pointed at: http://qa-proxy.ceph.com/teuthology/loic-2015-08-20_01:57:10-ceph-deploy-master---basic-vps/1022951/teuthology.log

In this particular log, it's running 9.0.2 + the latest systemd stuff, so ceph-deploy is starting the daemons with systemd. The whole task appears to be successful until the end:

2015-08-19T17:11:15.273 INFO:teuthology.orchestra.run.vpm039:Running: 'sudo stop ceph-all || sudo service ceph stop'
2015-08-19T17:11:15.384 INFO:teuthology.orchestra.run.vpm039.stderr:sudo: stop: command not found
2015-08-19T17:11:15.403 INFO:teuthology.orchestra.run.vpm039.stderr:Redirecting to /bin/systemctl stop  ceph.service
2015-08-19T17:11:15.404 INFO:teuthology.orchestra.run.vpm039.stderr:Failed to issue method call: Unit ceph.service not loaded.

This is what fails the task.

I didn't think there was a ceph.service anymore; it's ceph.target, right? And then there are individual ceph-mon@ and ceph-osd@ services?

So I think Teuthology needs to be a bit smarter about shutting down on systemd. Maybe something like:

'sudo systemctl stop ceph.target || sudo stop ceph-all || sudo service ceph stop'

Not sure if we can 'stop' ceph.target. We might have to stop things like ceph-mon@.service, ceph-osd@.service, and ceph-mds@.service instead. I'm hoping ceph.target is a top-level unit that will take care of everything, but I don't know enough systemd to know whether that's how it works.
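
As a sketch, the per-daemon fallback could look like the following (assuming the installed systemd accepts shell-style globs for unit names, which CentOS 7's does for loaded units):

sudo systemctl stop ceph.target || sudo stop ceph-all || sudo service ceph stop
# if stopping the target does not propagate to the instances, stop them directly:
sudo systemctl stop 'ceph-mon@*' 'ceph-osd@*' 'ceph-mds@*'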

#9 Updated by Zack Cerza over 8 years ago

Why is it 'ceph.target'?

I'm really hoping that we can implement a top-level systemd service that will allow users to start and stop ceph in a similar way to how they have in the past.
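
For what it's worth, a top-level target can provide exactly that experience. A sketch of the intended usage, assuming the per-daemon units are enabled with WantedBy=ceph.target and declare PartOf=ceph.target (so that stop and restart propagate):

sudo systemctl start ceph.target   # pulls in the enabled ceph-mon@/ceph-osd@/ceph-mds@ instances
sudo systemctl stop ceph.target    # PartOf= propagates the stop back to each instance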

#11 Updated by Alfredo Deza over 7 years ago

  • Status changed from 12 to Closed
