Bug #13000
ceph-disk failure to activate with dmcrypt (Status: Closed)
Description
/a/sage-2015-09-08_17:24:19-ceph-deploy-master---basic-vps/1046931
Timed out waiting for HEALTH_OK.
The OSD startup failed with -13 (Permission denied) ... probably because /dev/dm-1 or whatever wasn't owned by ceph:ceph? But the ceph-disk udev rule for the journal should have done this. Probably a race condition, as this has worked before.
Also, I notice that ceph-disk isn't logging to syslog properly? misc.log has some but not all of the output... I'm guessing it's stderr only. :(
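The ownership theory above can be spot-checked on the node. A minimal diagnostic sketch (the `device_owner` helper is hypothetical, not part of ceph-disk; it just resolves the uid/gid of a device node such as /dev/dm-1):

```python
import grp
import os
import pwd

def device_owner(path):
    """Return the (user, group) names owning a file or device node.

    Hypothetical helper: if activation failed with -13, this run against
    the journal device (e.g. /dev/dm-1) should show whether it was
    chowned to ceph:ceph by the udev rule at the time of the failure.
    """
    st = os.stat(path)
    return (pwd.getpwuid(st.st_uid).pw_name,
            grp.getgrgid(st.st_gid).gr_name)
```

Running `device_owner("/dev/dm-1")` right after a failed activation would confirm or rule out the ownership race.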
Updated by Loïc Dachary over 8 years ago
- Status changed from New to In Progress
Updated by Loïc Dachary over 8 years ago
> Also I notice that ceph-disk isn't logging to syslog properly? misc.log has some but not all output.. i'm guessing it's stderr only. :(
I'm very confused by the logging behavior of ceph-disk in the context of systemd. For some reason the traceback, when there is one, is in /var/log. But regardless of the debug level I set, the log.{info,debug} output is nowhere to be found. When I run the same thing from bash it works as expected. Something, somewhere is eating the logs...
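One way to rule out the init system swallowing stdio is to attach an explicit handler instead of relying on whatever the root logger inherits. A minimal sketch (the `make_ceph_disk_logger` name and format are assumptions, not ceph-disk's actual setup; in production one might add a `logging.handlers.SysLogHandler` alongside the stream handler):

```python
import logging
import sys

def make_ceph_disk_logger(stream=sys.stderr, name="ceph-disk-debug"):
    """Build a logger with an explicit StreamHandler so debug output is
    not lost to whatever wraps stdio under systemd/upstart.

    Hypothetical sketch for debugging the behavior described above.
    """
    log = logging.getLogger(name)
    log.setLevel(logging.DEBUG)
    log.handlers = []  # drop any inherited handlers
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    log.addHandler(handler)
    log.propagate = False  # keep messages out of the root logger
    return log
```

If messages written through such a logger still vanish under systemd, the problem is downstream of Python's logging machinery.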
Updated by Loïc Dachary over 8 years ago
- ceph-disk prepare /dev/vdb: no activation
- partprobe /dev/vdb: activates as expected
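A quick way to tell the two cases apart is to look for mounted OSD directories after each step. A minimal sketch (the `activated_osds` helper is hypothetical; it assumes the conventional /var/lib/ceph/osd/ceph-N layout):

```python
import os

def activated_osds(base="/var/lib/ceph/osd"):
    """List OSD directories under the data root, e.g. ['ceph-0', 'ceph-1'].

    Hypothetical check: a prepared OSD only shows up here once udev
    activation has actually mounted it, so an empty list after
    'ceph-disk prepare' and a non-empty one after 'partprobe' matches
    the behavior described above.
    """
    if not os.path.isdir(base):
        return []
    return sorted(d for d in os.listdir(base) if d.startswith("ceph-"))
```

Comparing the list before and after partprobe would pin the missing step on the udev add event, not on prepare itself.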
Updated by Loïc Dachary over 8 years ago
- Status changed from In Progress to Fix Under Review
Updated by Loïc Dachary over 8 years ago
- Status changed from Fix Under Review to 7
teuthology-suite --priority 101 --machine-type vps --suite ceph-deploy --ceph wip-13000-ceph-disk --email loic@dachary.org
- fail http://pulpito.ceph.com/loic-2015-09-21_12:01:05-ceph-deploy-wip-13000-ceph-disk---basic-vps/ (ubuntu 14.04 dmcrypt failure)
- pass http://pulpito.ceph.com/loic-2015-09-21_16:55:19-ceph-deploy-wip-13000-ceph-disk---basic-vps/
- pass http://167.114.225.100:8081/ubuntu-2015-09-21_15:10:44-ceph-deploy-wip-13000-ceph-disk---basic-openstack/
- dead http://pulpito.ceph.com/loic-2015-09-21_17:17:19-ceph-deploy-wip-13000-ceph-disk---basic-vps/ (environmental job)
teuthology-suite -N 5 --priority 101 --machine-type vps --filter=dmcrypt --suite ceph-deploy --ceph wip-13000-ceph-disk --email loic@dachary.org
Updated by Loïc Dachary over 8 years ago
ceph-deploy still fails with the same symptoms
Updated by Samuel Just over 8 years ago
- Status changed from 7 to Fix Under Review
Updated by Loïc Dachary over 8 years ago
- Status changed from Fix Under Review to In Progress
Updated by Loïc Dachary over 8 years ago
teuthology-suite -N 5 --priority 101 --machine-type vps --filter=dmcrypt --suite ceph-deploy --ceph wip-13000-ceph-disk --email loic@dachary.org
All with the same symptom (one OSD does not activate), all on Ubuntu this time.
Updated by Loïc Dachary over 8 years ago
- Subject changed from ceph-disk failure to activate on centos7 w/ dmcrypt to ceph-disk failure to activate with dmcrypt
Also happens on Ubuntu (see http://pulpito.ceph.com/loic-2015-09-30_15:58:45-ceph-deploy-infernalis---basic-vps).
teuthology-suite -N 10 --priority 101 --machine-type vps --filter="ceph-deploy/basic/{ceph-deploy-overrides/ceph_deploy_dmcrypt.yaml config_options/cephdeploy_conf.yaml distros/ubuntu_14.04.yaml tasks/ceph-deploy_hello_world.yaml}" --suite ceph-deploy --suite-branch wip-13000-centos-activate --ceph infernalis --email loic@dachary.org
Updated by Loïc Dachary over 8 years ago
http://pulpito.ceph.com/loic-2015-09-30_17:27:30-ceph-deploy-infernalis---basic-vps/1078362/ runs long enough for introspection.
ubuntu@vpm043:/var/log/upstart$ ls -ltr *vdb*
-rw-r----- 1 root root  4897 Sep 30 15:43 ceph-disk-_dev_vdb2_17117.log
-rw-r----- 1 root root  4958 Sep 30 15:43 ceph-disk-_dev_vdb2_17146.log
-rw-r----- 1 root root 15804 Sep 30 15:43 ceph-disk-_dev_vdb1_17308.log
-rw-r----- 1 root root 11153 Sep 30 15:43 ceph-disk-_dev_vdb1_17362.log
-rw-r----- 1 root root  4958 Sep 30 15:43 ceph-disk-_dev_vdb2_17361.log
ubuntu@vpm043:/var/log/upstart$ sudo tail -n 25 ceph-disk-_dev_vdb1*.log
==> ceph-disk-_dev_vdb1_17308.log <==
libust[17499/17499]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
2015-09-30 15:43:56.533721 7f268ff7e980 -1 filestore(/var/lib/ceph/tmp/mnt.SF3Vn3) mkjournal error creating journal on /var/lib/ceph/tmp/mnt.SF3Vn3/journal: (13) Permission denied
2015-09-30 15:43:56.534146 7f268ff7e980 -1 OSD::mkfs: ObjectStore::mkfs failed with error -13
2015-09-30 15:43:56.534590 7f268ff7e980 -1 ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.SF3Vn3: (13) Permission denied
ERROR:ceph-disk:Failed to activate
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 3576, in <module>
    main(sys.argv[1:])
  File "/usr/sbin/ceph-disk", line 3532, in main
    main_catch(args.func, args)
  File "/usr/sbin/ceph-disk", line 3554, in main_catch
    func(args)
  File "/usr/sbin/ceph-disk", line 2424, in main_activate
    dmcrypt_key_dir=args.dmcrypt_key_dir,
  File "/usr/sbin/ceph-disk", line 2197, in mount_activate
    (osd_id, cluster) = activate(path, activate_key_template, init)
  File "/usr/sbin/ceph-disk", line 2360, in activate
    keyring=keyring,
  File "/usr/sbin/ceph-disk", line 1950, in mkfs
    '--setgroup', get_ceph_user(),
  File "/usr/sbin/ceph-disk", line 349, in command_check_call
    return subprocess.check_call(arguments)
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '--mkkey', '-i', '1', '--monmap', '/var/lib/ceph/tmp/mnt.SF3Vn3/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.SF3Vn3', '--osd-journal', '/var/lib/ceph/tmp/mnt.SF3Vn3/journal', '--osd-uuid', 'f5c41028-318e-4ad4-963b-679d17bffb64', '--keyring', '/var/lib/ceph/tmp/mnt.SF3Vn3/keyring', '--setuser', 'ceph', '--setgroup', 'ceph']' returned non-zero exit status 1

==> ceph-disk-_dev_vdb1_17362.log <==
libust[17568/17568]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
2015-09-30 15:43:57.210622 7f111d8e2980 -1 filestore(/var/lib/ceph/tmp/mnt.vEmugd) mkjournal error creating journal on /var/lib/ceph/tmp/mnt.vEmugd/journal: (13) Permission denied
2015-09-30 15:43:57.211055 7f111d8e2980 -1 OSD::mkfs: ObjectStore::mkfs failed with error -13
2015-09-30 15:43:57.211446 7f111d8e2980 -1 ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.vEmugd: (13) Permission denied
ERROR:ceph-disk:Failed to activate
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 3576, in <module>
    main(sys.argv[1:])
  File "/usr/sbin/ceph-disk", line 3532, in main
    main_catch(args.func, args)
  File "/usr/sbin/ceph-disk", line 3554, in main_catch
    func(args)
  File "/usr/sbin/ceph-disk", line 2424, in main_activate
    dmcrypt_key_dir=args.dmcrypt_key_dir,
  File "/usr/sbin/ceph-disk", line 2197, in mount_activate
    (osd_id, cluster) = activate(path, activate_key_template, init)
  File "/usr/sbin/ceph-disk", line 2360, in activate
    keyring=keyring,
  File "/usr/sbin/ceph-disk", line 1950, in mkfs
    '--setgroup', get_ceph_user(),
  File "/usr/sbin/ceph-disk", line 349, in command_check_call
    return subprocess.check_call(arguments)
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '--mkkey', '-i', '1', '--monmap', '/var/lib/ceph/tmp/mnt.vEmugd/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.vEmugd', '--osd-journal', '/var/lib/ceph/tmp/mnt.vEmugd/journal', '--osd-uuid', 'f5c41028-318e-4ad4-963b-679d17bffb64', '--keyring', '/var/lib/ceph/tmp/mnt.vEmugd/keyring', '--setuser', 'ceph', '--setgroup', 'ceph']' returned non-zero exit status 1

ubuntu@vpm043:/var/log/upstart$ ls -l /dev/vdb?
brw-rw---- 1 ceph ceph 253, 17 Sep 30 15:43 /dev/vdb1
brw-rw---- 1 ceph ceph 253, 18 Sep 30 15:43 /dev/vdb2
It looks like the udev event for vdb2 fired after the event for vdb1 and changed the device ownership while activation was still in progress. The vdb2 action then changed it back to ceph:ceph, but too late. It is probably a consequence of the broad udevadm trigger --action=add run by ceph-deploy.
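Given that race, one mitigation (a sketch of my own, not what the fix under review does) would be to retry on EACCES instead of failing outright, since the concurrent udev event only flips ownership for a brief window:

```python
import os
import time

def open_with_retry(path, attempts=5, delay=0.1):
    """Open (creating if needed) a journal path, retrying on EACCES.

    Hypothetical mitigation sketch for the race described above: a
    concurrent udev event (here, the one for vdb2) can briefly change
    ownership of the journal while activation is ongoing, so a short
    retry loop would ride out the window instead of aborting mkfs.
    """
    for i in range(attempts):
        try:
            return os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        except PermissionError:
            if i == attempts - 1:
                raise
            time.sleep(delay)
```

Narrowing the udevadm trigger to the specific device instead of every block device would attack the same race from the other side.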
Updated by Loïc Dachary over 8 years ago
teuthology-suite -N 10 --priority 101 --machine-type vps --filter="ceph-deploy/basic/{ceph-deploy-overrides/ceph_deploy_dmcrypt.yaml config_options/cephdeploy_conf.yaml distros/ubuntu_14.04.yaml tasks/ceph-deploy_hello_world.yaml}" --suite ceph-deploy --suite-branch wip-13000-centos-activate --ceph infernalis --email loic@dachary.org
- pass http://pulpito.ceph.com/loic-2015-09-30_20:58:38-ceph-deploy-infernalis---basic-vps/
- pass http://pulpito.ceph.com/loic-2015-09-30_23:46:35-ceph-deploy-infernalis---basic-vps/
teuthology-suite --priority 101 --machine-type vps --suite ceph-deploy --suite-branch wip-13000-centos-activate --ceph infernalis --email loic@dachary.org
- pass http://pulpito.ceph.com/loic-2015-09-30_23:47:33-ceph-deploy-infernalis---basic-vps/
- one failure due to skewed clock
Updated by Loïc Dachary over 8 years ago
- Status changed from In Progress to Fix Under Review
Updated by Sage Weil over 8 years ago
- Status changed from Fix Under Review to Resolved