Bug #13000

closed

ceph-disk failure to activate with dmcrypt

Added by Sage Weil over 8 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/sage-2015-09-08_17:24:19-ceph-deploy-master---basic-vps/1046931

Timed out waiting for HEALTH_OK.

The OSD startup failed with -13 (Permission denied) ... probably because /dev/dm-1 or whatever wasn't owned by ceph:ceph? But the ceph-disk rule for the journal should have set that ownership. Probably a race condition, as this has worked before.

Also I notice that ceph-disk isn't logging to syslog properly: misc.log has some but not all of the output. I'm guessing it's stderr only. :(
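The suspected cause is that the journal device's ownership wasn't ceph:ceph at the moment ceph-osd opened it. A minimal sketch of the check the udev rule is expected to guarantee (the helper function is hypothetical, not part of ceph-disk):

```python
import grp
import os
import pwd

def device_owned_by_ceph(path):
    """Return True if the file or block device at `path` is owned by
    ceph:ceph -- the ownership the ceph-disk udev rule should have set
    before ceph-osd tries to create the journal on it.

    Hypothetical illustration only; a name lookup failure (unknown
    uid/gid) is treated as "not owned by ceph".
    """
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        return False
    return (user, group) == ("ceph", "ceph")
```

If this returns False for the journal device at mkfs time, ceph-osd fails with exactly the EACCES (-13) seen in the logs below.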


Related issues 3 (0 open, 3 closed)

Related to Ceph - Bug #12787: ceph-disk does not activate with systemd on some CentOS 7 (Resolved, Loïc Dachary, 08/26/2015)

Related to Ceph - Bug #13153: ceph-disk: blkid 2.23.2 fails on journal device on CentOS 7 (Resolved, Loïc Dachary, 09/18/2015)

Related to Ceph - Bug #13061: systemd: daemons restart when package is upgraded (Resolved, Boris Ranto, 09/11/2015)
Actions #1

Updated by Loïc Dachary over 8 years ago

  • Assignee set to Loïc Dachary
Actions #3

Updated by Loïc Dachary over 8 years ago

Also I notice that ceph-disk isn't logging to syslog properly: misc.log has some but not all of the output. I'm guessing it's stderr only. :(

I'm very confused by the logging behavior of ceph-disk in the context of systemd. For some reason the traceback, when there is one, ends up in /var/log. But regardless of the debug level I set, the log.{info,debug} output is nowhere to be found. When I run the same command from bash it works as expected. Something, somewhere is eating the logs...
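One pattern that sidesteps this kind of confusion is attaching both a stderr handler and a syslog handler explicitly, so output survives no matter which stream the service manager captures. A sketch using only the standard library (the logger name is an example; under systemd, plain stderr ends up in the journal rather than syslog, which could explain the missing lines):

```python
import logging
import logging.handlers
import sys

# Example logger; ceph-disk's real logger setup may differ.
log = logging.getLogger("ceph-disk-example")
log.setLevel(logging.DEBUG)

# stderr always works, but under systemd it lands in the journal,
# not in syslog-fed files like misc.log.
log.addHandler(logging.StreamHandler(sys.stderr))

try:
    # /dev/log is the usual syslog socket on Linux; absent in some
    # containers, hence the fallback.
    log.addHandler(logging.handlers.SysLogHandler(address="/dev/log"))
except OSError:
    pass  # no syslog socket available; keep stderr only

log.debug("journal device activation starting")
```

With both handlers attached, a vanished log line narrows the suspects to the transport (journal vs. syslog) rather than the logging calls themselves.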

Actions #4

Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)
Actions #5

Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)
Actions #6

Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)
Actions #7

Updated by Loïc Dachary over 8 years ago

  • ceph-disk prepare /dev/vdb

No activation

  • partprobe /dev/vdb

Activates as expected
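The two commands above show that prepare alone does not activate, while partprobe does: re-reading the partition table makes the kernel emit udev "add" events for each partition, and those events are what drive activation. A minimal sketch of that workaround (the wrapper function is hypothetical; partprobe needs root and a real block device):

```python
import subprocess

def reread_partition_table(device):
    """Hypothetical wrapper: ask the kernel to re-read `device`'s
    partition table via partprobe. This emits udev 'add' events for
    each partition, which triggers ceph-disk activation.
    """
    # e.g. reread_partition_table("/dev/vdb"), run as root
    subprocess.check_call(["partprobe", device])
```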

Actions #8

Updated by Loïc Dachary over 8 years ago

  • Status changed from In Progress to Fix Under Review
Actions #9

Updated by Loïc Dachary over 8 years ago

  • Status changed from Fix Under Review to 7
teuthology-suite --priority 101 --machine-type vps --suite ceph-deploy --ceph wip-13000-ceph-disk --email loic@dachary.org
teuthology-suite -N 5 --priority 101 --machine-type vps --filter=dmcrypt --suite ceph-deploy --ceph wip-13000-ceph-disk --email loic@dachary.org
Actions #10

Updated by Loïc Dachary over 8 years ago

ceph-deploy still fails with the same symptoms

Actions #11

Updated by Samuel Just over 8 years ago

  • Status changed from 7 to Fix Under Review
Actions #12

Updated by Loïc Dachary over 8 years ago

  • Status changed from Fix Under Review to In Progress
Actions #13

Updated by Loïc Dachary over 8 years ago

teuthology-suite -N 5 --priority 101 --machine-type vps --filter=dmcrypt --suite ceph-deploy --ceph wip-13000-ceph-disk --email loic@dachary.org

All with the same symptom (one OSD does not activate), all on Ubuntu this time.

Actions #14

Updated by Loïc Dachary over 8 years ago

  • Subject changed from ceph-disk failure to activate on centos7 w/ dmcrypt to ceph-disk failure to activate with dmcrypt

Also happens on Ubuntu (see http://pulpito.ceph.com/loic-2015-09-30_15:58:45-ceph-deploy-infernalis---basic-vps).

teuthology-suite -N 10 --priority 101 --machine-type vps --filter="ceph-deploy/basic/{ceph-deploy-overrides/ceph_deploy_dmcrypt.yaml config_options/cephdeploy_conf.yaml distros/ubuntu_14.04.yaml tasks/ceph-deploy_hello_world.yaml}" --suite ceph-deploy --suite-branch wip-13000-centos-activate --ceph infernalis --email loic@dachary.org
Actions #15

Updated by Loïc Dachary over 8 years ago

http://pulpito.ceph.com/loic-2015-09-30_17:27:30-ceph-deploy-infernalis---basic-vps/1078362/ runs long enough for introspection.

ubuntu@vpm043:/var/log/upstart$ ls -ltr *vdb*
-rw-r----- 1 root root  4897 Sep 30 15:43 ceph-disk-_dev_vdb2_17117.log
-rw-r----- 1 root root  4958 Sep 30 15:43 ceph-disk-_dev_vdb2_17146.log
-rw-r----- 1 root root 15804 Sep 30 15:43 ceph-disk-_dev_vdb1_17308.log
-rw-r----- 1 root root 11153 Sep 30 15:43 ceph-disk-_dev_vdb1_17362.log
-rw-r----- 1 root root  4958 Sep 30 15:43 ceph-disk-_dev_vdb2_17361.log
ubuntu@vpm043:/var/log/upstart$ sudo tail -n 25 ceph-disk-_dev_vdb1*.log
==> ceph-disk-_dev_vdb1_17308.log <==
libust[17499/17499]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
2015-09-30 15:43:56.533721 7f268ff7e980 -1 filestore(/var/lib/ceph/tmp/mnt.SF3Vn3) mkjournal error creating journal on /var/lib/ceph/tmp/mnt.SF3Vn3/journal: (13) Permission denied
2015-09-30 15:43:56.534146 7f268ff7e980 -1 OSD::mkfs: ObjectStore::mkfs failed with error -13
2015-09-30 15:43:56.534590 7f268ff7e980 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.SF3Vn3: (13) Permission denied
ERROR:ceph-disk:Failed to activate
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 3576, in <module>
    main(sys.argv[1:])
  File "/usr/sbin/ceph-disk", line 3532, in main
    main_catch(args.func, args)
  File "/usr/sbin/ceph-disk", line 3554, in main_catch
    func(args)
  File "/usr/sbin/ceph-disk", line 2424, in main_activate
    dmcrypt_key_dir=args.dmcrypt_key_dir,
  File "/usr/sbin/ceph-disk", line 2197, in mount_activate
    (osd_id, cluster) = activate(path, activate_key_template, init)
  File "/usr/sbin/ceph-disk", line 2360, in activate
    keyring=keyring,
  File "/usr/sbin/ceph-disk", line 1950, in mkfs
    '--setgroup', get_ceph_user(),
  File "/usr/sbin/ceph-disk", line 349, in command_check_call
    return subprocess.check_call(arguments)
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '--mkkey', '-i', '1', '--monmap', '/var/lib/ceph/tmp/mnt.SF3Vn3/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.SF3Vn3', '--osd-journal', '/var/lib/ceph/tmp/mnt.SF3Vn3/journal', '--osd-uuid', 'f5c41028-318e-4ad4-963b-679d17bffb64', '--keyring', '/var/lib/ceph/tmp/mnt.SF3Vn3/keyring', '--setuser', 'ceph', '--setgroup', 'ceph']' returned non-zero exit status 1

==> ceph-disk-_dev_vdb1_17362.log <==
libust[17568/17568]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
2015-09-30 15:43:57.210622 7f111d8e2980 -1 filestore(/var/lib/ceph/tmp/mnt.vEmugd) mkjournal error creating journal on /var/lib/ceph/tmp/mnt.vEmugd/journal: (13) Permission denied
2015-09-30 15:43:57.211055 7f111d8e2980 -1 OSD::mkfs: ObjectStore::mkfs failed with error -13
2015-09-30 15:43:57.211446 7f111d8e2980 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.vEmugd: (13) Permission denied
ERROR:ceph-disk:Failed to activate
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 3576, in <module>
    main(sys.argv[1:])
  File "/usr/sbin/ceph-disk", line 3532, in main
    main_catch(args.func, args)
  File "/usr/sbin/ceph-disk", line 3554, in main_catch
    func(args)
  File "/usr/sbin/ceph-disk", line 2424, in main_activate
    dmcrypt_key_dir=args.dmcrypt_key_dir,
  File "/usr/sbin/ceph-disk", line 2197, in mount_activate
    (osd_id, cluster) = activate(path, activate_key_template, init)
  File "/usr/sbin/ceph-disk", line 2360, in activate
    keyring=keyring,
  File "/usr/sbin/ceph-disk", line 1950, in mkfs
    '--setgroup', get_ceph_user(),
  File "/usr/sbin/ceph-disk", line 349, in command_check_call
    return subprocess.check_call(arguments)
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '--mkkey', '-i', '1', '--monmap', '/var/lib/ceph/tmp/mnt.vEmugd/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.vEmugd', '--osd-journal', '/var/lib/ceph/tmp/mnt.vEmugd/journal', '--osd-uuid', 'f5c41028-318e-4ad4-963b-679d17bffb64', '--keyring', '/var/lib/ceph/tmp/mnt.vEmugd/keyring', '--setuser', 'ceph', '--setgroup', 'ceph']' returned non-zero exit status 1
ubuntu@vpm043:/var/log/upstart$ ls -l /dev/vdb?
brw-rw---- 1 ceph ceph 253, 17 Sep 30 15:43 /dev/vdb1
brw-rw---- 1 ceph ceph 253, 18 Sep 30 15:43 /dev/vdb2

It looks like the udev event for vdb2 fired after the event for vdb1 and changed the device permissions while activation was still ongoing. The vdb2 action then changed them back to ceph:ceph, but too late. It is probably a consequence of the broad udevadm trigger --action=add run by ceph-deploy.
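Given that the ownership only flips transiently, one possible mitigation would be to re-check (and briefly wait for) the expected owner just before ceph-osd --mkfs opens the journal. A sketch of such a poll loop (hypothetical helper, not what ceph-disk does; uid/gid values are parameters):

```python
import os
import time

def wait_for_owner(path, uid, gid, timeout=10.0, interval=0.5):
    """Poll until `path` is owned by uid:gid or `timeout` expires.

    Hypothetical mitigation for the race described above: a concurrent
    udev event for another partition can briefly change the journal
    device's ownership while activation is in flight, so rechecking
    before mkfs would avoid the EACCES (-13).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        st = os.stat(path)
        if (st.st_uid, st.st_gid) == (uid, gid):
            return True
        time.sleep(interval)
    return False
```

This only narrows the race window rather than closing it; the real fix lies in making the udev actions for sibling partitions not undo each other's ownership changes.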

Actions #16

Updated by Loïc Dachary over 8 years ago

teuthology-suite -N 10 --priority 101 --machine-type vps --filter="ceph-deploy/basic/{ceph-deploy-overrides/ceph_deploy_dmcrypt.yaml config_options/cephdeploy_conf.yaml distros/ubuntu_14.04.yaml tasks/ceph-deploy_hello_world.yaml}" --suite ceph-deploy --suite-branch wip-13000-centos-activate --ceph infernalis --email loic@dachary.org
teuthology-suite --priority 101 --machine-type vps --suite ceph-deploy --suite-branch wip-13000-centos-activate --ceph infernalis --email loic@dachary.org
Actions #17

Updated by Loïc Dachary over 8 years ago

  • Status changed from In Progress to Fix Under Review
Actions #18

Updated by Sage Weil over 8 years ago

  • Status changed from Fix Under Review to Resolved