Bug #7598

ceph-disk-activate error with ceph-deploy

Added by Sheldon Mustard about 10 years ago. Updated over 9 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Support
Tags:
Backport:
Regression:
No
Severity:
4 - irritation
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph-disk-activate reports the following error back to ceph-deploy, yet it successfully creates the OSD. Please note this was preceded by a successful "disk zap".

[hosta][DEBUG ] detect platform information from remote host
[hosta][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Red Hat Enterprise Linux Server 6.4 Santiago
[ceph_deploy.osd][DEBUG ] activating host hosta disk /dev/sdb
[ceph_deploy.osd][DEBUG ] will use init type: sysvinit
[hosta][INFO ] Running command: ceph-disk-activate --mark-init sysvinit --mount /dev/sdb
[hosta][WARNIN] ceph-disk: Cannot discover filesystem type: device /dev/sdb: Line is truncated:
[hosta][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk-activate --mark-init sysvinit --mount /dev/sdb

History

#1 Updated by Ian Colle about 10 years ago

  • Project changed from Ceph to devops
  • Assignee set to Alfredo Deza

#2 Updated by Alfredo Deza about 10 years ago

  • Status changed from New to Need More Info

I can't replicate this behavior at all, but I do know why this particular error comes up.

What happens is that ceph-disk tries to mount the device, but before doing so it attempts to detect the device's filesystem type:

    try:
        fstype = detect_fstype(dev=dev)
    except (subprocess.CalledProcessError,
            TruncatedLineError,
            TooManyLinesError) as e:
        raise FilesystemTypeError(
            'device {dev}'.format(dev=dev),
            e,
            )

That call to detect_fstype fails because the result is not exactly one line (there is a check for this), which is why the output shows a combined error message.

The reason the error looks like this (with nothing readable after the final colon) is that the result is empty:

[hosta][WARNIN] ceph-disk: Cannot discover filesystem type: device /dev/sdb: Line is truncated: 

The handler for that error appends the command output to the exception, but since there is no output, the exception message looks like part of the error is missing.
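
For illustration, here is a minimal sketch of the kind of single-line check involved (the check is named must_be_one_line() later in this thread; the bodies below are an assumption, not the exact ceph-disk source). An empty string from blkid has no trailing newline, so it is treated as a truncated line whose payload is empty, and nothing follows the final colon when the message is printed:

    class TruncatedLineError(Exception):
        pass

    class TooManyLinesError(Exception):
        pass

    def must_be_one_line(line):
        # Empty output has no trailing newline, so it is reported as truncated;
        # the exception payload is the empty string itself.
        if line[-1:] != '\n':
            raise TruncatedLineError(line)
        line = line[:-1]
        if '\n' in line:
            raise TooManyLinesError(line)
        return line

    try:
        must_be_one_line('')  # blkid printed nothing to stdout
    except TruncatedLineError as e:
        print('Line is truncated: {0}'.format(e))  # nothing after the colon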

This is the command that detect_fstype is calling:

sudo blkid -p -s TYPE -o value {device}

The biggest problem here is that we use subprocess.Popen and later call communicate(), which returns a (stdout, stderr) tuple, but we only use stdout and completely discard stderr.

So in the case of blkid, useful error information is spit out to stderr, which we discard. See:

In [21]: result = Popen(['sudo', 'blkid', '-p', '-s', 'TYPE', '-o', 'value', '/dev/sdb1'], stdout=PIPE)

In [22]: out, _ = result.communicate()

In [23]: print out

In [24]: print _
None

Not only is '_' None, it is None because we are not even telling subprocess to capture stderr. Capturing it would look like this:

In [25]: result = Popen(['sudo', 'blkid', '-p', '-s', 'TYPE', '-o', 'value', '/dev/sdb1'], stdout=PIPE, stderr=PIPE)

In [26]: out, _ = result.communicate()

In [27]: print out

In [28]: print _
error: /dev/sdb1: No such file or directory

AHA! No such file or directory! THIS IS USEFUL OUTPUT.

The only thing I think I can do here is improve error reporting in ceph-disk, unless someone can reliably replicate the behavior from the bug report so I can make changes to ceph-disk and see what the actual error is.
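
As a rough sketch of what "improve error reporting" could mean here (an illustration with an assumed helper name, not the actual ceph-disk change), the idea is to capture stderr as well and include it in the raised error, so that messages like blkid's "No such file or directory" are no longer lost:

    import subprocess

    def detect_fstype_verbose(dev):
        # Hypothetical variant of the fstype detection: capture stderr too
        # and surface it when blkid does not produce a usable result.
        proc = subprocess.Popen(
            ['blkid', '-p', '-s', 'TYPE', '-o', 'value', '--', dev],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,  # do not discard stderr
            )
        out, err = proc.communicate()
        if proc.returncode != 0 or not out.strip():
            msg = 'blkid could not detect a filesystem on {0}: stdout={1!r} stderr={2!r} (exit {3})'.format(
                dev, out.strip(), err.strip(), proc.returncode)
            raise RuntimeError(msg)
        return out.strip()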

#3 Updated by Sindhura bandi almost 10 years ago

I faced this issue consistently in my setup.

ubuntu@ip-10-15-16-160:~$ sudo ceph-deploy osd activate ip-10-15-16-160:/dev/xvdc
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ubuntu/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.1): /usr/bin/ceph-deploy osd activate ip-10-15-16-160:/dev/xvdc
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks ip-10-15-16-160:/dev/xvdc:
[ip-10-15-16-160][DEBUG ] connected to host: ip-10-15-16-160
[ip-10-15-16-160][DEBUG ] detect platform information from remote host
[ip-10-15-16-160][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] activating host ip-10-15-16-160 disk /dev/xvdc
[ceph_deploy.osd][DEBUG ] will use init type: upstart
[ip-10-15-16-160][INFO ] Running command: ceph-disk-activate --mark-init upstart --mount /dev/xvdc
[ip-10-15-16-160][WARNIN] ceph-disk: Cannot discover filesystem type: device /dev/xvdc: Line is truncated:
[ip-10-15-16-160][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk-activate --mark-init upstart --mount /dev/xvdc

This is what I found: detect_fstype() executes the following command:
"sudo /sbin/blkid -p -s TYPE -o value -- /dev/xvdc"

This command returns nothing.

ubuntu@ip-10-15-16-160:~$ sudo /sbin/blkid -p -s TYPE -o value -- /dev/xvdc
ubuntu@ip-10-15-16-160:~$

The above command's output is passed through must_be_one_line(), which raises TruncatedLineError.

If I run osd activate on the partition instead of the whole disk, it works.

ubuntu@ip-10-15-16-160:~$ sudo ceph-deploy osd activate ip-10-15-16-160:/dev/xvdc1
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ubuntu/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.1): /usr/bin/ceph-deploy osd activate ip-10-15-16-160:/dev/xvdc1
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks ip-10-15-16-160:/dev/xvdc1:
[ip-10-15-16-160][DEBUG ] connected to host: ip-10-15-16-160
[ip-10-15-16-160][DEBUG ] detect platform information from remote host
[ip-10-15-16-160][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] activating host ip-10-15-16-160 disk /dev/xvdc1
[ceph_deploy.osd][DEBUG ] will use init type: upstart
[ip-10-15-16-160][INFO ] Running command: ceph-disk-activate --mark-init upstart --mount /dev/xvdc1
[ip-10-15-16-160][WARNIN] INFO:ceph-disk:ceph osd.0 already mounted in position; unmounting ours.
[ip-10-15-16-160][INFO ] checking OSD status...
[ip-10-15-16-160][INFO ] Running command: ceph --cluster=ceph osd stat --format=json
ubuntu@ip-10-15-16-160:~$

#4 Updated by Thanassis Parathyras almost 10 years ago

I came across the same issue.
In my case the operating system is CentOS 6.5, running Ceph 0.80.1.
What I figured out is that the blkid command doesn't output anything because there is nothing to report.
Let me explain: in a previous step, the ceph-deploy osd prepare command partitioned the disk device into data (sda1, in my case) and journal (sda2) partitions.
This made me use the command below: # ceph-deploy osd activate storage2:/dev/sda1
which goes further but still ends with an error, as shown below:

[ceph@ceph-admin ~]$ ceph-deploy osd activate storage2:/dev/sda1
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.3): /usr/bin/ceph-deploy osd activate storage2:/dev/sda1
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks storage2:/dev/sda1:
[storage2][DEBUG ] connected to host: storage2
[storage2][DEBUG ] detect platform information from remote host
[storage2][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: CentOS 6.5 Final
[ceph_deploy.osd][DEBUG ] activating host storage2 disk /dev/sda1
[ceph_deploy.osd][DEBUG ] will use init type: sysvinit
[storage2][INFO ] Running command: sudo ceph-disk-activate --mark-init sysvinit --mount /dev/sda1
[storage2][WARNIN] got monmap epoch 1
[storage2][WARNIN] 2014-06-04 06:16:53.500715 7f4fd61477a0 -1 filestore(/var/lib/ceph/tmp/mnt.EcfPMt) mkjournal error creating journal on /var/lib/ceph/tmp/mnt.EcfPMt/journal: (2) No such file or directory
[storage2][WARNIN] 2014-06-04 06:16:53.500778 7f4fd61477a0 -1 OSD::mkfs: ObjectStore::mkfs failed with error -2
[storage2][WARNIN] 2014-06-04 06:16:53.500930 7f4fd61477a0 -1 ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.EcfPMt: (2) No such file or directory
[storage2][WARNIN] ERROR:ceph-disk:Failed to activate
[storage2][WARNIN] Traceback (most recent call last):
[storage2][WARNIN] File "/usr/sbin/ceph-disk", line 2579, in <module>
[storage2][WARNIN] main()
[storage2][WARNIN] File "/usr/sbin/ceph-disk", line 2557, in main
[storage2][WARNIN] args.func(args)
[storage2][WARNIN] File "/usr/sbin/ceph-disk", line 1910, in main_activate
[storage2][WARNIN] init=args.mark_init,
[storage2][WARNIN] File "/usr/sbin/ceph-disk", line 1686, in mount_activate
[storage2][WARNIN] (osd_id, cluster) = activate(path, activate_key_template, init)
[storage2][WARNIN] File "/usr/sbin/ceph-disk", line 1849, in activate
[storage2][WARNIN] keyring=keyring,
[storage2][WARNIN] File "/usr/sbin/ceph-disk", line 1484, in mkfs
[storage2][WARNIN] '--keyring', os.path.join(path, 'keyring'),
[storage2][WARNIN] File "/usr/sbin/ceph-disk", line 303, in command_check_call
[storage2][WARNIN] return subprocess.check_call(arguments)
[storage2][WARNIN] File "/usr/lib64/python2.6/subprocess.py", line 505, in check_call
[storage2][WARNIN] raise CalledProcessError(retcode, cmd)
[storage2][WARNIN] subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '--mkkey', '-i', '3', '--monmap', '/var/lib/ceph/tmp/mnt.EcfPMt/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.EcfPMt', '--osd-journal', '/var/lib/ceph/tmp/mnt.EcfPMt/journal', '--osd-uuid', '49497e77-b6bf-430f-b22b-d56e07b2f097', '--keyring', '/var/lib/ceph/tmp/mnt.EcfPMt/keyring']' returned non-zero exit status 1
[storage2][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk-activate --mark-init sysvinit --mount /dev/sda1

Note that when I use plain directories for my PoC deployment, everything works fine.
I also ran the ceph-osd command directly on the storage node (replacing the tmp dirs with existing ones) and got the same error.
I'm stuck with this error; any other pointers?

Thanassis

#5 Updated by Alfredo Deza almost 10 years ago

  • Status changed from Need More Info to In Progress

#6 Updated by Thanassis Parathyras almost 10 years ago

My issue (as stated above in comment #4) is resolved.
I discovered that I was using a broken ceph.rpm coming from the EPEL repo instead of the native ceph repo.
You can see a related email thread at https://www.mail-archive.com/ceph-users@lists.ceph.com/msg10352.html

Thanks for your response.
Thanassis

#7 Updated by Alfredo Deza almost 10 years ago

  • Status changed from In Progress to 4

One of the differences here is that you cannot call 'activate' without doing a 'prepare' first.

This behavior is expected because there is a chance that a device's filesystem cannot be detected.

For example:

ssh node3 sudo fdisk -l

WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.

Disk /dev/mapper/vg0-root doesn't contain a valid partition table
Disk /dev/mapper/vg0-swap doesn't contain a valid partition table

Disk /dev/sda: 44.0 GB, 44040192000 bytes
255 heads, 63 sectors/track, 5354 cylinders, total 86016000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0007137a

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1757183      877568   83  Linux
/dev/sda2         1759230    86013951    42127361    5  Extended
/dev/sda5         1759232    86013951    42127360   8e  Linux LVM

Disk /dev/sdb: 8589 MB, 8589934592 bytes
256 heads, 63 sectors/track, 1040 cylinders, total 16777216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1    16777215     8388607+  ee  GPT

So in this case, if we call 'blkid' it will fail to detect what the filesystem is:

vagrant@node3:~$ sudo /sbin/blkid -p -s TYPE -o value -- /dev/sdb
vagrant@node3:~$ echo $?
2
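
As an aside, blkid's exit status already distinguishes "nothing found" from other failures: as I understand its man page, 0 means a filesystem/tag was reported, 2 means the device was scanned but nothing was found, and other non-zero codes indicate errors. A hypothetical check (illustration only, not ceph-disk code) could use that to report "no filesystem detected on this device" rather than a truncated-line error:

    import os
    import subprocess

    # Hypothetical check: on a freshly zapped whole device like /dev/sdb above,
    # blkid scans fine but finds no filesystem signature, so it exits with 2 --
    # a different situation from the "No such file or directory" case earlier.
    with open(os.devnull, 'w') as devnull:
        rc = subprocess.call(
            ['blkid', '-p', '-s', 'TYPE', '-o', 'value', '--', '/dev/sdb'],
            stdout=devnull,
            stderr=devnull,
            )
    print(rc)  # expected to print 2 here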

If I insist and try to 'activate' that device, the error comes up:

ceph-deploy osd activate node3:/dev/sdb
[ceph_deploy.conf][DEBUG ] found configuration file at: /Users/alfredo/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /Users/alfredo/.virtualenvs/ceph-deploy/bin/ceph-deploy osd activate node3:/dev/sdb
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks node3:/dev/sdb:
[node3][DEBUG ] connected to host: node3
[node3][DEBUG ] detect platform information from remote host
[node3][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] activating host node3 disk /dev/sdb
[ceph_deploy.osd][DEBUG ] will use init type: upstart
[node3][INFO  ] Running command: sudo ceph-disk-activate --mark-init upstart --mount /dev/sdb
[node3][WARNIN] ceph-disk: Cannot discover filesystem type: device /dev/sdb: Line is truncated:
[node3][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk-activate --mark-init upstart --mount /dev/sdb

If I call 'create' instead of 'activate' on the whole device, it works:

$ ceph-deploy osd --zap create node3:/dev/sdb
[ceph_deploy.conf][DEBUG ] found configuration file at: /Users/alfredo/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /Users/alfredo/.virtualenvs/ceph-deploy/bin/ceph-deploy osd --zap create node3:/dev/sdb
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks node3:/dev/sdb:
[node3][DEBUG ] connected to host: node3
[node3][DEBUG ] detect platform information from remote host
[node3][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] Deploying osd to node3
[node3][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[node3][INFO  ] Running command: sudo udevadm trigger --subsystem-match=block --action=add
[ceph_deploy.osd][DEBUG ] Preparing host node3 disk /dev/sdb journal None activate True
[node3][INFO  ] Running command: sudo ceph-disk-prepare --zap-disk --fs-type xfs --cluster ceph -- /dev/sdb
[node3][DEBUG ] ****************************************************************************
[node3][DEBUG ] Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
[node3][DEBUG ] verification and recovery are STRONGLY recommended.
[node3][DEBUG ] ****************************************************************************
[node3][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
[node3][DEBUG ] other utilities.
[node3][DEBUG ] The operation has completed successfully.
[node3][DEBUG ] The operation has completed successfully.
[node3][DEBUG ] The operation has completed successfully.
[node3][DEBUG ] meta-data=/dev/sdb1              isize=2048   agcount=4, agsize=196543 blks
[node3][DEBUG ]          =                       sectsz=512   attr=2, projid32bit=0
[node3][DEBUG ] data     =                       bsize=4096   blocks=786171, imaxpct=25
[node3][DEBUG ]          =                       sunit=0      swidth=0 blks
[node3][DEBUG ] naming   =version 2              bsize=4096   ascii-ci=0
[node3][DEBUG ] log      =internal log           bsize=4096   blocks=2560, version=2
[node3][DEBUG ]          =                       sectsz=512   sunit=0 blks, lazy-count=1
[node3][DEBUG ] realtime =none                   extsz=4096   blocks=0, rtextents=0
[node3][DEBUG ] The operation has completed successfully.
[node3][WARNIN] Caution: invalid backup GPT header, but valid main header; regenerating
[node3][WARNIN] backup header from main header.
[node3][WARNIN]
[node3][WARNIN] Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
[node3][WARNIN] on the recovery & transformation menu to examine the two tables.
[node3][WARNIN]
[node3][WARNIN] Warning! One or more CRCs don't match. You should repair the disk!
[node3][WARNIN]
[node3][WARNIN] INFO:ceph-disk:Will colocate journal with data on /dev/sdb
[node3][INFO  ] Running command: sudo udevadm trigger --subsystem-match=block --action=add
[node3][INFO  ] checking OSD status...
[node3][INFO  ] Running command: sudo ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host node3 is now ready for osd use.

The device is ready and mounted:

$ ceph-deploy osd list node3
[ceph_deploy.conf][DEBUG ] found configuration file at: /Users/alfredo/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /Users/alfredo/.virtualenvs/ceph-deploy/bin/ceph-deploy osd list node3
[node3][DEBUG ] connected to host: node3
[node3][DEBUG ] detect platform information from remote host
[node3][DEBUG ] detect machine type
[node3][INFO  ] Running command: sudo ceph --cluster=ceph osd tree --format=json
[node3][DEBUG ] connected to host: node3
[node3][DEBUG ] detect platform information from remote host
[node3][DEBUG ] detect machine type
[node3][INFO  ] Running command: sudo ceph-disk list
[node3][INFO  ] ----------------------------------------
[node3][INFO  ] ceph-0
[node3][INFO  ] ----------------------------------------
[node3][INFO  ] Path           /var/lib/ceph/osd/ceph-0
[node3][INFO  ] ID             0
[node3][INFO  ] Name           osd.0
[node3][INFO  ] Status         up
[node3][INFO  ] Reweight       1.000000
[node3][INFO  ] Magic          ceph osd volume v026
[node3][INFO  ] Journal_uuid   980f8734-e5c4-4124-bfbc-deb2ad0d5c8e
[node3][INFO  ] Active         ok
[node3][INFO  ] Device         /dev/sdb1
[node3][INFO  ] Whoami         0
[node3][INFO  ] Journal path   /dev/sdb2
[node3][INFO  ] ----------------------------------------

Can someone confirm that calling 'create' would work? (as opposed to activate on a whole device)

#8 Updated by Sindhura bandi almost 10 years ago

osd create works without any problem. It's only activate that fails.

#9 Updated by Alfredo Deza almost 10 years ago

You cannot activate a device without calling prepare first. So if you are encountering an issue when activating without
having called prepare, that is expected.

#10 Updated by Sage Weil over 9 years ago

  • Status changed from 4 to Can't reproduce
