Bug #4936

ceph-deploy fails to report errors

Added by hakan ardo almost 11 years ago. Updated almost 11 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,
I like the ceph-deploy script, but it can be very confusing for new users when things go wrong. I spent my first day with ceph getting quite a bad first impression. The issues I ran into (as far as I can remember) were:

If I do "ceph-deploy osd create ceph-server:/testdisk" with ceph-server being a Debian wheezy server, the osd will fail to start since /usr/lib/ceph/ceph_common.sh does not support symbolic links in /var/lib/ceph/osd/. The problem can be solved by adding -L to the find command in get_local_daemon_list in ceph_common.sh. However, the bigger issue is that ceph-deploy does not propagate the error message from ceph-disk-activate (and I'm sure there are other things that can go wrong) but exits normally, giving you the impression that the command was successful.
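The symlink problem can be reproduced with plain find(1). This is a minimal sketch using temporary directories as stand-ins for /var/lib/ceph/osd/, not the actual get_local_daemon_list code:

```shell
# Stand-in for /var/lib/ceph/osd: contains a symlink to the real osd dir.
osd_root=$(mktemp -d)
real=$(mktemp -d)
mkdir "$real/ceph-0"
ln -s "$real/ceph-0" "$osd_root/ceph-0"

# Without -L the symlink fails the -type d test, so the daemon
# list comes back empty and the osd is never started.
find "$osd_root" -mindepth 1 -maxdepth 1 -type d

# With -L (the suggested fix) the symlink is followed and ceph-0 is found.
find -L "$osd_root" -mindepth 1 -maxdepth 1 -type d
```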

If I do "ceph-deploy osd create ceph-server:/dev/sdk" on the same server, ceph-disk-activate will hang on the server (presumably waiting for /dev/disk/by-partuuid/... which will never show up). After a while ceph-deploy times out and exits, again without any error message. It also does not kill the remote ceph-disk-activate. This means that any subsequent "ceph-deploy osd ..." will hang waiting for a lock acquired by the first one, then time out in the same fashion, leaving more and more processes hanging on the server side.

Also, if I get tired of waiting and press ctrl-C, this only kills the ceph-deploy script and not the command running on the server, leading to similar issues.

I think it would be helpful when debugging/exploring/learning these kinds of issues if the -v switch to ceph-deploy printed the remote commands executed and their full output.

Following the Quick Start tutorial at http://ceph.com/docs/master/start/, when you get to the CEPHFS section, the mount command suggested there fails with an illegal argument error. You need to add "-o name=admin,secret=<key>" with the <key> taken from my-cluster/ceph.client.admin.keyring. It would be nice if that page mentioned that.
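For reference, a working mount line looks roughly like this (the server name and mount point are placeholders, and <key> is the secret from my-cluster/ceph.client.admin.keyring):

```shell
sudo mkdir -p /mnt/mycephfs
sudo mount -t ceph ceph-server:6789:/ /mnt/mycephfs -o name=admin,secret=<key>
```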
