Bug #4936

closed

ceph-deploy fails to report errors

Added by hakan ardo almost 11 years ago. Updated over 10 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Severity: 3 - minor

Description

Hi,
I like the ceph-deploy script, but it can be very confusing for new users when things go wrong. I spent my first day with Ceph and came away with a rather bad first impression. The issues I ran into (as far as I can remember) were:

If I run "ceph-deploy osd create ceph-server:/testdisk", with ceph-server being a Debian wheezy server, the OSD fails to start because /usr/lib/ceph/ceph_common.sh does not support symbolic links in /var/lib/ceph/osd/. The problem can be solved by adding -L to the find command in get_local_daemon_list in ceph_common.sh. The bigger issue, however, is that ceph-deploy does not propagate the error message from ceph-disk-activate (and I'm sure other things can go wrong too) but exits normally, giving the impression that the command succeeded.
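
The symlink part can be reproduced without Ceph. Assuming get_local_daemon_list filters its find results with -type d, find without -L tests the link itself rather than its target, so a daemon directory that is a symbolic link (presumably /var/lib/ceph/osd/ceph-N pointing at /testdisk here) is silently skipped. The Python sketch below mirrors both behaviours with os.scandir; list_daemon_dirs is a hypothetical helper, not code taken from ceph_common.sh.

```python
import os
import tempfile
from typing import List


def list_daemon_dirs(base: str, follow_symlinks: bool) -> List[str]:
    """Return the sorted names of daemon directories directly under `base`.

    follow_symlinks=False mimics `find base -mindepth 1 -maxdepth 1 -type d`:
    a symlink that points at a directory is NOT reported.
    follow_symlinks=True mimics the same command run as `find -L ...`.
    """
    with os.scandir(base) as entries:
        return sorted(
            entry.name
            for entry in entries
            if entry.is_dir(follow_symlinks=follow_symlinks)
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "testdisk")      # stand-in for /testdisk
        osd_root = os.path.join(tmp, "osd")       # stand-in for /var/lib/ceph/osd
        os.makedirs(data)
        os.makedirs(os.path.join(osd_root, "ceph-0"))       # a plain directory
        os.symlink(data, os.path.join(osd_root, "ceph-1"))  # a symlinked OSD dir
        print(list_daemon_dirs(osd_root, follow_symlinks=False))  # ['ceph-0']
        print(list_daemon_dirs(osd_root, follow_symlinks=True))   # ['ceph-0', 'ceph-1']
```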

If I run "ceph-deploy osd create ceph-server:/dev/sdk" on the same server, ceph-disk-activate hangs on the server (presumably waiting for /dev/disk/by-partuuid/..., which never shows up). After a while ceph-deploy times out and exits, again without any error message. It also does not kill the remote ceph-disk-activate. As a result, any subsequent "ceph-deploy osd ..." hangs waiting for a lock held by the first one and then times out in the same fashion, leaving more and more processes hanging on the server side.
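
The behaviour requested here (a failure or a timeout should be reported as an error, and a timed-out command should be killed rather than left holding a lock) can be sketched locally in Python. run_checked and CommandError are hypothetical names; the sketch starts the child in its own session so that its whole process group can be killed on timeout. It does not cover the remote case: killing a local ssh client does not necessarily stop the command on the server. Allocating a tty (ssh -t) so the remote side receives SIGHUP when the connection drops is one workaround worth verifying, not something ceph-deploy is known to do.

```python
import os
import signal
import subprocess
import sys
from typing import List


class CommandError(RuntimeError):
    """Raised when a command exits non-zero or times out."""


def run_checked(cmd: List[str], timeout: float) -> str:
    """Run cmd and return its stdout; raise CommandError on failure or timeout.

    The child gets its own session (and process group), so on timeout the
    whole group is killed instead of being left running in the background.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,
    )
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # communicate() does not kill the child on timeout; do it explicitly,
        # then finish reading the pipes so the process is reaped.
        os.killpg(proc.pid, signal.SIGKILL)
        out, err = proc.communicate()
        raise CommandError(
            f"{cmd[0]} timed out after {timeout}s; stderr so far: {err.strip()!r}"
        ) from None
    if proc.returncode != 0:
        # Surface the child's own error message instead of exiting quietly.
        raise CommandError(f"{cmd[0]} exited with status {proc.returncode}: {err.strip()}")
    return out


if __name__ == "__main__":
    try:
        run_checked([sys.executable, "-c", "import sys; sys.exit('activate failed')"], timeout=5)
    except CommandError as exc:
        print("error:", exc)
```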

Also, if I get tired of waiting and press Ctrl-C, only the local ceph-deploy script is killed, not the command running on the server, which leads to the same problem.
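
For Ctrl-C, one detail matters: KeyboardInterrupt derives from BaseException, so cleanup code that only catches Exception never runs. A minimal sketch, assuming the child was started with start_new_session=True (so its process group id equals its pid, and it does not receive the terminal's SIGINT itself): the hypothetical kill_on_exit context manager terminates the child's process group whenever the with-block is left by any exception, including KeyboardInterrupt, and escalates to SIGKILL if SIGTERM is ignored for five seconds. It only reaches processes on the local machine; a command running on the server over ssh needs its own termination path.

```python
import contextlib
import os
import signal
import subprocess
import sys
from typing import Iterator


@contextlib.contextmanager
def kill_on_exit(proc: subprocess.Popen) -> Iterator[subprocess.Popen]:
    """Terminate proc's process group if the with-block exits via an exception.

    Catches BaseException so that KeyboardInterrupt (Ctrl-C) triggers cleanup.
    Assumes proc was started with start_new_session=True.
    """
    try:
        yield proc
    except BaseException:
        if proc.poll() is None:
            with contextlib.suppress(ProcessLookupError):
                os.killpg(proc.pid, signal.SIGTERM)  # ask politely first
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                os.killpg(proc.pid, signal.SIGKILL)  # then force it
                proc.wait()
        raise  # re-raise so the caller still sees the interrupt


if __name__ == "__main__":
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        start_new_session=True,
    )
    try:
        with kill_on_exit(child):
            raise KeyboardInterrupt  # stands in for the user pressing Ctrl-C
    except KeyboardInterrupt:
        print("child exit status:", child.returncode)
```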

I think it would be helpful, when debugging, exploring, or learning about this kind of issue, if the -v switch to ceph-deploy printed the remote commands executed and their full output.
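
A verbose mode along those lines amounts to echoing each command line before running it and streaming its combined stdout and stderr as it arrives. The following sketch does that for a local command; run_verbose and its emit callback are hypothetical, and this is not how ceph-deploy's remote execution layer is implemented.

```python
import shlex
import subprocess
import sys
from typing import Callable, List


def run_verbose(cmd: List[str], emit: Callable[[str], None] = print) -> int:
    """Run cmd, passing the command line and every output line to emit.

    stderr is merged into stdout so error messages are never dropped.
    Returns the exit status so the caller can decide whether to fail.
    """
    emit(f"Running command: {shlex.join(cmd)}")
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:  # stream output as it is produced
            emit(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set.
    emit(f"Exit status: {proc.returncode}")
    return proc.returncode


if __name__ == "__main__":
    run_verbose([sys.executable, "-c", "print('hello from the child')"])
```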

Following the Quick Start tutorial at http://ceph.com/docs/master/start/, when you get to the CEPHFS section, the mount command suggested there fails with an illegal argument error. You need to add "-o name=admin,secret=<key>", with <key> taken from my-cluster/ceph.client.admin.keyring. It would be nice if that page mentioned this.
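
The workaround can be scripted. This is a sketch assuming the usual INI-style keyring layout (a [client.admin] section containing a key = ... line) and the default monitor port 6789; read_keyring_secret and cephfs_mount_command are hypothetical helpers that only build the argument list and do not run mount.

```python
import configparser
from typing import List


def read_keyring_secret(path: str, entity: str = "client.admin") -> str:
    """Return the `key` value for `entity` from an INI-style Ceph keyring.

    Expected layout (assumption):
        [client.admin]
            key = AQ...==
    """
    # interpolation=None so characters like '%' are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    with open(path, encoding="utf-8") as fh:
        parser.read_file(fh)
    return parser[entity]["key"]


def cephfs_mount_command(
    monitor: str, mountpoint: str, secret: str, name: str = "admin"
) -> List[str]:
    """Build the mount argument list including -o name=...,secret=..."""
    return [
        "mount", "-t", "ceph",
        f"{monitor}:6789:/",  # default monitor port
        mountpoint,
        "-o", f"name={name},secret={secret}",
    ]
```

Passing secret= on the command line exposes the key in the process list; the Ceph documentation also describes a secretfile= mount option that reads the key from a file instead.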

#1

Updated by Ian Colle almost 11 years ago

  • Target version deleted (v0.61 - Cuttlefish)

#2

Updated by Anonymous almost 11 years ago

  • Priority changed from Normal to High

#3

Updated by Dan Mick almost 11 years ago

  • Project changed from Ceph to devops

#4

Updated by Sage Weil almost 11 years ago

  • Assignee set to Anonymous

#5

Updated by Ian Colle almost 11 years ago

change default behavior of ceph-deploy to be -v
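
One way to read this note, sketched with argparse: make the verbose log level the default and offer an opt-out flag, while still accepting -v so existing invocations keep working. The --quiet flag and the deploy-tool program name are assumptions for illustration, not ceph-deploy's actual interface.

```python
import argparse
import logging
from typing import List, Optional


def parse_log_level(argv: Optional[List[str]] = None) -> int:
    """Return the logging level: verbose (DEBUG) unless --quiet is given."""
    parser = argparse.ArgumentParser(prog="deploy-tool")  # hypothetical name
    parser.add_argument(
        "-q", "--quiet",
        action="store_true",
        help="only show warnings and errors",
    )
    # Accepted for compatibility; verbose output is already the default.
    parser.add_argument("-v", "--verbose", action="store_true", help="(default)")
    args = parser.parse_args(argv)
    return logging.WARNING if args.quiet else logging.DEBUG
```

The returned level can be passed straight to logging.basicConfig(level=...).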

#6

Updated by Alfredo Deza over 10 years ago

  • Status changed from New to Resolved

I am closing this as the new logging features will log verbosely to the terminal by default.

Errors should be more apparent now although specific errors on remote actions are still being worked on.
