Bug #5495

closed

ceph-mon and minus character in hostname

Added by Robert Sander almost 11 years ago. Updated almost 11 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

It looks like ceph-mon does not cope with a - in the hostname:

  1. /usr/bin/ceph-mon --cluster=office -i test-uplink-mesh
    [2584]: (33) Numerical argument out of domain
  2. /usr/bin/ceph-mon --cluster=office -i testuplinkmesh

The second invocation does not print the error, but it does not start the mon either. That may have a different cause, though.
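
The number in parentheses in the output above looks like an errno value. As a minimal sketch, assuming a Linux host, the following Python snippet maps such a number to its symbolic name and system message. The helper name `describe_errno` is made up for illustration, and the snippet says nothing about where in ceph-mon the error is raised.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno value (hypothetical helper)."""
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the C library's message for the value.
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 33 is the value shown in the report above.
    print(describe_errno(33))
```

On Linux, 33 corresponds to EDOM, which is consistent with the "out of domain" wording in the report.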


Files

error-numerical-value-out-of-domain.txt (64.3 KB) Joao Eduardo Luis, 07/11/2013 11:40 AM
Actions #1

Updated by Sage Weil almost 11 years ago

  • Assignee set to Sage Weil
  • Priority changed from Normal to Urgent
Actions #2

Updated by Sage Weil almost 11 years ago

  • Status changed from New to Need More Info

what version is this?

can you strace -f ceph-mon and attach that output? that'll give a better hint as to where things are going wrong..

Actions #3

Updated by Robert Sander almost 11 years ago

Sage Weil wrote:

what version is this?

These are the official 0.61.4 packages from ceph.com

can you strace -f ceph-mon and attach that output? that'll give a better hint as to where things are going wrong..

I am sorry but I already purged the last installation.

Actions #4

Updated by Sage Weil almost 11 years ago

  • Status changed from Need More Info to Can't reproduce
Actions #5

Updated by Joao Eduardo Luis almost 11 years ago

A user was able to reproduce this reliably enough to get an strace out of it. Attached.

Actions #6

Updated by Joao Eduardo Luis almost 11 years ago

Forgot to mention that this user was attempting to upgrade from bobtail to cuttlefish.

Actions #7

Updated by Sage Weil almost 11 years ago

that second strace shows it hitting an unrelated assert on db->create_and_open()... joao?

Actions #8

Updated by Sage Weil almost 11 years ago

  • Status changed from 4 to Need More Info
Actions #9

Updated by Joao Eduardo Luis almost 11 years ago

Sage Weil wrote:

that second strace shows it hitting an unrelated assert on db->create_and_open()... joao?

We've seen that happening on these guys' cluster too while trying to upgrade. I've tried to reproduce it to no avail.

Looks like LevelDB's Open() was unable to lock store.db/LOCK, stating it was already locked. There were no other monitors running on that machine though, and lsof didn't report anything holding a lock on that file. Is it worth opening a bug for this scenario, moving the strace there, and automatically marking it as Can't Reproduce?
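
One way to double-check the lock state independently of lsof is to probe the file directly. The sketch below assumes a POSIX host and assumes that the lock in question is an fcntl-style advisory lock, which is what LevelDB's POSIX environment uses as far as I know. The function name `probe_lock` and the example path are hypothetical. Run it from a separate process, never from inside the monitor: fcntl locks are per process, so a probe in the owning process would succeed, and closing the descriptor there would drop the real lock.

```python
import fcntl
import os
import sys


def probe_lock(path: str) -> bool:
    """Return True if an exclusive fcntl lock on `path` can be taken right now.

    Hypothetical helper: the lock is released again immediately, so this only
    reports the state at the moment of the call.
    """
    # Open read-write without O_CREAT so a missing file raises instead of
    # being created by the probe.
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            # Non-blocking attempt; raises OSError if another process holds it.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (path is an assumption): python probe.py /path/to/store.db/LOCK
    free = probe_lock(sys.argv[1])
    print("lock is free" if free else "lock is held by another process")
```

If the probe reports the lock as free while Open() still fails, the conflict is more likely inside the opening process itself (for example, the same store opened twice) than with another process.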

Actions #10

Updated by Sage Weil almost 11 years ago

  • Status changed from Need More Info to Can't reproduce

Joao Luis wrote:

Sage Weil wrote:

that second strace shows it hitting an unrelated assert on db->create_and_open()... joao?

We've seen that happening on these guys' cluster too while trying to upgrade. I've tried to reproduce it to no avail.

Looks like LevelDB's Open() was unable to lock store.db/LOCK, stating it was already locked. There were no other monitors running on that machine though, and lsof didn't report anything holding a lock on that file. Is it worth opening a bug for this scenario, moving the strace there, and automatically marking it as Can't Reproduce?

yeah. closing this one as can't reproduce.. i'm able to do dashes just fine. Robert, if this problem persists, let us know!
