Bug #5495
closed
ceph-mon and minus character in hostname
Added by Robert Sander almost 11 years ago.
Updated almost 11 years ago.
Description
It looks like ceph-mon does not cope with a - in the hostname:
- /usr/bin/ceph-mon --cluster=office -i test-uplink-mesh
[2584]: (33) Numerical argument out of domain
- /usr/bin/ceph-mon --cluster=office -i testuplinkmesh
The second invocation does not print the error, but it does not start the mon either. That may have a different cause, though.
Files
- Assignee set to Sage Weil
- Priority changed from Normal to Urgent
- Status changed from New to Need More Info
what version is this?
can you strace -f ceph-mon and attach that output? that'll give a better hint as to where things are going wrong..
Sage Weil wrote:
what version is this?
These are the official 0.61.4 packages from ceph.com
can you strace -f ceph-mon and attach that output? that'll give a better hint as to where things are going wrong..
I am sorry but I already purged the last installation.
- Status changed from Need More Info to Can't reproduce
A user was able to reproduce this reliably enough to get an strace out of it. Attached.
Forgot to mention that this user was attempting to upgrade from bobtail to cuttlefish.
that second strace shows it hitting an unrelated assert on db->create_and_open()... joao?
- Status changed from 4 to Need More Info
Sage Weil wrote:
that second strace shows it hitting an unrelated assert on db->create_and_open()... joao?
We've seen that happening on these guys' cluster too while trying to upgrade. I've tried to reproduce it to no avail.
Looks like LevelDB's Open() was unable to lock store.db/LOCK, stating it was already locked. There were no other monitors running on that machine though, and lsof didn't report anything holding a lock on that file. Is it worth opening a bug for this scenario, moving the strace there, and automatically marking it as Can't Reproduce?
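For anyone who hits the same symptom, the lock check described above can be approximated with a small probe. This is only a sketch: it assumes LevelDB's POSIX environment takes an fcntl-style advisory lock on the LOCK file, and the helper name `lock_is_free` is made up for illustration. It only detects locks held by other processes, and it should be run while the monitor is stopped.

```python
import fcntl
import os


def lock_is_free(lock_path):
    """Return True if an exclusive fcntl lock on lock_path can be taken right now.

    Hypothetical diagnostic helper, not part of Ceph or LevelDB.
    The file must already exist; it is never created or modified.
    """
    fd = os.open(lock_path, os.O_RDWR)
    try:
        try:
            # Non-blocking attempt: raises OSError if another process holds the lock.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        # We got the lock, so nobody else held it; release it again immediately.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

Calling `lock_is_free("store.db/LOCK")` from inside the monitor's data directory would return False if some other process still holds the lock, even when lsof shows nothing.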
- Status changed from Need More Info to Can't reproduce
Joao Luis wrote:
Sage Weil wrote:
that second strace shows it hitting an unrelated assert on db->create_and_open()... joao?
We've seen that happening on these guys' cluster too while trying to upgrade. I've tried to reproduce it to no avail.
Looks like LevelDB's Open() was unable to lock store.db/LOCK, stating it was already locked. There were no other monitors running on that machine though, and lsof didn't report anything holding a lock on that file. Is it worth opening a bug for this scenario, moving the strace there, and automatically marking it as Can't Reproduce?
yeah. closing this one as can't reproduce.. i'm able to do dashes just fine. Robert, if this problem persists, let us know!