Bug #2913

closed

monclient: asserts when no monitor addresses found due to dns failure

Added by Josh Durgin over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Normal
Category: common
Target version: -
% Done: 0%
Source: Community (user)

Description

This should be an error returned up to the user, not an assert.
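
A minimal sketch of that suggestion, assuming a simplified monitor map. The names MonMapSketch and pick_new_mon below are hypothetical and are not Ceph's actual MonClient API. Instead of assert(monmap.size() > 0), the selection step returns -ENOENT when no monitor addresses were parsed, so a caller such as a connect routine could hand the error back to the user.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a monitor map: just a list of "host:port" strings.
struct MonMapSketch {
  std::vector<std::string> addrs;
};

// Hypothetical replacement for the assert in _pick_new_mon(): instead of
// aborting when the map is empty, report -ENOENT so the caller can
// propagate the error rather than crash the client process.
int pick_new_mon(const MonMapSketch& monmap, std::size_t attempt,
                 std::string* chosen) {
  if (monmap.addrs.empty()) {
    return -ENOENT;  // no monitors known: an error, not a crash
  }
  // Simple round-robin choice for illustration; the real policy may differ.
  *chosen = monmap.addrs[attempt % monmap.addrs.size()];
  return 0;
}
```

With an empty map the caller gets -ENOENT back and can report something like "no monitor addresses found" instead of dumping a backtrace.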

From https://www.redhat.com/archives/libvirt-users/2012-August/msg00028.html:

The issue was the lack of the auth element.

qemu has access to /etc/ceph/ceph.conf. Specifying the host elements in the source element caused a crash:

error: Failed to start domain test0
error: internal error process exited while connecting to monitor: char device redirected to /dev/pts/3
server name not found: thinkmate3:6789;thinkmate4:6789 (No such file or directory)
unable to parse addrs in 'thinkmate3:6789;thinkmate4:6789'
mon/MonClient.cc: In function 'void MonClient::_pick_new_mon()' thread 7fd1de5d67c0 time 2012-08-06 15:40:38.191920
mon/MonClient.cc: 424: FAILED assert(monmap.size() > 0)
 ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
 1: (MonClient::_pick_new_mon()+0x40a) [0x7fd1dd6f4e8a]
 2: (MonClient::_reopen_session()+0x187) [0x7fd1dd6f8c47]
 3: (MonClient::authenticate(double)+0x1aa) [0x7fd1dd6f9fea]
 4: (librados::RadosClient::connect()+0xb6c) [0x7fd1dd63d6ec]
 5: (()+0x94393) [0x7fd1de698393]
 6: (()+0x801e5) [0x7fd1de6841e5]
 7: (()+0x80366) [0x7fd1de684366]
 8: (()+0x802d9) [0x7fd1de6842d9]
 9: (()+0x80ee7) [0x7fd1de684ee7]
 10: (()+0xa3b03) [0x7fd1de6a7b03]
 11: (()+0xfec4b) [0x7fd1de702c4b]
 12: (()+0x11fbc2

#1

Updated by Jeff Strunk over 11 years ago

I'm not so sure this is a DNS issue. Here is how name service is set up on my ceph/kvm test cluster.

On each node, /etc/hosts gives the internal IP address for the short hostname and FQDN of each node. DNS will resolve to the public IP address only for the FQDN.

I tested this with the FQDNs of the monitors, but I got the same assert.

I tested this again with only one host line, and it was successful.

#2

Updated by Josh Durgin over 11 years ago

Hmm, looking closer, that's a second bug: it's not splitting 'thinkmate3:6789;thinkmate4:6789' into separate addresses. Which ceph version are you using?
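
For reference, splitting that string correctly should yield two separate entries. The sketch below is hypothetical (split_mon_addrs is not a Ceph function), and treating ';', ',' and whitespace as separators is an assumption made for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split a monitor address list such as
// "thinkmate3:6789;thinkmate4:6789" into individual "host:port" entries.
// The separator set (';', ',', space, tab) is an assumption for this sketch.
std::vector<std::string> split_mon_addrs(const std::string& list) {
  const std::string seps = ";, \t";
  std::vector<std::string> out;
  std::string::size_type start = 0;
  while (start < list.size()) {
    std::string::size_type end = list.find_first_of(seps, start);
    if (end == std::string::npos) end = list.size();
    if (end > start) {
      out.push_back(list.substr(start, end - start));  // skip empty tokens
    }
    start = end + 1;
  }
  return out;
}
```

For the string in the error message, this returns "thinkmate3:6789" and "thinkmate4:6789" as two entries.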

#3

Updated by Jeff Strunk over 11 years ago

I am using 0.48argonaut-1precise.

#4

Updated by Josh Durgin over 11 years ago

  • Category set to common
  • Status changed from New to Resolved
  • Assignee set to Josh Durgin

Fortunately, I was wrong about the string splitting; that was just a confusing message from the parsing stage.

The actual problem was that one of the parsing stages returned success even though it had failed, so the problem only surfaced later, as the assert. This is fixed in the next branch.
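
The bug pattern described here, a stage reporting success after a failure, can be sketched as follows. Everything in it is hypothetical: resolve_host uses a lookup table in place of a real DNS or /etc/hosts query, and parse_mon_hosts is not Ceph's actual parser. The point is that each failed lookup is returned to the caller immediately and an empty result is treated as an error, so an empty monitor list never reaches a later assert.

```cpp
#include <cerrno>
#include <map>
#include <string>
#include <vector>

// Hypothetical resolver standing in for a DNS or /etc/hosts lookup:
// returns 0 and fills *ip on success, -ENOENT if the name is unknown.
int resolve_host(const std::map<std::string, std::string>& table,
                 const std::string& host, std::string* ip) {
  auto it = table.find(host);
  if (it == table.end()) return -ENOENT;
  *ip = it->second;
  return 0;
}

// Hypothetical parsing stage. The buggy variant would return 0 here even
// after a failed lookup, leaving *out empty for a later stage to trip over.
// This version propagates every failure to the caller instead.
int parse_mon_hosts(const std::map<std::string, std::string>& table,
                    const std::vector<std::string>& hosts,
                    std::vector<std::string>* out) {
  for (const auto& host : hosts) {
    std::string ip;
    int r = resolve_host(table, host, &ip);
    if (r < 0) return r;  // report the failure, do not pretend success
    out->push_back(ip);
  }
  if (out->empty()) return -ENOENT;  // nothing parsed is also an error
  return 0;
}
```

A caller checking the return value can then stop and report the lookup failure directly, rather than continuing into monitor selection with an empty map.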
