Project

General

Profile

Actions

Bug #18265

closed

MonMap::sanitize_mons() does not like monmaps built from -m/mon_hosts

Added by Dan Mick over 7 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Immediate
Assignee:
Joao Eduardo Luis
Category:
MonClient
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have a ceph.conf that I've used for some time that declares mons with

mon_initial_members = host1, host2, host3
mon host = <host1ip>, <host2ip>, <host3ip>

It seems to me that this is a valid configuration.

Lately, this has started failing with an assert:

FAILED assert(mon_info.count(p.first))

This is in sanitize_mons(), because it's trying to verify that an incoming monmap matches the initial monmap by, among other things, asserting that the map key in the incoming map is present in 'mon_info', which is the initially-built monmap.

The problem is that the initially-built map, because it was built from ip addresses, has 'noname-{a,b,c}' in the keys, so there's no match, so the assert fires.

I hacked in code to check for a key starting with 'noname-', search for a matching entity_addr, and if found, replace the noname- map entry with one named by the incoming map entry's name, and this made the assertion go away...but I'm not at all sure it's the right sort of fix or will handle all possible initial-map or map-message cases.

Actions #1

Updated by Dan Mick over 7 years ago

  • Description updated (diff)
Actions #2

Updated by Dan Mick over 7 years ago

The hack in question:

--- a/src/mon/MonMap.cc
+++ b/src/mon/MonMap.cc
@@ -67,6 +67,16 @@ void MonMap::sanitize_mons(map<string,entity_addr_t>& o)
   for (auto p : o) {
     if (has_mon_info) {
       // make sure the info we have is accurate
+      // fix up any 'noname-' names in mon_info if we can
+      for (auto mi : mon_info) {
+        if ((mi.first.substr(0, 7).compare("noname-") == 0) &&
+            (mi.second.public_addr == p.second)) {
+          mon_info[p.first] =
+            mon_info_t(const_cast<string&>(p.first), p.second);
+          mon_info.erase(mi.first);
+          break;
+        }
+      }
       assert(mon_info.count(p.first));
       assert(mon_info[p.first].name == p.first);
       assert(mon_info[p.first].public_addr == p.second);

Actions #3

Updated by Dan Mick over 7 years ago

  • Assignee set to Joao Eduardo Luis

Spoke with Joao in IRC; he has some ideas

Actions #4

Updated by Dan Mick over 7 years ago

  • Priority changed from High to Urgent

This assert also stops new daemons from running (of course), not just the CLI. Raising to 'Urgent'.

Actions #5

Updated by Sage Weil over 7 years ago

  • Priority changed from Urgent to Immediate
Actions #6

Updated by Joao Eduardo Luis over 7 years ago

  • Status changed from New to In Progress
Actions #7

Updated by Joao Eduardo Luis over 7 years ago

  • Status changed from In Progress to 7
Actions #8

Updated by Joao Eduardo Luis over 7 years ago

  • Status changed from 7 to Fix Under Review
Actions #9

Updated by Sage Weil over 7 years ago

  • Status changed from Fix Under Review to Resolved
Actions #10

Updated by Dan Mick over 7 years ago

Solution chosen: just clear the local mon_info on decode of new monmap if the incoming version does not include mon_info

Actions

Also available in: Atom PDF