Bug #18265
closedMonMap::sanitize_mons() does not like monmaps built from -m/mon_hosts
0%
Description
I have a ceph.conf that I've used for some time that declares mons with
mon_initial_members = host1, host2, host3 mon host = <host1ip>, <host2ip>, <host3ip>
It seems to me that this is a valid configuration.
Lately, this has started failing with an assert:
FAILED assert(mon_info.count(p.first))
This is in sanitize_mons(), because it's trying to verify that an incoming monmap matches the initial monmap by, among other things, asserting that the map key in the incoming map is present in 'mon_info', which is the initially-built monmap.
The problem is that the initially-built map, because it was built from ip addresses, has 'noname-{a,b,c}' in the keys, so there's no match, so the assert fires.
I hacked in code to check for a key starting with 'noname-', search for a matching entity_addr, and if found, replace the noname- map entry with one named by the incoming map entry's name, and this made the assertion go away...but I'm not at all sure it's the right sort of fix or will handle all possible initial-map or map-message cases.
Updated by Dan Mick over 7 years ago
The hack in question:
--- a/src/mon/MonMap.cc +++ b/src/mon/MonMap.cc @@ -67,6 +67,16 @@ void MonMap::sanitize_mons(map<string,entity_addr_t>& o) for (auto p : o) { if (has_mon_info) { // make sure the info we have is accurate + // fix up any 'noname-' names in mon_info if we can + for (auto mi : mon_info) { + if ((mi.first.substr(0, 7).compare("noname-") == 0) && + (mi.second.public_addr == p.second)) { + mon_info[p.first] = + mon_info_t(const_cast<string&>(p.first), p.second); + mon_info.erase(mi.first); + break; + } + } assert(mon_info.count(p.first)); assert(mon_info[p.first].name == p.first); assert(mon_info[p.first].public_addr == p.second);
Updated by Dan Mick over 7 years ago
- Assignee set to Joao Eduardo Luis
Spoke with Joao in IRC; he has some ideas
Updated by Dan Mick over 7 years ago
- Priority changed from High to Urgent
This assert also stops new daemons from running (of course), not just the CLI. Raising to 'Urgent'.
Updated by Joao Eduardo Luis over 7 years ago
- Status changed from New to In Progress
Updated by Joao Eduardo Luis over 7 years ago
- Status changed from In Progress to 7
Updated by Joao Eduardo Luis over 7 years ago
- Status changed from 7 to Fix Under Review
Updated by Sage Weil over 7 years ago
- Status changed from Fix Under Review to Resolved
Updated by Dan Mick over 7 years ago
Solution chosen: just clear the local mon_info on decode of new monmap if the incoming version does not include mon_info